Turn Any Car into a Self-Driving Vehicle
Our project retrofits an existing car to make it self-driving. Self-driving should not be limited to new vehicles, Teslas, and Waymos - it should extend to cars bought before self-driving was available. We take a 2018 Honda Accord and give it the ability to drive itself using commercially available hardware and open-source software. Using our app, you can request a ride and the car will come to where you are and drive you there.
How We Built It
The central design problem is that driving requires two very different kinds of intelligence. Understanding a scene - recognizing that a pedestrian is about to cross, or that a lane is ending - requires slow, contextual reasoning over visual input. Actually holding a lane and applying the brakes smoothly requires fast, high-frequency control. Trying to do both in a single system forces a tradeoff between intelligence and reaction speed, so we separated them into two layers.

The high-level reasoning layer runs NVIDIA's Alpamayo R1, a 10.5-billion parameter vision-language-action model. It takes in camera frames from a wide-angle and a telephoto camera along with a short history of the car's own motion, and produces high-level driving plans. Because Alpamayo is a language model at its core, it also generates natural language explanations of its decisions - the same model that decides to yield to oncoming traffic can tell a passenger why it's yielding. This dual capability is what powers the transparency features in our rider-facing iOS app.

The low-level control layer is derived from sunnypilot, a fork of comma.ai's openpilot. It runs a vision model that processes camera frames at 20 Hz and a control loop that actuates steering and acceleration at 100 Hz. These fast reflexes handle the moment-to-moment driving - lane holding, smooth braking, correcting for disturbances - while the reasoning layer above sets the overall plan. Both layers, along with several supporting processes, run independently on our compute platform and communicate through comma.ai's cereal IPC messaging framework.

To physically control the car, we use the comma.ai Red Panda, a bidirectional CAN bus adapter connected to the Honda through a vehicle-specific wiring harness. Our software translates driving commands into CAN bus frames that the Honda's systems understand - steering torque, acceleration, braking - and sends them to the Red Panda, which transmits them onto the car's internal network. Vehicle state flows back through the same path, keeping the software in sync with what the car is actually doing.

The system runs on an NVIDIA Jetson AGX Thor, with cameras connected through the Holoscan Sensor Bridge, which routes uncompressed video directly into GPU memory for minimal latency processing.

The iOS app serves as both a rideshare dispatch system and a transparency interface. Riders request pickups, and the car navigates to them. During the ride, the app displays a live feed of the system's reasoning - what it sees, what it's doing, and why - drawn directly from Alpamayo's language output. In a driverless context, this kind of visibility is a practical necessity.
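The command-to-CAN translation step can be sketched as follows. This is a minimal illustration only - the byte layout, scaling, rolling counter, and checksum here are invented placeholders, not Honda's actual signal definitions (which openpilot encodes in its opendbc files):

```python
import struct

def steer_command_frame(torque: float, counter: int) -> bytes:
    """Pack a steering-torque command into a small CAN payload.

    Illustrative layout: clamp torque to [-1, 1], scale to a signed
    16-bit value, append a 4-bit rolling counter, then a simple XOR
    checksum - mimicking the kind of range/validity checks the Red
    Panda firmware performs before a frame ever reaches the car.
    """
    torque = max(-1.0, min(1.0, torque))
    raw = int(torque * 32767)                    # signed 16-bit scaling
    payload = struct.pack(">hB", raw, counter & 0x0F)
    checksum = 0
    for b in payload:                            # XOR over all payload bytes
        checksum ^= b
    return payload + bytes([checksum])
```

The counter and checksum exist so the receiving side can reject stale or corrupted frames, which is the same validation idea the Red Panda applies at the hardware boundary.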
Challenges
One of the most difficult challenges was balancing inference efficiency against model capability. Our hope was to run Chain-of-Causation models in real time on the AGX Thor at around 10 Hz control frequency, but these models were well over 2B parameters, which made it essentially impossible to stay under the 100 ms per-cycle compute limit. Given the memory bandwidth cap of 276 GB/s, weight loading made autoregressive Chain-of-Causation decoding a massive bottleneck - one we could not work around without a custom PTX kernel doing essentially what FlashAttention does: fusing operations to eliminate multiple memory read and write steps.

The hardest part of this project was integration. Each individual component - the reasoning model, the control software, the CAN bus interface, the camera pipeline - works on its own. Getting them all to work together reliably was where the real difficulty lay.

The most fundamental challenge was bridging the two timescales of our architecture. The reasoning model takes hundreds of milliseconds to process a scene. The control loop needs to respond every ten milliseconds. Designing the handoff so that slow plan updates translate into smooth, continuous actuation - without jerks or gaps - required careful work on buffering, timing, and interpolation between the two systems.

Hardware integration was equally demanding. Our stack spans four ecosystems - NVIDIA, Lattice Semiconductor, comma.ai, and Honda - each with its own data formats, protocols, and assumptions. Sunnypilot was built for Android with comma.ai's own cameras; we had to adapt it to Linux, swap in different camera sensors with different calibrations, route data through the Holoscan bridge, and make it all talk to the same CAN bus interface. Every boundary between ecosystems was its own set of problems.
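The slow-to-fast handoff can be sketched as a small interpolation buffer. This is a simplified illustration, not our production controller - the `Plan` fields, linear blending, and the 100 ms blend window are assumptions made for the sketch:

```python
from dataclasses import dataclass

@dataclass
class Plan:
    t: float        # time the plan was produced (s)
    steer: float    # target steering torque (normalized)
    accel: float    # target acceleration (m/s^2)

class PlanInterpolator:
    """Blend infrequent plan updates into a smooth 100 Hz command stream."""

    def __init__(self):
        self.prev = None
        self.curr = None

    def update(self, plan: Plan):
        # Called at the reasoning layer's rate (~hundreds of ms apart).
        self.prev = self.curr or plan
        self.curr = plan

    def command(self, now: float, blend_window: float = 0.1):
        # Called by the 100 Hz control loop. Ramp linearly from the
        # previous plan to the current one so a new plan arriving
        # never causes a step change in actuation.
        if self.curr is None:
            return 0.0, 0.0
        a = min(1.0, max(0.0, (now - self.curr.t) / blend_window))
        steer = (1 - a) * self.prev.steer + a * self.curr.steer
        accel = (1 - a) * self.prev.accel + a * self.curr.accel
        return steer, accel
```

The key property is that the fast loop never blocks on the slow one: it always has something smooth to emit, whether a fresh plan arrived 5 ms ago or 300 ms ago.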
Finally, we had to ensure that generating natural language explanations from the reasoning model never interfered with the driving task. The language output runs as a secondary, asynchronous process — useful for riders, but never in the critical path of vehicle control.
The car, controlled by the Jetson Thor, communicates with the iPhone app over MQTT, through a broker hosted on a Google Compute Engine instance.
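A minimal sketch of the vehicle-to-app status path, assuming the paho-mqtt client library and an illustrative topic name and payload schema (the real schema between the Thor and the iOS app may differ):

```python
import json
import time

def status_payload(lat: float, lon: float, state: str, explanation: str) -> str:
    """Serialize a vehicle status update for the rider app.

    Field names are illustrative; `explanation` carries Alpamayo's
    natural-language reasoning for display in the app.
    """
    return json.dumps({
        "ts": time.time(),
        "lat": lat,
        "lon": lon,
        "state": state,              # e.g. "en_route", "arrived", "riding"
        "explanation": explanation,
    })

def publish_status(broker_host: str, payload: str):
    # paho-mqtt is imported lazily so the payload helper stays stdlib-only.
    import paho.mqtt.client as mqtt
    client = mqtt.Client()
    client.connect(broker_host, 1883)            # broker on a GCE instance
    client.publish("car/status", payload, qos=1) # QoS 1: at-least-once delivery
    client.disconnect()
```

QoS 1 trades a little bandwidth for delivery guarantees, which matters when the rider's mental model of the car depends on these updates arriving.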
Safety
Building a system that physically controls a moving vehicle carries obvious responsibility, and safety considerations informed our architecture from the start. The Red Panda firmware validates every CAN frame before transmitting it to the car. Malformed or out-of-range commands are rejected at the hardware level before they ever reach the vehicle's systems.

On the camera side, the Holoscan Sensor Bridge and Lattice FPGA board provide a deterministic data path from the IMX274 cameras over MIPI to GPU memory - there is no software bottleneck or unpredictable CPU scheduling in the way of incoming visual data, which reduces the risk of stale or dropped frames reaching the driving model.

On the software side, sunnypilot inherits openpilot's safety model: the driver can always override the system by touching the steering wheel or pressing the brake, which immediately disengages autonomous control. The system monitors for driver attentiveness and will alert and disengage if the driver is unresponsive for too long.

Our two-layer architecture also provides a natural safety boundary. The low-level controller operates independently of the reasoning model - if Alpamayo stalls or produces an unreasonable plan, the fast control loop continues to hold the lane and maintain safe following distance using its own visual perception. The reasoning layer can fail gracefully without the car losing basic control.

For the rideshare context, the iOS app gives passengers direct access to the reasoning layer. Riders can see a live interpretation of the system's actions - why it's slowing down, why it chose a particular lane, what it's anticipating ahead. This goes beyond passive status updates: because Alpamayo is a language model, passengers can actually converse with the system, ask questions about its decisions, and provide feedback. If a rider prefers a different route or wants to understand why the car is taking a particular path, they can say so, and the model can process that input as part of its planning. The app also enhances navigation by surfacing the model's contextual awareness - not just turn-by-turn directions, but an understanding of traffic conditions, road geometry, and obstacles that inform routing decisions. The result is that riders don't just observe autonomy - they interact with it, understand its thinking, and have a channel to influence it.

We treat this project as a research prototype, not a production deployment. All testing was conducted in controlled conditions with a safety driver behind the wheel at all times.
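The fallback between the two layers can be sketched as a staleness watchdog. This is an illustrative simplification - the 0.5 s threshold and the mode names are assumptions, not values from our stack:

```python
class ReasoningWatchdog:
    """Decide whether high-level plans are fresh enough to trust.

    The 100 Hz control loop keeps lane-holding and following distance
    alive on its own perception regardless; this monitor only selects
    whether it should also be tracking the reasoning layer's plan.
    """

    def __init__(self, max_staleness: float = 0.5):
        self.max_staleness = max_staleness  # assumed threshold, seconds
        self.last_plan_time = None

    def plan_received(self, t: float):
        self.last_plan_time = t

    def mode(self, now: float) -> str:
        if self.last_plan_time is None:
            return "low_level_only"         # no plan yet: reflexes only
        if now - self.last_plan_time > self.max_staleness:
            return "low_level_only"         # reasoning layer stalled
        return "plan_following"
```

Because the watchdog only ever *removes* the plan from the control path and never blocks actuation, a stalled reasoning layer degrades to lane-keeping rather than to a loss of control.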
What We Learned
Hardware is hard. Firmware bugs, driver incompatibilities, and a host of other low-level risks pose constant challenges. We developed a working familiarity with the NVIDIA autonomous driving ecosystem - Alpamayo, Cosmos, Holoscan, and JetPack - and with the comma.ai open-source stack for vehicle control. The gap between a model that works in simulation and a car that physically turns its steering wheel is large, and it is almost entirely composed of integration engineering.
Next steps
Our goal is to distill Alpamayo down to a model small enough to run directly on the Jetson AGX Thor in real time.
For on-device inference, we are using ThunderKittens and TensorRT-Edge-LLM to write hyper-optimized CUDA kernels targeting a quantized version of the distilled model in NVFP4 (4-bit floating point). This combination should allow us to hit real-time inference on the Thor's Blackwell GPU, as well as on cheaper hardware. We should note that the main bottleneck we'll be solving with custom kernels is memory bandwidth, not compute.
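A back-of-envelope calculation shows why bandwidth, not compute, is the binding constraint. Assuming every weight is read from DRAM once per generated token (ignoring KV cache and activation traffic):

```python
PARAMS = 10.5e9   # Alpamayo R1 parameter count
BW = 276e9        # Thor memory bandwidth, bytes/s

def decode_rate(bytes_per_param: float) -> float:
    """Upper bound on autoregressive tokens/s when decoding is
    memory-bandwidth-bound: each token requires streaming the full
    weight set from DRAM once."""
    weight_bytes = PARAMS * bytes_per_param
    return BW / weight_bytes

fp16_rate = decode_rate(2.0)    # FP16: 2 bytes per parameter
nvfp4_rate = decode_rate(0.5)   # NVFP4: 4 bits per parameter
```

At FP16, weight traffic alone caps decoding around 13 tokens/s; NVFP4 raises that ceiling roughly 4x, to around 52 tokens/s, which is what makes real-time on-device inference plausible before any kernel-level gains.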
Stay tuned for future updates.