Inspiration

First responders operate in high-risk, unpredictable environments where situational awareness can mean the difference between life and death. With the advent of capable VLMs and embedded computers such as the Jetson Orin Nano Super, we saw a chance to supplement that awareness. In high-risk scenarios, visibility can be compromised, GPS signals weaken, heat stroke becomes a real danger, and critical details get missed. Many first responders already wear helmets in these environments, which led to our project: what if we could extend these helmets beyond physical protection to actively enhance human perception?

We came up with the idea of the Spartan helmet as an intelligence layer embedded directly into protective equipment that responders already wear. Instead of adding another handheld device or remote dashboard, we designed a system that integrates sensing, computation, and augmented visualization directly into a wearable form factor. Spartan is built to support human judgment, not replace it.

What it does

Spartan is a real-time edge AI helmet that fuses multimodal sensing into a live augmented heads-up display (HUD). It integrates dual vision cameras, infrared imaging for night vision, IMU motion tracking, GPS positioning, physiological monitoring (heart rate and temperature), and on-device inference with a contextual world memory.

Spartan is powered by an NVIDIA Jetson Orin Nano. It can recognize people and objects in real time, flag contextual events and hazards, and track what has been seen over time (world memory) thanks to a custom memory system we built; this system uses real-time VLM reasoning to generate a world-context "story" of what the helmet is seeing as users respond to their respective situations. The contextual memory also stores and monitors responder stress and physiological state: the helmet tracks pulse and temperature data (whether the body is shedding or gaining heat through the glabrous skin of the forehead). Finally, Spartan overlays all of this critical information directly onto a dual-eye HUD.

Unlike simple frame-by-frame detection systems, Spartan builds temporal awareness by understanding not just what is visible now, but what has happened recently.
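To make that temporal-awareness idea concrete, here is a minimal sketch of how a rolling world memory could work. The class and method names (WorldMemory, observe, summarize) are our own illustrative choices rather than Spartan's actual implementation; the summarization call assumes the OpenAI Python client and the gpt-5-mini model mentioned later in this writeup.

```python
# Illustrative sketch of a rolling "world memory": VLM captions are appended
# with timestamps, and an LLM periodically condenses them into a short story.
# Class/method names are hypothetical; the real Spartan memory layer may differ.
import time
from collections import deque

from openai import OpenAI  # assumes the OpenAI Python SDK is installed


class WorldMemory:
    def __init__(self, max_events: int = 200):
        self.events = deque(maxlen=max_events)  # (timestamp, caption, vitals) tuples
        self.story = ""                          # running narrative of the scene
        self.client = OpenAI()

    def observe(self, caption: str, vitals: dict | None = None) -> None:
        """Record one VLM caption (and optional vitals) with a timestamp."""
        self.events.append((time.time(), caption, vitals or {}))

    def summarize(self) -> str:
        """Condense recent events into an updated scene story."""
        recent = "\n".join(
            f"[{time.strftime('%H:%M:%S', time.localtime(t))}] {c} {v}"
            for t, c, v in self.events
        )
        resp = self.client.chat.completions.create(
            model="gpt-5-mini",  # model named in the writeup
            messages=[
                {"role": "system",
                 "content": "Summarize these helmet observations into a brief, "
                            "chronological situational report for a first responder."},
                {"role": "user", "content": recent},
            ],
        )
        self.story = resp.choices[0].message.content
        return self.story
```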

How we built it

Spartan was designed as a fully integrated hardware-software system. We split the build among the team into three subsystems.

Starting with the hardware and mechanical design, we designed and modeled the helmet enclosure and optical housing in CAD. This allowed us to precisely position the dual-eye display relative to the lenses and optimize spacing for inter-pupillary alignment. We then integrated the camera modules and infrared sensors into the helmet shell, and mounted the Jetson Orin Nano Super where it could reach every module cleanly. Finally, after several attempts, we routed all the sensor wiring as tidily as possible. Using CAD early in development enabled rapid iteration on ergonomics, weight distribution, and component placement before physical assembly.

We then moved on to the embedded hardware stack, built around the Jetson Orin Nano Super for edge inference and compute. The stack included two 12 MP camera modules for depth perception, an infrared camera for night vision, an IMU for motion detection and localization, a GPS module to supplement localization, a heart rate sensor, a temperature sensor to measure body heat, and a display for the dual-eye HUD.
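As a rough illustration of what one fused sample from this stack might look like once bundled, here is a hypothetical packet structure; the field names, types, and units below are our assumptions for the sketch, not Spartan's exact format.

```python
# Hypothetical fused-sensor packet; field names, types, and units are
# illustrative assumptions rather than Spartan's exact wire format.
from dataclasses import dataclass, field

import numpy as np


@dataclass
class SensorPacket:
    timestamp: float                        # monotonic time of the bundle, seconds
    left_frame: np.ndarray                  # RGB frame from the left 12 MP camera
    right_frame: np.ndarray                 # RGB frame from the right 12 MP camera
    ir_frame: np.ndarray | None             # infrared frame for night vision, if available
    accel: tuple[float, float, float]       # IMU accelerometer, m/s^2
    gyro: tuple[float, float, float]        # IMU gyroscope, rad/s
    gps: tuple[float, float] | None         # (latitude, longitude); may lag or drop out
    heart_rate_bpm: float | None            # latest pulse reading
    forehead_temp_c: float | None           # temperature at the glabrous skin
    meta: dict = field(default_factory=dict)  # anything extra (calibration, flags)
```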

Lastly, for the software, we tried to make Spartan as modular as possible. The final stack consisted of Python sensor-level ingestion code and a Python syncing module that synchronizes and packetizes all the raw sensor data. Once the data was bundled, we fed it into the real-time inference pipeline built on the VILA1.5-3b model. We built a cyclical world memory layer that combines the VLM's reasoning output with the OpenAI API (gpt-5-mini) to maintain an updated store of what the cameras have captured over a session. The last module was the HUD renderer, which splits the camera feed into two mathematically warped views, with calibration controls, so the user sees a single smooth image through the lenses.
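Here is a minimal sketch of the kind of dual-eye warp the HUD renderer performs, in the spirit of Cardboard-style lens correction: the frame is duplicated per eye, shifted for inter-pupillary alignment, and radially pre-distorted. The distortion coefficients and IPD offset below are placeholder values that would normally come from the live calibration controls, not the numbers we actually shipped.

```python
# Sketch of a Cardboard-style dual-eye warp: each eye gets a shifted crop of the
# frame, pre-distorted radially so it looks straight through the lenses.
# k1, k2, and ipd_px are placeholder values, not tuned calibration constants.
import cv2
import numpy as np


def build_distortion_maps(w: int, h: int, k1: float = 0.22, k2: float = 0.10):
    """Precompute remap tables that apply a radial (barrel) pre-distortion."""
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    # Normalize to [-1, 1] around the optical center of this eye's viewport.
    nx = (xs - w / 2) / (w / 2)
    ny = (ys - h / 2) / (h / 2)
    r2 = nx * nx + ny * ny
    scale = 1 + k1 * r2 + k2 * r2 * r2        # sample further out toward the edges
    map_x = (nx * scale * (w / 2) + w / 2).astype(np.float32)
    map_y = (ny * scale * (h / 2) + h / 2).astype(np.float32)
    return map_x, map_y


def render_hud(frame: np.ndarray, ipd_px: int = 30) -> np.ndarray:
    """Split one camera frame into shifted, pre-distorted left/right views."""
    h, w = frame.shape[:2]
    eye_w = w // 2
    map_x, map_y = build_distortion_maps(eye_w, h)
    # Shift the crop window horizontally per eye to approximate IPD alignment.
    l_start = max(0, w // 4 - ipd_px)
    r_start = min(w - eye_w, w // 4 + ipd_px)
    left = cv2.remap(frame[:, l_start:l_start + eye_w], map_x, map_y, cv2.INTER_LINEAR)
    right = cv2.remap(frame[:, r_start:r_start + eye_w], map_x, map_y, cv2.INTER_LINEAR)
    return np.hstack([left, right])           # side-by-side image for the display
```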

We purposely separated sensor ingestion, synchronization, inference, memory, and rendering. This modular design allowed us to simulate hardware inputs early and build the HUD and edge AI logic in parallel with mechanical development.
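One way to phrase that separation is a small sensor interface that mocks can implement before the hardware exists; the SensorSource and MockIMU names below are illustrative stand-ins, not our exact code.

```python
# Illustrative separation between real and simulated sensors: downstream modules
# only see the SensorSource interface, so a mock can stand in for hardware that
# isn't wired up yet. Names and noise values here are hypothetical.
import math
import random
import time
from abc import ABC, abstractmethod


class SensorSource(ABC):
    @abstractmethod
    def read(self) -> dict:
        """Return the latest sample as a plain dict with a timestamp."""


class MockIMU(SensorSource):
    """Stands in for the real IMU driver during early software development."""

    def read(self) -> dict:
        t = time.monotonic()
        return {
            "timestamp": t,
            "accel": (0.0, 0.0, 9.81 + random.gauss(0, 0.05)),  # resting gravity + noise
            "gyro": (0.01 * math.sin(t), 0.0, 0.0),             # gentle synthetic sway
        }


# The sync and HUD code can be exercised before the helmet hardware exists:
imu: SensorSource = MockIMU()
print(imu.read())
```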

Challenges we ran into

The most unexpected challenge we ran into was actually the optics of the dual-eye HUD. We had to work out some of the math behind how similar systems like Google Cardboard VR handle optics and replicate it in our own software and CAD designs. Proper optical isolation, inter-pupillary alignment, and viewport calibration required a lot of iterative tuning. Sensor synchronization was another big challenge, since different sensors operate at different frequencies: the IMU runs at high frequency, GPS updates slowly, and physiological data arrives intermittently. Designing a synchronization system that aligned these streams without blocking the render loop required careful buffer management. Finally, integrating the cameras, figuring out the device tree config, and embedding the specific drivers for each module took the longest time, as always with hardware builds ;)
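To show the rate-mismatch problem concretely, here is a small sketch in which each sensor pushes into its own thread-safe ring buffer and the render loop simply grabs the freshest value from each without waiting. The buffer sizes, thread layout, and names are illustrative assumptions, not the tuned values from the helmet.

```python
# Each producer (IMU, GPS, vitals) writes to its own small ring buffer at its own
# rate; the render loop reads the newest value from each and never blocks on a
# slow sensor. Sizes and rates below are placeholders for illustration.
import threading
import time
from collections import deque


class LatestValueBuffer:
    """Thread-safe ring buffer; readers always get the newest sample."""

    def __init__(self, maxlen: int = 64):
        self._buf = deque(maxlen=maxlen)
        self._lock = threading.Lock()

    def push(self, sample: dict) -> None:
        with self._lock:
            self._buf.append(sample)

    def latest(self) -> dict | None:
        with self._lock:
            return self._buf[-1] if self._buf else None


buffers = {"imu": LatestValueBuffer(), "gps": LatestValueBuffer(), "vitals": LatestValueBuffer()}


def imu_producer():  # ~100 Hz producer thread
    while True:
        buffers["imu"].push({"t": time.monotonic(), "gyro": (0.0, 0.0, 0.0)})
        time.sleep(0.01)


def render_loop():  # ~30 FPS consumer; missing values just render as "--"
    while True:
        frame_data = {name: buf.latest() for name, buf in buffers.items()}
        # ...draw the HUD with whatever is freshest in frame_data
        time.sleep(1 / 30)


threading.Thread(target=imu_producer, daemon=True).start()
# render_loop() would run in the main thread alongside more producer threads.
```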

Accomplishments that we're proud of

  • A fully modular sensor ingestion and synchronization pipeline
  • A working dual-eye HUD system with live calibration controls
  • On-device inference capable of recognition and contextual tracking
  • Real-time physiological monitoring integrated into the display
  • A CAD-designed enclosure integrating sensing, compute, and optics

What we learned

One of the biggest lessons we took away is that real-time systems require strict separation between computation-heavy tasks and rendering loops. This build required several rendering loops, and the VLM inference was far more computationally heavy than refreshing the HUD, so we had to split the two into separate compartments to prevent visible stutter for the user. We also learned that sensor synchronization matters more than raw sensor accuracy: in a system like this, a well-synced collection of sensor streams is worth more than any single highly accurate sensor, because even minor timing discrepancies between streams severely impair situational awareness.
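The sketch below shows one way that separation can look: inference runs in its own process and only publishes its latest caption, so the HUD redraw never waits on the model. The run_vlm worker here is a placeholder for the real VILA pipeline, and the shared-buffer layout is our assumption for illustration.

```python
# Heavy VLM inference runs in a worker process; the fast render loop only reads
# the most recently published caption and drops stale frames rather than queuing
# them. run_vlm() is a stand-in for the actual VILA1.5-3b pipeline.
import multiprocessing as mp
import time


def run_vlm(latest_caption, frame_queue):
    """Slow worker: pull the newest frame and overwrite the shared caption."""
    while True:
        frame = frame_queue.get()          # blocks here, not in the render loop
        time.sleep(0.5)                    # placeholder for model inference time
        latest_caption.value = f"objects detected at t={frame['t']:.1f}".encode()


if __name__ == "__main__":
    latest_caption = mp.Array("c", 256)    # shared bytes buffer for the newest caption
    frame_queue = mp.Queue(maxsize=1)      # only the freshest frame is ever queued
    mp.Process(target=run_vlm, args=(latest_caption, frame_queue), daemon=True).start()

    while True:                            # fast render loop (~30 FPS)
        frame = {"t": time.monotonic()}
        if frame_queue.empty():
            frame_queue.put(frame)         # drop frames rather than pile them up
        caption = latest_caption.value.decode()
        # ...redraw the HUD with `caption`; no waiting on the worker process
        time.sleep(1 / 30)
```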

Something we learned on both the hardware and software side is that optical systems can be just as complex as AI systems. We had to learn the math behind the optics of the dual display and how to combine it with live CV/VLM intelligence.

What's next for Spartan | Edge AI Situational Helmet for First Responders

The Spartan helmet we built is a very early prototype of a larger vision. The first improvement, for obvious comfort reasons, would be refining the optical calibration and distortion correction so focusing on the HUD is far less strenuous on the eyes. Next, we would expand our basic VLM detection and reasoning into more complex hazard classification, which should be possible with physical-reasoning models like NVIDIA Cosmos. Another improvement, which is a bit harder, is better indoor localization: a better IMU and more finely tuned localization sensors would give more accurate guidance on where the user needs to go.

The most exciting next step is to leverage multiple Spartan helmets and have them communicate, giving a larger shared contextual awareness of a situation (Person #1's helmet picks up a hazard, and Person #2's Spartan knows where and what the hazard is before they reach it). In the long term, Spartan could evolve into a scalable edge AI platform for firefighting, emergency medical response, disaster recovery, and tactical operations.
