Inspiration
We wanted to build an open-source equivalent of Anduril’s Eagle Eye — a real-time aerial situational awareness system that fuses computer vision with augmented reality.
What It Does
WallHacks uses a Flyby Robotics drone as our “eyes” and a Meta Quest 3 headset as our “lens” into the world. The drone streams its live camera feed to an ensemble of vision models running on the edge that perform:
- Object detection – identifying and tracking entities of interest.
- Segmentation – isolating meaningful regions in the scene.
- Monocular depth estimation – inferring 3D structure from a single camera feed.
- Facial recognition – identifying known individuals in the frame.
These visual cues are combined into a spatial map and transmitted via MQTT to the AR headset, where they're rendered in the wearer's field of view for an immersive, context-aware experience.
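As a rough sketch of that MQTT hop, the payload below shows one way a detected entity could be serialized for the AR layer. The topic name, field names, and `make_annotation` helper are our illustrative assumptions, not the project's actual schema:

```python
import json
import time

# Assumed topic name; the real deployment may partition topics differently.
MQTT_TOPIC = "wallhacks/annotations"

def make_annotation(label: str, position_m: tuple, confidence: float) -> str:
    """Serialize one detected entity as a JSON payload for the AR headset."""
    msg = {
        "label": label,                  # e.g. "person", from object detection
        "position_m": list(position_m),  # (x, y, z) in the drone's frame, meters
        "confidence": round(confidence, 3),
        "timestamp": time.time(),
    }
    return json.dumps(msg)

payload = make_annotation("person", (1.2, 0.4, 5.8), 0.91)
# An MQTT client (e.g. paho-mqtt) would then publish it:
#   client.publish(MQTT_TOPIC, payload, qos=0)
```

On the Unity side, the subscriber deserializes the same fields and anchors a label at `position_m` after transforming it into the headset's frame.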
How We Built It
- Established a real-time streaming pipeline (RTSP) from the drone’s onboard camera.
- Ran inference across multiple computer vision models optimized for low-latency edge execution (<67 ms per frame).
- Fused segmentation masks with depth data to localize objects relative to the drone’s position.
- Published spatial annotations to a Unity-based AR visualization layer on the Quest 3 using MQTT messaging.
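The mask-depth fusion step above can be sketched as a back-projection through a pinhole camera model. This is a minimal toy version; the intrinsics, the toy frame, and the `localize` helper are our assumptions, and the real pipeline operates on full model outputs:

```python
import statistics

def localize(mask, depth, fx, fy, cx, cy):
    """Fuse a binary segmentation mask with a depth map into one 3D point.

    mask  : 2D list of 0/1 flags, same shape as depth
    depth : 2D list of per-pixel depth estimates in meters
    fx, fy, cx, cy : pinhole intrinsics (assumed calibrated)
    Returns (X, Y, Z) in the camera frame, in meters.
    """
    us, vs, ds = [], [], []
    for v, row in enumerate(mask):
        for u, inside in enumerate(row):
            if inside:
                us.append(u)
                vs.append(v)
                ds.append(depth[v][u])
    z = statistics.median(ds)   # median depth is robust to mask bleed
    u0 = statistics.mean(us)    # mask centroid in pixel coordinates
    v0 = statistics.mean(vs)
    # Back-project the centroid through the pinhole model.
    return ((u0 - cx) * z / fx, (v0 - cy) * z / fy, z)

# Toy 4x4 frame: a 2x2 object mask over a flat 5 m depth field.
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
depth = [[5.0] * 4 for _ in range(4)]
x, y, z = localize(mask, depth, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
# z == 5.0: the object sits 5 m ahead of the camera.
```

Combining this camera-frame point with the drone's pose then yields the world-frame position published to the headset.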
Challenges
- Unity C# pipeline: every code change triggered a massive shader recompile (~18 GB).
- Real-time inference: achieving consistent <67 ms latency per frame required aggressive optimization.
- AR positional drift: ensuring stable overlays between the drone’s coordinate frame and the user’s headset.
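To make the drift challenge concrete, here is the kind of frame-to-frame transform that has to stay calibrated. We simplify to a yaw-only rotation plus a translation for illustration; the actual alignment is a full 6-DoF pose:

```python
import math

def drone_to_headset(point, yaw_rad, offset):
    """Map a point from the drone's coordinate frame into the headset's.

    Simplified to a yaw rotation about the vertical axis plus a
    translation. The real alignment is a full 6-DoF pose that must be
    re-estimated as both devices move; small errors in it accumulate
    into visible overlay drift.
    """
    x, y, z = point
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    xr = c * x - s * z   # rotate about the vertical (y) axis
    zr = s * x + c * z
    ox, oy, oz = offset
    return (xr + ox, y + oy, zr + oz)

# Even a 1 cm error in the translation offset shifts every overlay by 1 cm.
p = drone_to_headset((0.0, 0.0, 10.0), 0.0, (0.01, 0.0, 0.0))
```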
Accomplishments
- Achieved real-time spatial awareness: depth-fused overlays let you almost see through walls.
- Optimized the entire vision-to-AR pipeline for responsive, low-latency performance.
What We Learned
- Implementing real-time streaming using RTSP and MQTT.
- Running complex CV inference pipelines on constrained edge devices.
- Bridging robotics, networking, and AR into one seamless system.
What’s Next
- Integrating SLAM for spatial mapping and drift correction.
- Adding voice-commanded drone control.
- Expanding facial recognition and semantic scene understanding.
