Inspiration

We wanted to build an open-source equivalent of Anduril’s Eagle Eye — a real-time aerial situational awareness system that fuses computer vision with augmented reality.

What It Does

WallHacks uses a Flyby Robotics drone as our “eyes” and a Meta Quest 3 headset as our “lens” into the world. The drone streams its live camera feed to an ensemble of vision models running on the edge, which perform the following (a sketch of the capture-and-inference loop appears after the list):

  • Object detection – identifying and tracking entities of interest.
  • Segmentation – isolating meaningful regions in the scene.
  • Monocular depth estimation – inferring 3D structure from a single camera feed.
  • Facial recognition – identifying known individuals in the frame.
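To make the loop concrete, here is a minimal sketch of the edge-side capture-and-inference step. Python with OpenCV for RTSP capture and an Ultralytics YOLO detector are assumptions for illustration; the stream URL and weights file are placeholders, not our exact stack.

```python
# Minimal edge-side loop: pull RTSP frames, run detection per frame.
# RTSP_URL and the YOLO weights are placeholders (assumed stack).
import cv2
from ultralytics import YOLO

RTSP_URL = "rtsp://192.168.1.10:8554/drone"  # hypothetical drone stream
detector = YOLO("yolov8n.pt")                # any compact detector works

cap = cv2.VideoCapture(RTSP_URL)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break  # stream dropped; a real system would reconnect
    results = detector(frame, verbose=False)[0]
    for box in results.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()  # pixel-space corners
        label = results.names[int(box.cls[0])]
        print(label, float(box.conf[0]), (x1, y1, x2, y2))
cap.release()
```

The segmentation, depth, and face models would slot into the same loop, each consuming the frame (or the detector’s crops) in turn.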

These visual cues are fused into a spatial map and published over MQTT to the AR headset, where they are rendered in the wearer’s field of view for an immersive, context-aware experience.
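The transport is plain MQTT. Below is a minimal sketch of the publisher side, assuming paho-mqtt (>= 2.0) on the edge box; the broker address, topic name, and message schema are illustrative, not our exact wire format.

```python
# Publish one fused annotation to the headset over MQTT.
# Broker address, topic, and JSON schema are placeholders.
import json
import paho.mqtt.client as mqtt

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)  # paho-mqtt >= 2.0
client.connect("192.168.1.20", 1883)  # hypothetical broker on the LAN
client.loop_start()

annotation = {
    "label": "person",
    "confidence": 0.91,
    "position_m": [2.4, 0.1, 5.7],  # drone-relative x, y, z in meters
}
client.publish("wallhacks/annotations", json.dumps(annotation), qos=0)
client.loop_stop()
```

The Unity layer on the Quest 3 subscribes to the same topic and raises each annotation as an overlay. QoS 0 keeps latency low at the cost of occasional drops, a reasonable trade for data that is superseded on the next frame.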

How We Built It

  • Established a real-time streaming pipeline (RTSP) from the drone’s onboard camera.
  • Ran inference across multiple computer vision models optimized for low-latency edge execution (<67 ms per frame, ~15 fps).
  • Fused segmentation masks with depth data to localize objects relative to the drone’s position (see the fusion sketch after this list).
  • Published spatial annotations to a Unity-based AR visualization layer on the Quest 3 using MQTT messaging.
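The mask/depth fusion in the third step reduces to a median depth inside each segmentation mask plus a pinhole back-projection. A minimal sketch follows; the intrinsics (fx, fy, cx, cy) are placeholder values standing in for the drone camera’s calibration.

```python
# Localize an object: median depth inside its mask, then back-project
# the mask centroid through a pinhole model to drone-relative meters.
import numpy as np

def localize(mask, depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0):
    """mask: HxW bool array; depth: HxW metric depth map in meters."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None  # empty mask: nothing to localize
    z = float(np.median(depth[ys, xs]))  # median is robust to depth noise
    u, v = xs.mean(), ys.mean()          # mask centroid in pixels
    # Pinhole back-projection: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)
```

Using the median rather than the mean keeps a few wildly wrong depth pixels, common with monocular estimators at object boundaries, from dragging the position off target.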

Challenges

  • Unity C# pipeline: every code change triggered a massive (~18 GB) shader recompile.
  • Real-time inference: hitting a consistent <67 ms (~15 fps) frame budget required aggressive optimization (see the timing sketch after this list).
  • AR positional drift: keeping overlays stable meant continuously aligning the drone’s coordinate frame with the headset’s (see the transform sketch after this list).
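A 67 ms frame budget is roughly 15 fps, and the first step toward that kind of optimization is attributing the budget per stage. A throwaway probe along these lines (stage names hypothetical) is enough to show where the milliseconds go:

```python
# Per-stage latency probe for the frame loop; stage names are illustrative.
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def stage(name):
    t0 = time.perf_counter()
    yield
    timings[name] = (time.perf_counter() - t0) * 1000.0  # milliseconds

# Inside the frame loop:
#   with stage("detect"):  run the detector
#   with stage("depth"):   run depth estimation
#   with stage("publish"): push annotations over MQTT
# then: print({k: f"{v:.1f} ms" for k, v in timings.items()})
```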
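The drift problem is ultimately a frame-alignment problem: every drone-relative position has to be re-expressed in the headset’s coordinate frame before rendering. A minimal sketch with a homogeneous transform; the matrix here is a placeholder where a calibrated or anchor-derived transform would go.

```python
# Re-express a drone-frame point in the headset frame via a 4x4
# homogeneous transform. T_headset_drone is a placeholder; in practice
# it comes from calibration or a shared spatial anchor.
import numpy as np

T_headset_drone = np.eye(4)
T_headset_drone[:3, 3] = [0.0, 1.5, 3.0]  # e.g. drone 3 m ahead, 1.5 m up

def to_headset_frame(p_drone):
    """(x, y, z) in the drone's frame -> (x, y, z) in the headset's frame."""
    p = np.append(np.asarray(p_drone, dtype=float), 1.0)  # homogeneous point
    return (T_headset_drone @ p)[:3]

print(to_headset_frame((2.4, 0.1, 5.7)))
```

Any error in this transform shows up directly as overlay drift, which is why SLAM-based correction is first on the roadmap below.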

Accomplishments

  • Achieved real-time spatial awareness—you can almost see through walls.
  • Optimized the entire vision-to-AR pipeline for responsive, low-latency performance.

What We Learned

  • Implementing real-time streaming using RTSP and MQTT.
  • Running complex CV inference pipelines on constrained edge devices.
  • Bridging robotics, networking, and AR into one seamless system.

What’s Next

  • Integrating SLAM for spatial mapping and drift correction.
  • Adding voice-commanded drone control.
  • Expanding facial recognition and semantic scene understanding.

Built With

  • Flyby Robotics drone, Meta Quest 3, Unity (C#), MQTT, RTSP