Inspiration

We wanted to build an open-source equivalent of Anduril’s Eagle Eye — a real-time aerial situational awareness system that fuses computer vision with augmented reality.

What It Does

WallHacks uses a Flyby Robotics drone as our “eyes” and a Meta Quest 3 headset as our “lens” into the world. The drone streams its live camera feed to an ensemble of vision models running on the edge, which perform the following (a sketch of the capture-and-inference loop appears after the list):

  • Object detection – identifying and tracking entities of interest.
  • Segmentation – isolating meaningful regions in the scene.
  • Monocular depth estimation – inferring 3D structure from a single camera feed.
  • Facial recognition – identifying known individuals in the frame.
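To make the loop concrete, here is a minimal sketch of the edge-side capture-and-inference step. Python with OpenCV for RTSP capture and an Ultralytics YOLO detector are assumptions for illustration; the stream URL and weights file are placeholders, not our exact stack.

```python
# Minimal edge-side loop: pull RTSP frames, run detection per frame.
# RTSP_URL and the YOLO weights are placeholders (assumed stack).
import cv2
from ultralytics import YOLO

RTSP_URL = "rtsp://192.168.1.10:8554/drone"  # hypothetical drone stream
detector = YOLO("yolov8n.pt")                # any compact detector works

cap = cv2.VideoCapture(RTSP_URL)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break  # stream dropped; a real system would reconnect
    results = detector(frame, verbose=False)[0]
    for box in results.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()  # pixel-space corners
        label = results.names[int(box.cls[0])]
        print(label, float(box.conf[0]), (x1, y1, x2, y2))
cap.release()
```

The segmentation, depth, and face models would slot into the same loop, each consuming the frame (or the detector’s crops) in turn.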

These visual cues are fused into a spatial map and published over MQTT to the AR headset, where they are rendered in the wearer’s field of view for an immersive, context-aware experience.
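The transport is plain MQTT. Below is a minimal sketch of the publisher side, assuming paho-mqtt (>= 2.0) on the edge box; the broker address, topic name, and message schema are illustrative, not our exact wire format.

```python
# Publish one fused annotation to the headset over MQTT.
# Broker address, topic, and JSON schema are placeholders.
import json
import paho.mqtt.client as mqtt

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)  # paho-mqtt >= 2.0
client.connect("192.168.1.20", 1883)  # hypothetical broker on the LAN
client.loop_start()

annotation = {
    "label": "person",
    "confidence": 0.91,
    "position_m": [2.4, 0.1, 5.7],  # drone-relative x, y, z in meters
}
client.publish("wallhacks/annotations", json.dumps(annotation), qos=0)
client.loop_stop()
```

The Unity layer on the Quest 3 subscribes to the same topic and raises each annotation as an overlay. QoS 0 keeps latency low at the cost of occasional drops, a reasonable trade for data that is superseded on the next frame.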

How We Built It

  • Established a real-time streaming pipeline (RTSP) from the drone’s onboard camera.
  • Ran inference across multiple computer vision models optimized for low-latency edge execution (<67 ms per frame, ~15 fps).
  • Fused segmentation masks with depth data to localize objects relative to the drone’s position (see the fusion sketch after this list).
  • Published spatial annotations to a Unity-based AR visualization layer on the Quest 3 using MQTT messaging.
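The mask/depth fusion in the third step reduces to a median depth inside each segmentation mask plus a pinhole back-projection. A minimal sketch follows; the intrinsics (fx, fy, cx, cy) are placeholder values standing in for the drone camera’s calibration.

```python
# Localize an object: median depth inside its mask, then back-project
# the mask centroid through a pinhole model to drone-relative meters.
import numpy as np

def localize(mask, depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0):
    """mask: HxW bool array; depth: HxW metric depth map in meters."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None  # empty mask: nothing to localize
    z = float(np.median(depth[ys, xs]))  # median is robust to depth noise
    u, v = xs.mean(), ys.mean()          # mask centroid in pixels
    # Pinhole back-projection: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)
```

Using the median rather than the mean keeps a few wildly wrong depth pixels, common with monocular estimators at object boundaries, from dragging the position off target.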

Challenges

  • Unity C# pipeline: every code change triggered a massive (~18 GB) shader recompile.
  • Real-time inference: hitting a consistent <67 ms (~15 fps) frame budget required aggressive optimization (see the timing sketch after this list).
  • AR positional drift: keeping overlays stable meant continuously aligning the drone’s coordinate frame with the headset’s (see the transform sketch after this list).
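A 67 ms frame budget is roughly 15 fps, and the first step toward that kind of optimization is attributing the budget per stage. A throwaway probe along these lines (stage names hypothetical) is enough to show where the milliseconds go:

```python
# Per-stage latency probe for the frame loop; stage names are illustrative.
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def stage(name):
    t0 = time.perf_counter()
    yield
    timings[name] = (time.perf_counter() - t0) * 1000.0  # milliseconds

# Inside the frame loop:
#   with stage("detect"):  run the detector
#   with stage("depth"):   run depth estimation
#   with stage("publish"): push annotations over MQTT
# then: print({k: f"{v:.1f} ms" for k, v in timings.items()})
```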
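The drift problem is ultimately a frame-alignment problem: every drone-relative position has to be re-expressed in the headset’s coordinate frame before rendering. A minimal sketch with a homogeneous transform; the matrix here is a placeholder where a calibrated or anchor-derived transform would go.

```python
# Re-express a drone-frame point in the headset frame via a 4x4
# homogeneous transform. T_headset_drone is a placeholder; in practice
# it comes from calibration or a shared spatial anchor.
import numpy as np

T_headset_drone = np.eye(4)
T_headset_drone[:3, 3] = [0.0, 1.5, 3.0]  # e.g. drone 3 m ahead, 1.5 m up

def to_headset_frame(p_drone):
    """(x, y, z) in the drone's frame -> (x, y, z) in the headset's frame."""
    p = np.append(np.asarray(p_drone, dtype=float), 1.0)  # homogeneous point
    return (T_headset_drone @ p)[:3]

print(to_headset_frame((2.4, 0.1, 5.7)))
```

Any error in this transform shows up directly as overlay drift, which is why SLAM-based correction is first on the roadmap below.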

Accomplishments

  • Achieved real-time spatial awareness—you can almost see through walls.
  • Optimized the entire vision-to-AR pipeline for responsive, low-latency performance.

What We Learned

  • Implementing real-time streaming using RTSP and MQTT.
  • Running complex CV inference pipelines on constrained edge devices.
  • Bridging robotics, networking, and AR into one seamless system.

What’s Next

  • Integrating SLAM for spatial mapping and drift correction.
  • Adding voice-commanded drone control.
  • Expanding facial recognition and semantic scene understanding.

Built With

  • Flyby Robotics drone, Meta Quest 3, Unity (C#), MQTT, RTSP