Inspiration

Current security solutions are either static (fixed cameras with blind spots) or unreliable (basic robots prone to drift and "dumb" motion alerts). We were inspired to build a system that combines the agility of the Unitree Go2 with the reasoning of a Vision-Language Model (VLM). We wanted to move beyond "seeing" and toward "understanding": a silent, persistent guardian that doesn't just record video, but provides a verified, semantic audit trail of a facility's safety.

What it does

VIGILANT is an autonomous security ecosystem that transforms the Unitree Go2 into a Semantic Sentry.

Real-Time Mapping: The system anchors itself to a high-resolution digital twin of the facility, providing centimeter-level localization.

Intelligent Perception: Using COCO Instance Segmentation, VIGILANT identifies specific entities (people, equipment, hazards) with pixel-perfect accuracy.

Semantic Reasoning: An onboard VLM analyzes detections in context (e.g., identifying if a person is wearing a required lanyard or if a fire exit is obstructed).

Command & Control: Every event is logged into a MongoDB backend and visualized on a live, interactive dashboard, allowing remote operators to "see through the robot’s eyes" in a synchronized 3D space.

How we built it

We developed a tiered, Edge-First architecture to ensure reliability and performance:

The Platform: A Unitree Go2 quadruped serving as the mobile sensor base.

The Edge Engine: An external Jetson Compute Pack handles the heavy lifting, running ROS 2 Humble to coordinate 3D spatial anchoring and the AI perception stack.
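
The 3D spatial anchoring coordinated on the edge reduces, at its core, to a rigid transform from the robot's fixed-origin local frame into facility-map coordinates. A minimal 2-D sketch of that math (illustrative only; the real pipeline is 3-D and runs inside ROS 2 nodes, and the function and parameter names here are our own):

```python
import math

def local_to_map(px, py, origin_x, origin_y, origin_yaw):
    """Project a point from the robot's fixed-origin local frame into
    facility-map coordinates via a 2-D rigid transform (rotate, then translate)."""
    c, s = math.cos(origin_yaw), math.sin(origin_yaw)
    return (origin_x + c * px - s * py,
            origin_y + s * px + c * py)

# Example: robot origin anchored at (2, 3) on the schematic, facing +y.
# A point 1 m ahead of the robot then lands at (2, 4) on the map.
```

Because the origin is calibrated once against the schematic rather than estimated continuously, this transform stays fixed, which is what keeps the projection drift-free.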

The AI Stack: We integrated YOLOv11 for real-time instance masking (COCO) and a specialized VLM for high-level security reporting.
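
As a sketch of how the raw masking output might be reduced to security-relevant detections before VLM reasoning (the class subset, threshold, and function name below are illustrative, not our exact configuration; class IDs follow the 80-class COCO convention):

```python
# Illustrative post-processing of instance-segmentation output.
# Each detection is (coco_class_id, confidence, mask_area_px).
SECURITY_CLASSES = {0: "person", 24: "backpack", 28: "suitcase"}  # example subset

def filter_detections(detections, min_conf=0.5):
    """Keep only confident detections of security-relevant classes."""
    kept = []
    for class_id, conf, area in detections:
        if class_id in SECURITY_CLASSES and conf >= min_conf:
            kept.append({"label": SECURITY_CLASSES[class_id],
                         "confidence": conf,
                         "mask_area_px": area})
    return kept
```

Filtering at the edge keeps the VLM prompt small: only entities that matter for security ever reach the reasoning stage.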

The Infrastructure: A FastAPI backend manages the data flow, while MongoDB provides a persistent, searchable history of all security incidents.
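
To make the searchable-history idea concrete, an incident record headed for MongoDB might be shaped like this (every field name here is illustrative; the real schema may differ):

```python
from datetime import datetime, timezone

def build_incident(label, confidence, map_xy, frame_id):
    """Shape a security-incident document for insertion into MongoDB.
    Field names are illustrative, not the production schema."""
    return {
        "type": "security_incident",
        "label": label,                                 # e.g. "person"
        "confidence": round(confidence, 3),
        "position": {"x": map_xy[0], "y": map_xy[1]},   # map-frame metres
        "frame_id": frame_id,                           # links to stored camera frame
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

A FastAPI route would then persist the document with a single `collection.insert_one(...)` call, and the dashboard queries the same collection to render Evidence Pins.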

The UI: A React-based Digital Twin console using Leaflet.js, Three.js, and WebSockets for low-latency telemetry and interactive "Evidence Pins."
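
Streaming instance masks over the WebSocket link benefits from compact serialization; run-length encoding is one common approach, sketched here in simplified form (our actual wire format differs in detail):

```python
def rle_encode(bits):
    """Run-length encode a flattened binary mask (list of 0/1).
    Output alternates run lengths, starting with the zero run (COCO-style)."""
    runs, current, count = [], 0, 0
    for b in bits:
        if b == current:
            count += 1
        else:
            runs.append(count)
            current, count = b, 1
    runs.append(count)
    return runs

def rle_decode(runs):
    """Inverse of rle_encode: expand run lengths back into a bit list."""
    bits, value = [], 0
    for n in runs:
        bits.extend([value] * n)
        value ^= 1
    return bits
```

Masks are mostly contiguous regions, so run lengths are dramatically smaller than raw per-pixel payloads, which is what keeps the browser-side twin responsive.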

Challenges we ran into

The primary challenge was Coordinate Fusion. Synchronizing 3D LiDAR data with 2D camera frames to accurately project an AI "mask" onto a map requires precise temporal alignment. Additionally, ensuring a stable "Zero-Drift" environment without relying on traditional, often-erratic SLAM meant we had to pivot to a Map-Relative Anchoring strategy. This required a strict physical-to-digital calibration ritual to ensure the robot's (0,0) always matched the facility's schematic.

Accomplishments that we're proud of

We successfully moved from raw hardware to a fully integrated Digital Twin. We are particularly proud of the "Silent Sentry" logic: the robot doesn't need to bark or flash lights to be effective. Instead, it demonstrates intelligence through behavioral changes, like automatically transitioning into an "Observation Stance" when it detects a potential breach and instantly populating the remote dashboard with rich, semantic evidence.

What we learned

Building VIGILANT taught us that in robotics, reliability is a feature. We learned that implementation details like networking protocols and fixed-origin math are just as important as the "fancy" AI models. We also gained deep experience in Data Serialization, learning how to efficiently stream high-bandwidth instance masks and point clouds from the edge to a web-based command center without lag.

What's next for VIGILANT

Our next steps involve Multi-Agent Coordination. We envision a fleet of VIGILANT sentries sharing a single MongoDB backbone to provide total facility coverage. We also plan to implement Predictive Pathing, where the VLM predicts where an intruder is likely to go based on the floor plan and positions the Go2 to intercept them silently before they reach sensitive assets.

Built With

fastapi, leaflet.js, mongodb, react, ros2, three.js, unitree-go2, websockets, yolo
