Inspiration

Our initial spark came from the world of content creation: removing the need for a camera operator for vloggers and streamers. However, when looking at the Cipher Tech Digital Forensics challenge, we realized this technology had a much more critical application: Chain of Custody Assurance.

In forensic investigations, securing digital evidence (like servers or laptops) often requires a two-person team: one to handle the evidence and one to record the process to ensure legal admissibility. We wanted to build a tool that empowers a solo investigator to enter a scene and secure evidence while an autonomous "eye" documents their every move, hands-free.

What it does

Gazer is an autonomous drone platform that acts as a self-directing visual witness.

  • Autonomous Tracking: It uses computer vision to identify the user and keep them centered in the frame without any manual piloting.
  • Forensic Documentation: It records stable, third-person footage of the user's actions, creating an objective record of evidence seizure.
  • Smart Stabilization: Unlike standard "follow-me" modes that jitter constantly, Gazer utilizes a dynamic "Dead Zone" to ignore micro-movements, ensuring the footage is smooth and professional.
  • Web Control Interface: The entire system is launched and monitored via a clean Next.js dashboard.

How we built it

The core of Gazer is a high-concurrency Python backend interfacing with a Next.js frontend.

  1. Vision Stack: We utilized YOLO (You Only Look Once) for object detection. We initially tested MediaPipe, but found YOLO offered superior tracking reliability when the subject was further away from the drone.
  2. Drone Control: We utilized the DJI Tello SDK to send UDP commands to the hardware.
  3. Flight Logic (PID): To ensure smooth movement, we implemented a PID (Proportional-Integral-Derivative) controller. It computes the error $$e(t)$$ between the user's face coordinates $$(x, y)$$ and the center of the frame $$(x_c, y_c)$$, then adjusts the drone's yaw and throttle smoothly. The control law is: $$ u(t) = K_p e(t) + K_i\int_0^t e(\tau)\, d\tau + K_d\dfrac{de(t)}{dt} $$
  4. Frontend: A Next.js web application serves as the command center, displaying the live video feed and providing "Launch" and "Land" controls.
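The per-frame control loop described in step 3 can be sketched as below. This is a minimal, self-contained illustration, not our exact implementation: the class name, gain values, frame width, and face coordinate are all illustrative, and in the real system the error would come from the YOLO detection and the output would be clamped and sent over the Tello SDK.

```python
# Minimal PID sketch (hypothetical gains and values; one instance per axis).
class PID:
    """Classic PID controller: u = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        # Accumulate the integral term over time.
        self.integral += error * dt
        # Derivative term is zero on the first sample (no history yet).
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Per frame: the error is the face's horizontal offset from the frame centre.
yaw_pid = PID(kp=0.4, ki=0.0, kd=0.1)

frame_center_x = 480          # half of a 960-px-wide video frame (assumed)
face_x = 600                  # x coordinate of the detected face (from YOLO)
error = face_x - frame_center_x
yaw_speed = yaw_pid.update(error, dt=1 / 30)  # assuming a ~30 fps feed
# yaw_speed would then be clamped to the drone's valid range and sent
# as an RC command via the Tello SDK.
```

The same controller class is reused for the throttle axis with the vertical error $$y - y_c$$.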

Challenges we ran into

This project was a masterclass in concurrency and system architecture.

  • Thread Safety & Race Conditions: To keep the video feed real-time while simultaneously processing computer vision logic and sending network commands to the drone, we had to spin up 9 separate threads. This led to severe race conditions where the drone would receive conflicting "hover" and "move" commands simultaneously. We had to implement strict locking mechanisms and state management to resolve this.
  • Circular Imports: As our codebase grew, splitting the vision logic, drone control, and web server into different modules resulted in circular import errors that broke the Python interpreter. We had to restructure our entire dependency tree to fix this.
  • The "Jitter" Problem: Initially, the drone would react to every single pixel of movement, making the footage nauseating. We solved this by programming a "Dead Zone," a tolerance radius in the center of the frame: $$ \text{if } |Face_{pos} - Center_{frame}| < DeadZone, \text{ then } Speed = 0 $$

Accomplishments that we're proud of

  • The "Dead Zone" Logic: Successfully tuning the visual threshold so the drone feels "cinematic" rather than robotic. It ignores small movements and only adjusts when the subject actually moves away.
  • System Stability: Managing 9 concurrent threads without crashing the application is a huge win for us.
  • The Pivot: We are proud that we took a simple "fun" idea and pivoted it into a legitimate Forensics Tech tool that fits the prompt perfectly.
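The dead-zone check described above reduces to a few lines. This is an illustrative sketch, not our tuned implementation: the threshold and gain values are placeholder assumptions.

```python
# Hypothetical dead-zone values; the real threshold was tuned by flight testing.
DEAD_ZONE_PX = 50  # tolerance radius around the frame centre, in pixels


def control_speed(face_pos, frame_center, gain=0.4):
    """Return 0 inside the dead zone, a proportional speed outside it."""
    offset = face_pos - frame_center
    if abs(offset) < DEAD_ZONE_PX:
        return 0.0            # ignore micro-movements: keep footage stable
    return gain * offset      # outside the zone, steer back toward centre
```

Inside the zone the drone holds position entirely, which is what makes the footage feel cinematic rather than twitchy.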

What we learned

  • Control Theory: We gained a deep appreciation for PID controllers. Tuning the $$K_p$$, $$K_i$$, and $$K_d$$ gains is an art form: a slightly wrong value means the drone oscillates out of control.
  • Architecture Matters: We learned the hard way that you cannot just "throw more threads" at a problem without a solid plan for shared state and resource locking.
  • Hardware Limitations: Working with real hardware (batteries, flight time, physical drift) is infinitely harder than software simulation.
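The locking lesson above boils down to guarding shared state with a single lock, so the vision thread and the control thread can never issue conflicting commands. A minimal sketch of that pattern (class and method names are illustrative, not our actual module):

```python
import threading


class DroneState:
    """Thread-safe holder for the drone's current command (illustrative)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._command = "hover"

    def set_command(self, command):
        with self._lock:          # only one thread mutates state at a time
            self._command = command

    def get_command(self):
        with self._lock:          # readers see a consistent value
            return self._command


state = DroneState()
# Vision thread:  state.set_command("move_left") when the subject drifts.
# Control thread: reads state.get_command() and sends exactly one
# command to the drone per tick, so "hover" and "move" never conflict.
```

With every command funneled through one guarded state object, the race between conflicting "hover" and "move" commands disappears.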

What's next for Gazer

  • Object Search: Adapting the YOLO model to detect specific objects (like "Laptop" or "Hard Drive") to autonomously scan a room for evidence.
  • Gesture Controls: Adding hand signals and phrases to command the drone (e.g., holding up a palm to "Pause" tracking).
  • Swarm Capabilities: Coordinating multiple Gazers to document a crime scene from multiple angles simultaneously.
