🧠 About the Project
🌟 Inspiration
We’ve always been fascinated by the idea of giving machines perception — not just the ability to see, but to understand. Inspired by EVE from WALL·E, we wanted to create a system that could do more than just detect motion. We envisioned a security camera that could describe what it sees, react intelligently, and notify humans only when it truly matters.
⚙️ How We Built It
Our project runs on a Jetson Orin Nano — an ARM-powered edge device capable of real-time inference.
Hardware & Control
- An Infrared camera continuously scans the room in idle mode.
- SparkFun servos (pan & tilt) give it freedom to track moving objects.
- Once a person enters the frame, EVE locks on and follows them until they leave the room.
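The idle-scan / lock-on behavior boils down to a two-state machine. Here is a minimal sketch of that logic; the function names, gain, and sweep command are our illustration, not the actual control code:

```python
# Minimal sketch of EVE's idle-scan / lock-on behavior.
# Names, the 0.05 gain, and the command tuples are illustrative only.

IDLE, TRACKING = "idle", "tracking"

def next_state(state, person_detected):
    """Lock on when a person enters the frame; resume scanning when they leave."""
    if state == IDLE and person_detected:
        return TRACKING
    if state == TRACKING and not person_detected:
        return IDLE
    return state

def servo_action(state, sweep_angle, person_cx=None, frame_width=640):
    """Return a pan command: sweep the room in idle, center the subject while tracking."""
    if state == IDLE:
        # Slow sweep back and forth across the room.
        return ("sweep", sweep_angle)
    # Proportional nudge toward the detected person's horizontal center.
    offset = person_cx - frame_width / 2
    return ("pan", offset * 0.05)  # gain chosen arbitrarily for illustration
```

In the real system the returned command would drive the PCA9685 servo channels; keeping the decision logic as pure functions like this made it easy to test without hardware attached.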
AI & Event Processing
- Every 15 seconds, EVE captures an image of the scene.
- Each image is sent to Google Gemini, which generates:
  - a one-sentence description,
  - the number of people in frame,
  - and a severity level: info, warning, or critical.
- These results are stored in a Supabase table along with the corresponding image in a Supabase storage bucket.
- Notifications for events with warning and critical levels are sent through Discord using a webhook.
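The classify-store-alert flow above can be sketched as follows. The event schema and message format are illustrative guesses, not our exact Supabase columns or webhook body:

```python
import json

NOTIFY_LEVELS = {"warning", "critical"}  # info events are logged but never pushed

def build_event(description, people_count, severity, image_url):
    """Shape an event row for the Supabase table (illustrative schema)."""
    return {
        "description": description,
        "people": people_count,
        "severity": severity,
        "image_url": image_url,
    }

def discord_payload(event):
    """Format a Discord webhook message, or None for info-level events."""
    if event["severity"] not in NOTIFY_LEVELS:
        return None  # info-level events stay in the dashboard only
    return json.dumps({
        "content": f"[{event['severity'].upper()}] {event['description']} "
                   f"({event['people']} person(s) in frame)"
    })
```

Filtering on severity before posting is what keeps the Discord channel quiet until something actually matters.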
Frontend & Integrations
- The web dashboard (built with React + Tailwind + FastAPI) displays the full event log in real time, along with the live video stream from OpenCV.
- Discord Webhooks send instant alerts for warning and critical events.
- A lightweight React Native iOS app mirrors the event log so users can review incidents anywhere.
🧩 Challenges We Faced
Building an autonomous vision system in just 36 hours wasn’t exactly peaceful.
- Our biggest challenge was ensuring that the camera could track people smoothly in real time. During operation, our workflow needed to continuously extract object coordinates, calculate movement offsets, and adjust the servos to keep the subject centered, all while processing around 15 frames per second.
What seemed like a simple feature turned out to be surprisingly complex. We had to account for factors we hadn’t initially considered, such as servo torque, camera weight, latency, the Jetson’s limited computing power, and even small obstacles in the environment. Achieving stable tracking required countless iterations: we tested different detection models, rebalanced our camera mount several times, and fine-tuned servo response. After hours of experimentation (and disassembling our rig more times than we’d like to admit), we finally reached the smoothest real-time tracking performance possible within 36 hours.
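The center-the-subject loop described above amounts to a proportional controller with a dead zone. A minimal sketch, assuming a pixel-space bounding box and degree-valued servo angles (the gain, dead zone, and limits here are placeholders, not our tuned values):

```python
def track_step(pan, tilt, bbox, frame_w=640, frame_h=480,
               gain=0.03, dead_zone=20, limits=(0.0, 180.0)):
    """One control step: nudge pan/tilt angles toward centering the bounding box.

    bbox is (x, y, w, h) in pixels. Returns the updated (pan, tilt) angles.
    The dead zone keeps the servos from jittering over tiny offsets.
    """
    x, y, w, h = bbox
    dx = (x + w / 2) - frame_w / 2   # horizontal offset from frame center
    dy = (y + h / 2) - frame_h / 2   # vertical offset from frame center

    if abs(dx) > dead_zone:
        pan -= gain * dx   # sign depends on how the servo is mounted
    if abs(dy) > dead_zone:
        tilt += gain * dy

    lo, hi = limits
    return (min(max(pan, lo), hi), min(max(tilt, lo), hi))
```

Clamping to the servo's physical limits and ignoring sub-dead-zone offsets were exactly the kinds of details (torque, latency, mount balance) that made "simple" tracking take so many iterations.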
Some of our other hurdles were:
- Maintaining consistent frame rates on the Jetson Orin Nano while running detection and servo control simultaneously.
- Managing asynchronous uploads to Supabase while waiting for Gemini responses.
- Making the frontend and mobile views update in real time without breaking API rate limits.
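The async-upload hurdle came down to overlapping the slow Gemini round-trip with the Supabase image upload instead of running them back to back. A sketch of the pattern with both network calls stubbed out (the real code used the Gemini and Supabase SDKs):

```python
import asyncio

async def analyze_with_gemini(image):
    """Stub for the Gemini round-trip (a real network call in our system)."""
    await asyncio.sleep(0.05)
    return {"description": "A person walks past the desk.", "severity": "info"}

async def upload_to_supabase(image):
    """Stub for the storage-bucket upload."""
    await asyncio.sleep(0.05)
    return "https://example.supabase.co/storage/frame.jpg"  # illustrative URL

async def process_frame(image):
    # Run the Gemini request and the upload concurrently, then join results.
    analysis, url = await asyncio.gather(
        analyze_with_gemini(image), upload_to_supabase(image)
    )
    return {**analysis, "image_url": url}

result = asyncio.run(process_frame(b"raw-jpeg-bytes"))
```

With `asyncio.gather`, the per-event latency is roughly the slower of the two calls rather than their sum, which kept the 15-second capture cadence comfortable.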
🚀 What We’re Proud Of
- Achieved real-time tracking and reasoning on low-power hardware.
- Combined edge AI (YOLOv11n with ByteTrack) and cloud AI (Gemini) into a seamless hybrid workflow.
- Built a full-stack pipeline: from camera → detection → Gemini → Supabase → web/mobile dashboards.
- Delivered actionable alerts, turning raw footage into human-readable summaries.
💡 What We Learned
- How to balance on-device inference and cloud reasoning efficiently.
- The power of multimodal models like Gemini to give context to visual data.
- How to design a robust event pipeline connecting embedded hardware, cloud AI, and modern web apps.
- The importance of simplicity: one concise AI-generated sentence often explains more than hundreds of frames.
- How to design embedded systems that enable software to interact seamlessly with the physical world through hardware components.
🔮 What’s Next
- Generate short video summaries of events, narrated using ElevenLabs.
- Expand to multi-camera environments with centralized Supabase management.
- Add anomaly detection for unsupervised learning of unusual movement patterns.
- Integrate with Cloudflare Workers AI to serve lightweight inference at the edge.
⚡ Tech Stack
Hardware: Jetson Orin Nano, PCA9685 Servo Controller, USB OV2710 Arducam
AI: YOLOv11n, ByteTrack, Google Gemini, OpenCV, TensorRT
Backend: Python, FastAPI, Supabase
Frontend: JavaScript, React, TailwindCSS
Mobile: React Native (iOS)
Integrations: Discord Webhooks


