Inspiration
Security cameras today mostly just watch. We wanted something smarter: a real-time AI that actually understands what's happening and can respond, like a digital guard dog. We named it Cerebus, after Cerberus, the mythological guardian, but it's all modern tech: vision, voice, and spatial awareness.
What It Does
Cerebus is a real-time safety assistant that uses a camera, mic, and GPS to monitor and interpret your surroundings.
It can:
- Detect people, animals, and vehicles using YOLOv8
- Wake up when you say “Hey Cerebus” and understand voice commands
- Track your location and map detections
- Send live video/audio + alerts to your phone or computer
- Let you search past events (e.g. “dog near gate yesterday”)
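The detect-then-alert step can be sketched in a few lines of Python. This is a minimal illustration of the flow, not our production code; the `Detection` shape, class names, and threshold below are assumptions made for the example.

```python
# Minimal sketch of the detect -> alert step. The alert classes and
# confidence threshold here are illustrative assumptions.

from dataclasses import dataclass

ALERT_CLASSES = {"person", "dog", "car"}  # classes treated as alert-worthy
CONFIDENCE_THRESHOLD = 0.5

@dataclass
class Detection:
    label: str         # YOLO class name, e.g. "person"
    confidence: float  # model confidence in [0, 1]

def alerts_from(detections):
    """Keep only confident detections of classes we care about."""
    return [
        d for d in detections
        if d.label in ALERT_CLASSES and d.confidence >= CONFIDENCE_THRESHOLD
    ]

frame = [
    Detection("person", 0.91),
    Detection("bird", 0.88),   # not an alert class
    Detection("dog", 0.42),    # below threshold
]
print([d.label for d in alerts_from(frame)])  # ['person']
```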
How We Built It
Hardware
- ESP32-CAM + XIAO ESP32S3 for image + sensor data
- RGB LEDs + ultrasonic sensors for feedback
- Custom firmware in C++ for image streaming
AI Pipeline
- YOLOv8 for object detection
- Wake-word model ("Hey Cerebus") in ONNX
- Audio transcription with Groq API
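The wake-word model scores each audio frame; turning those scores into a reliable trigger is where the false-positive fight happens. One simple approach is debouncing: fire only after several consecutive high-scoring frames, as in this sketch (threshold and frame count are illustrative assumptions, not our tuned values).

```python
# Sketch of wake-word debouncing: trigger only after N consecutive
# frames score above a threshold. Values are illustrative assumptions.

THRESHOLD = 0.8        # per-frame wake-word score cutoff
CONSECUTIVE_NEEDED = 3

def wake_triggered(scores, threshold=THRESHOLD, needed=CONSECUTIVE_NEEDED):
    """Return True if any run of `needed` consecutive scores clears `threshold`."""
    run = 0
    for s in scores:
        run = run + 1 if s >= threshold else 0
        if run >= needed:
            return True
    return False

print(wake_triggered([0.2, 0.9, 0.3, 0.95]))   # False: no sustained run
print(wake_triggered([0.85, 0.9, 0.92, 0.4]))  # True: three frames in a row
```

Requiring a sustained run trades a little latency for far fewer spurious wake-ups in noisy rooms.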
Backend
- Node.js + Express + WebSockets for live data
- Python subprocesses for AI models
- Google Maps API for location
- Convex DB for real-time storage & querying
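The Node ↔ Python boundary is easiest to reason about as newline-delimited JSON over the subprocess's stdin/stdout. Here is a rough sketch of the Python side; the message fields (`cmd`, `frame_id`) are assumptions for illustration, not our actual protocol.

```python
# Sketch of the Python side of a Node <-> Python bridge: read one JSON
# message per line on stdin, write one JSON reply per line on stdout.
# The message fields ("cmd", "frame_id") are illustrative assumptions.

import json
import sys

def handle(message):
    """Dispatch one request from the Node process."""
    if message.get("cmd") == "detect":
        # In the real pipeline this would run the detection model on a frame.
        return {"frame_id": message.get("frame_id"), "detections": []}
    return {"error": "unknown command"}

def run(stdin=sys.stdin, stdout=sys.stdout):
    """Main loop: one JSON message per line in, one JSON reply per line out."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        reply = handle(json.loads(line))
        stdout.write(json.dumps(reply) + "\n")
        stdout.flush()  # Node reads line-by-line, so flush each reply

print(handle({"cmd": "detect", "frame_id": 7}))
```

On the Node side, `child_process.spawn` plus a line splitter on the child's stdout pairs naturally with this loop.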
Frontend
- React app with live video, GPS map, audio playback, and detection logs
- Simple UI to review detections and issue commands
Challenges
- ESP32 reliability: Wi-Fi dropouts and lighting issues during streaming
- YOLO tuning: Had to balance speed vs. accuracy
- Node ↔ Python: Managed async subprocess communication
- Schema design: Made it flexible for audio, video, GPS, detection logs
- Wake-word: Avoiding false positives in noisy environments
- API key management: Too many moving parts
Wins
- Full working system: camera, audio, GPS — all synced
- Smooth live streaming with detection overlays
- Intuitive UI: no technical background needed to use it
- Searchable logs of everything the system hears/sees
- Modular design: easy to plug in more models or sensors
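Searching past events boils down to filtering stored detection records by attributes like class, location, and time. A toy in-memory version is below; the field names are assumptions for the example, and the real system queries Convex instead.

```python
# Toy sketch of searching past events, e.g. "dog near gate yesterday".
# Field names are illustrative; the real system queries Convex.

from datetime import date

EVENTS = [
    {"label": "dog", "zone": "gate", "day": date(2024, 5, 1)},
    {"label": "car", "zone": "driveway", "day": date(2024, 5, 1)},
    {"label": "dog", "zone": "yard", "day": date(2024, 5, 2)},
]

def search(events, label=None, zone=None, day=None):
    """Return events matching every filter that was given."""
    return [
        e for e in events
        if (label is None or e["label"] == label)
        and (zone is None or e["zone"] == zone)
        and (day is None or e["day"] == day)
    ]

# "dog near gate" on May 1st matches only the first event:
print(search(EVENTS, label="dog", zone="gate", day=date(2024, 5, 1)))
```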
What We Learned
- YOLO is powerful, but needs tuning per use case
- Wake-word detection is way harder than expected
- WebSockets are perfect for responsive IoT UX
- Multi-modal AI (vision + audio + location) gives way better context
- Modular, loosely-coupled systems make life way easier
What’s Next
- Add facial recognition (known vs unknown)
- Detect unusual behavior (loitering, falls, etc.)
- Integrate with smart home systems
- Build a mobile app
- Run models directly on-device (edge AI)
- Expand to mesh networks for full property coverage