Inspiration

Security cameras today mostly just watch. We wanted something smarter: a real-time AI that actually understands what's happening and can respond, like a digital guard dog. We named it Cerebus, after Cerberus, the mythological guardian of the underworld, but it's all modern tech: vision, voice, and spatial awareness.


What It Does

Cerebus is a real-time safety assistant that uses a camera, mic, and GPS to monitor and interpret your surroundings.

It can:

  • Detect people, animals, and vehicles using YOLOv8
  • Wake up when you say “Hey Cerebus” and understand voice commands
  • Track your location and map detections
  • Send live video/audio + alerts to your phone or computer
  • Let you search past events (e.g. “dog near gate yesterday”)
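The event search in the last bullet can be sketched as a simple filter over logged detections. The `DetectionEvent` fields and `search_events` helper here are illustrative, not the actual Convex query:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical event record; the real Convex schema likely differs.
@dataclass
class DetectionEvent:
    label: str         # e.g. "dog", "person"
    zone: str          # e.g. "gate", "driveway"
    timestamp: datetime

def search_events(events, label, zone=None, since=None):
    """Filter logged detections by label, zone, and time window."""
    return [
        e for e in events
        if e.label == label
        and (zone is None or e.zone == zone)
        and (since is None or e.timestamp >= since)
    ]

# "dog near gate yesterday"
now = datetime(2024, 6, 2, 12, 0)
log = [
    DetectionEvent("dog", "gate", now - timedelta(hours=20)),
    DetectionEvent("car", "driveway", now - timedelta(hours=3)),
]
hits = search_events(log, "dog", zone="gate", since=now - timedelta(days=1))
```

In practice the natural-language query would first be parsed into these structured filters before hitting the database.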

How We Built It

Hardware

  • ESP32-CAM + XIAO ESP32S3 for image + sensor data
  • RGB LEDs + ultrasonic sensors for feedback
  • Custom firmware in C++ for image streaming

AI Pipeline

  • YOLOv8 for object detection
  • Wake-word model ("Hey Cerebus") in ONNX
  • Audio transcription with Groq API
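One trick that helps with wake-word false positives is gating on several consecutive high-confidence frames rather than a single spike. A minimal sketch, with illustrative thresholds rather than the values Cerebus actually uses:

```python
class WakeWordGate:
    """Require N consecutive high-confidence frames before firing,
    which suppresses false positives from single noisy audio frames."""

    def __init__(self, threshold=0.8, consecutive=3):
        self.threshold = threshold
        self.needed = consecutive
        self.streak = 0

    def update(self, score: float) -> bool:
        # Each call gets one per-frame confidence from the ONNX model.
        if score >= self.threshold:
            self.streak += 1
        else:
            self.streak = 0  # any low-confidence frame resets the streak
        return self.streak >= self.needed

gate = WakeWordGate()
scores = [0.2, 0.9, 0.95, 0.4, 0.85, 0.9, 0.92]
fired = [gate.update(s) for s in scores]
```

Here the gate fires only on the last frame, after three high scores in a row; the earlier 0.9/0.95 pair is rejected because the streak is broken by the 0.4 frame.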

Backend

  • Node.js + Express + WebSockets for live data
  • Python subprocesses for AI models
  • Google Maps API for location
  • Convex DB for real-time storage & querying
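The Node ↔ Python bridge can be kept simple with a JSON-lines protocol: Node writes one JSON request per line to the worker's stdin and reads one JSON reply per line from stdout. A sketch of the Python side, with an illustrative "detect" command (the real command set and fields are project-specific):

```python
import json
import sys

def handle_request(line: str) -> str:
    """Parse one JSON-line request from Node, return a JSON-line reply."""
    req = json.loads(line)
    if req.get("cmd") == "detect":
        # A real worker would run YOLO on the referenced frame here;
        # we return a stub result to show the envelope shape.
        resp = {"id": req["id"], "ok": True, "detections": []}
    else:
        resp = {"id": req.get("id"), "ok": False, "error": "unknown cmd"}
    return json.dumps(resp)

def main():
    # When spawned as a child process by Node, this loop drives the worker:
    # one request per line keeps framing trivial on both sides.
    for line in sys.stdin:
        print(handle_request(line), flush=True)

# Example round trip (the Node side would write this line to stdin):
reply = handle_request('{"cmd": "detect", "id": 7}')
```

Flushing stdout after every reply matters: without it, Node can sit waiting on responses stuck in the Python process's output buffer.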

Frontend

  • React app with live video, GPS map, audio playback, and detection logs
  • Simple UI to review detections and issue commands

Challenges

  • ESP32 reliability: Wi-Fi dropouts and lighting issues during streaming
  • YOLO tuning: balancing inference speed against detection accuracy
  • Node ↔ Python: managing async subprocess communication without blocking
  • Schema design: keeping it flexible enough for audio, video, GPS, and detection logs
  • Wake-word detection: avoiding false positives in noisy environments
  • API key management: too many moving parts across services
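One way to keep a multimodal schema flexible is a small shared envelope plus a modality-specific payload, so new event types don't force a migration. A sketch in Python; the field names are ours, not the actual Convex schema:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Event:
    """Common envelope shared by audio, video, GPS, and detection events."""
    kind: str                 # "audio" | "video" | "gps" | "detection"
    timestamp: float          # unix seconds
    device_id: str
    payload: dict[str, Any] = field(default_factory=dict)  # modality-specific

gps = Event("gps", 1717320000.0, "esp32-cam-1",
            {"lat": 37.42, "lon": -122.08})
det = Event("detection", 1717320001.5, "esp32-cam-1",
            {"label": "dog", "confidence": 0.91, "bbox": [10, 20, 110, 180]})
```

Queries filter on the shared fields (`kind`, `timestamp`, `device_id`) while each modality keeps whatever extra structure it needs inside `payload`.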

Wins

  • Full working system: camera, audio, GPS — all synced
  • Smooth live streaming with detection overlays
  • Intuitive UI — no need to be techy to use it
  • Searchable logs of everything the system hears/sees
  • Modular design: easy to plug in more models or sensors
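The modularity win can be sketched as a tiny detector registry: new models plug in by name and the pipeline fans each frame out to whatever is registered. Names and the stub detector are illustrative:

```python
from typing import Callable

DETECTORS: dict[str, Callable] = {}

def register(name: str):
    """Decorator: plug a new model into the pipeline by name."""
    def wrap(fn: Callable) -> Callable:
        DETECTORS[name] = fn
        return fn
    return wrap

@register("yolo")
def yolo_detect(frame):
    # Placeholder for a real YOLOv8 call.
    return [{"label": "person", "confidence": 0.88}]

def run_pipeline(frame):
    """Fan a frame out to every registered detector."""
    return {name: fn(frame) for name, fn in DETECTORS.items()}

results = run_pipeline(frame=None)
```

Adding a face recognizer or fall detector later is then one decorated function, with no changes to the pipeline loop.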

What We Learned

  • YOLO is powerful, but needs tuning per use case
  • Wake-word detection is much harder than it looks
  • WebSockets are a great fit for responsive IoT UX
  • Multi-modal AI (vision + audio + location) gives far richer context
  • Modular, loosely coupled systems make life much easier

What’s Next

  • Add facial recognition (known vs unknown)
  • Detect unusual behavior (loitering, falls, etc.)
  • Integrate with smart home systems
  • Build a mobile app
  • Run models directly on-device (edge AI)
  • Expand to mesh networks for full property coverage
