Inspiration

Security cameras today mostly just watch. We wanted something smarter: a real-time AI that actually understands what's happening and can respond, like a digital guard dog. We named it Cerebus, after Cerberus, the mythological guardian of the underworld, but it's all modern tech: vision, voice, and spatial awareness.


What It Does

Cerebus is a real-time safety assistant that uses a camera, mic, and GPS to monitor and interpret your surroundings.

It can:

  • Detect people, animals, and vehicles using YOLOv8
  • Wake up when you say “Hey Cerebus” and understand voice commands
  • Track your location and map detections
  • Send live video/audio + alerts to your phone or computer
  • Let you search past events (e.g. “dog near gate yesterday”)
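The event search in the last bullet can be sketched as a simple filter over logged detections. The `DetectionEvent` fields and `search_events` helper here are illustrative, not the actual Convex query:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical event record; the real Convex schema likely differs.
@dataclass
class DetectionEvent:
    label: str         # e.g. "dog", "person"
    zone: str          # e.g. "gate", "driveway"
    timestamp: datetime

def search_events(events, label, zone=None, since=None):
    """Filter logged detections by label, zone, and time window."""
    return [
        e for e in events
        if e.label == label
        and (zone is None or e.zone == zone)
        and (since is None or e.timestamp >= since)
    ]

# "dog near gate yesterday"
now = datetime(2024, 6, 2, 12, 0)
log = [
    DetectionEvent("dog", "gate", now - timedelta(hours=20)),
    DetectionEvent("car", "driveway", now - timedelta(hours=3)),
]
hits = search_events(log, "dog", zone="gate", since=now - timedelta(days=1))
```

In practice the natural-language query would first be parsed into these structured filters before hitting the database.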

How We Built It

Hardware

  • ESP32-CAM + XIAO ESP32S3 for image + sensor data
  • RGB LEDs + ultrasonic sensors for feedback
  • Custom firmware in C++ for image streaming

AI Pipeline

  • YOLOv8 for object detection
  • Wake-word model ("Hey Cerebus") in ONNX
  • Audio transcription with Groq API
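One trick that helps with wake-word false positives is gating on several consecutive high-confidence frames rather than a single spike. A minimal sketch, with illustrative thresholds rather than the values Cerebus actually uses:

```python
class WakeWordGate:
    """Require N consecutive high-confidence frames before firing,
    which suppresses false positives from single noisy audio frames."""

    def __init__(self, threshold=0.8, consecutive=3):
        self.threshold = threshold
        self.needed = consecutive
        self.streak = 0

    def update(self, score: float) -> bool:
        # Each call gets one per-frame confidence from the ONNX model.
        if score >= self.threshold:
            self.streak += 1
        else:
            self.streak = 0  # any low-confidence frame resets the streak
        return self.streak >= self.needed

gate = WakeWordGate()
scores = [0.2, 0.9, 0.95, 0.4, 0.85, 0.9, 0.92]
fired = [gate.update(s) for s in scores]
```

Here the gate fires only on the last frame, after three high scores in a row; the earlier 0.9/0.95 pair is rejected because the streak is broken by the 0.4 frame.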

Backend

  • Node.js + Express + WebSockets for live data
  • Python subprocesses for AI models
  • Google Maps API for location
  • Convex DB for real-time storage & querying
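The Node ↔ Python bridge can be kept simple with a JSON-lines protocol: Node writes one JSON request per line to the worker's stdin and reads one JSON reply per line from stdout. A sketch of the Python side, with an illustrative "detect" command (the real command set and fields are project-specific):

```python
import json
import sys

def handle_request(line: str) -> str:
    """Parse one JSON-line request from Node, return a JSON-line reply."""
    req = json.loads(line)
    if req.get("cmd") == "detect":
        # A real worker would run YOLO on the referenced frame here;
        # we return a stub result to show the envelope shape.
        resp = {"id": req["id"], "ok": True, "detections": []}
    else:
        resp = {"id": req.get("id"), "ok": False, "error": "unknown cmd"}
    return json.dumps(resp)

def main():
    # When spawned as a child process by Node, this loop drives the worker:
    # one request per line keeps framing trivial on both sides.
    for line in sys.stdin:
        print(handle_request(line), flush=True)

# Example round trip (the Node side would write this line to stdin):
reply = handle_request('{"cmd": "detect", "id": 7}')
```

Flushing stdout after every reply matters: without it, Node can sit waiting on responses stuck in the Python process's output buffer.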

Frontend

  • React app with live video, GPS map, audio playback, and detection logs
  • Simple UI to review detections and issue commands

Challenges

  • ESP32 reliability: Wi-Fi dropouts and lighting issues during streaming
  • YOLO tuning: balancing inference speed against detection accuracy
  • Node ↔ Python: managing async subprocess communication without blocking
  • Schema design: keeping it flexible enough for audio, video, GPS, and detection logs
  • Wake-word detection: avoiding false positives in noisy environments
  • API key management: too many moving parts across services
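One way to keep a multimodal schema flexible is a small shared envelope plus a modality-specific payload, so new event types don't force a migration. A sketch in Python; the field names are ours, not the actual Convex schema:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Event:
    """Common envelope shared by audio, video, GPS, and detection events."""
    kind: str                 # "audio" | "video" | "gps" | "detection"
    timestamp: float          # unix seconds
    device_id: str
    payload: dict[str, Any] = field(default_factory=dict)  # modality-specific

gps = Event("gps", 1717320000.0, "esp32-cam-1",
            {"lat": 37.42, "lon": -122.08})
det = Event("detection", 1717320001.5, "esp32-cam-1",
            {"label": "dog", "confidence": 0.91, "bbox": [10, 20, 110, 180]})
```

Queries filter on the shared fields (`kind`, `timestamp`, `device_id`) while each modality keeps whatever extra structure it needs inside `payload`.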

Wins

  • Full working system: camera, audio, GPS — all synced
  • Smooth live streaming with detection overlays
  • Intuitive UI — no need to be techy to use it
  • Searchable logs of everything the system hears/sees
  • Modular design: easy to plug in more models or sensors
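The modularity win can be sketched as a tiny detector registry: new models plug in by name and the pipeline fans each frame out to whatever is registered. Names and the stub detector are illustrative:

```python
from typing import Callable

DETECTORS: dict[str, Callable] = {}

def register(name: str):
    """Decorator: plug a new model into the pipeline by name."""
    def wrap(fn: Callable) -> Callable:
        DETECTORS[name] = fn
        return fn
    return wrap

@register("yolo")
def yolo_detect(frame):
    # Placeholder for a real YOLOv8 call.
    return [{"label": "person", "confidence": 0.88}]

def run_pipeline(frame):
    """Fan a frame out to every registered detector."""
    return {name: fn(frame) for name, fn in DETECTORS.items()}

results = run_pipeline(frame=None)
```

Adding a face recognizer or fall detector later is then one decorated function, with no changes to the pipeline loop.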

What We Learned

  • YOLO is powerful, but needs tuning per use case
  • Wake-word detection is much harder than it looks
  • WebSockets are a great fit for responsive IoT UX
  • Multi-modal AI (vision + audio + location) gives far richer context
  • Modular, loosely coupled systems make life much easier

What’s Next

  • Add facial recognition (known vs unknown)
  • Detect unusual behavior (loitering, falls, etc.)
  • Integrate with smart home systems
  • Build a mobile app
  • Run models directly on-device (edge AI)
  • Expand to mesh networks for full property coverage
