Inspiration
Every night, my dad barricades the front door before he sleeps. We have cameras at every entrance. He still doesn't sleep well unless I'm home. Because of that, I've spent more time at home than at university this quarter.
It's not that the cameras don't work. It's that they cry wolf. A leaf blows past and his phone buzzes. A car drives by, another alert. A moth on the lens - three notifications in a row. Either he stays permanently on edge, or he learns to ignore the notifications entirely.
My dad's anxiety isn't paranoia. It's a real response to security tools that don't actually understand what they're seeing. I built Clue_zero because home security shouldn't be a stack of dumb motion sensors. It should be a watchful intelligence that knows the difference between a bug on the screen and a burglar.
What it does
Clue_zero is a long-running autonomous AI agent that watches your home 24/7 through whatever cameras you set up, reasons about what it sees, and only alerts you when something actually matters.
- Perceives — Continuous frame analysis through NVIDIA Nemotron's multimodal vision
- Reasons — Classifies what it sees with confidence scores and full reasoning chains
- Remembers — Persistent local memory builds a baseline of what's normal for your home
- Decides — Alerts only when something actually matters. One signal, not thirty.
How I built it
A continuous five-stage agent loop:
- CAPTURE — USB webcam (Logitech C270) streams to disk at 5 frames per second
- PERCEIVE — NVIDIA Nemotron 3 Nano Omni analyzes each frame via the build.nvidia.com cloud endpoint
- REASON — Threat classified against the home's baseline patterns, with structured JSON output including event type, threat level, confidence, and full reasoning
- DECIDE — Hermes Agent autonomously chooses: log silently, watch closely, or escalate to alert
- ACT — Only the alerts that actually matter reach the homeowner
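The five stages above can be sketched as a single loop. This is an illustrative skeleton, not the actual Clue_zero source: the class names, thresholds, and stub bodies are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Structured output of the REASON stage (fields mirror the JSON schema above)."""
    event: str        # e.g. "person_at_door", "camera_tampering"
    threat: str       # "none" | "watch" | "alert"
    confidence: float
    reasoning: str


def capture_frame() -> bytes:
    """Stub for CAPTURE: the 5 fps webcam-to-disk stream."""
    return b"<jpeg bytes>"


def perceive_and_reason(frame: bytes) -> Assessment:
    """Stub for PERCEIVE + REASON: the cloud Nemotron call returning structured JSON."""
    return Assessment("nothing_unusual", "none", 0.97, "Empty hallway, matches baseline.")


def decide(a: Assessment) -> str:
    """DECIDE: log silently, watch closely, or escalate (thresholds are hypothetical)."""
    if a.threat == "alert" and a.confidence >= 0.8:
        return "escalate"
    if a.threat == "watch" or a.confidence < 0.5:
        return "watch"
    return "log"


def run_once() -> str:
    frame = capture_frame()
    assessment = perceive_and_reason(frame)
    action = decide(assessment)
    # ACT: only "escalate" reaches the homeowner; "log" and "watch" stay local.
    return action
```

In the real agent this body would run forever with a sleep between iterations; the point of the sketch is that each stage hands a typed result to the next, so a failure in one stage can be contained.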
The agent runs on a local Windows laptop, calling Nemotron in the cloud through NVIDIA's hosted API (the Cloud Track architecture). Frames stay local; only the AI inference is offloaded.
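A minimal sketch of what each cloud call looks like, assuming build.nvidia.com's OpenAI-compatible chat endpoint and a base64-inlined JPEG frame. The model id and prompt here are placeholders, not the exact ones Clue_zero ships with:

```python
import base64


def build_vision_request(frame_jpeg: bytes,
                         model: str = "nvidia/nemotron-nano-vl") -> dict:
    """Build an OpenAI-style chat payload with the frame inlined as a data URI.
    The model id is a placeholder; check build.nvidia.com for the real one."""
    b64 = base64.b64encode(frame_jpeg).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Classify this frame. Reply with JSON: "
                         '{"event": ..., "threat_level": ..., '
                         '"confidence": ..., "reasoning": ...}'},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        "temperature": 0.2,
    }

# The payload would then be POSTed to the hosted chat-completions endpoint
# with an "Authorization: Bearer <API key>" header (key elided here).
```

Keeping the request a plain dict makes it easy to log exactly what left the laptop, which matters when the pitch is "frames stay local unless inference needs them."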
The dashboard is a cinematic command center — warm "watchful guardian" palette, NVIDIA green technical chrome on warm amber soul. Live MediaPipe Hands tracking overlay on the camera feed shows the AI tracking your fingers in real time. A radar-style watch dial sweeps continuously. Home pulse EKG line breathes calm green at rest, spikes red at threat. Reactive breathing background shifts color with the agent's current threat assessment. Every observation persists to SQLite with full reasoning attached.
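Persisting each observation with its reasoning attached can be as small as one SQLite table. The schema and column names below are illustrative, not the actual Clue_zero schema:

```python
import sqlite3
import time


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local observation log."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS observations (
        ts REAL, event TEXT, threat TEXT, confidence REAL, reasoning TEXT)""")
    return conn


def record(conn: sqlite3.Connection, event: str, threat: str,
           confidence: float, reasoning: str) -> None:
    """Append one observation with its full reasoning chain."""
    conn.execute("INSERT INTO observations VALUES (?,?,?,?,?)",
                 (time.time(), event, threat, confidence, reasoning))
    conn.commit()
```

Storing the reasoning string alongside the classification is what lets the dashboard's "What I'm seeing" panel replay real model output instead of canned text.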
The PWA install gate lets judges access the dashboard from their phones — open the URL in Safari, "Add to Home Screen," and the full dashboard launches on their device.
Challenges I ran into
Scope honesty. Solo build, 24 hours. The hardest engineering decision was narrowing: one camera, one model, one demo moment. Everything else became "future work."
Camera device contention. On Windows, the C270 can only be held by one process, so when the browser grabs it for MediaPipe tracking, the Python capture loop loses access. I built a graceful fallback: the browser is the primary owner, and the agent falls back to polled frames if MediaPipe fails.
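The ownership rule reduces to a small selection step. This is a sketch with injected callables so the contention case is testable; the function names are illustrative:

```python
def choose_frame_source(try_open_camera, read_polled_frame):
    """Pick a frame provider under single-owner device contention.

    try_open_camera() returns a capture handle, or None when another process
    (e.g. the browser's MediaPipe session) already holds the device.
    read_polled_frame is the fallback: it reads the newest frame the browser
    has written to disk. Both names are illustrative, not the real API.
    """
    cam = try_open_camera()
    if cam is not None:
        return ("live", cam)          # agent owns the device directly
    return ("polled", read_polled_frame)  # browser owns it; poll its frames
```

With OpenCV, `try_open_camera` would wrap `cv2.VideoCapture(0)` and return None when `isOpened()` is false; injecting it keeps the decision logic free of hardware.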
Race conditions under aggressive pruning. The rolling-buffer prune was so aggressive that frames disappeared between the analysis loop's glob() and stat() calls. Fixed with defensive filesystem ops — materialize stat results first, drop missing files, wrap the entire loop body so a single bad frame can't kill the agent.
Nemotron hallucinations. Early version would sometimes return "No running process matching the given name was found" — a hallucinated shell error. Added a post-response validator that catches these specific hallucination signals and discards the response before logging.
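The validator can be a plain substring check against known bad signatures. The first entry is the real hallucination from the source text; the list as a whole is illustrative, not exhaustive:

```python
# Known hallucination signatures seen in early runs, matched case-insensitively.
HALLUCINATION_SIGNALS = [
    "No running process matching the given name was found",  # from actual logs
    "as an AI language model",  # illustrative extra signature
]


def is_hallucinated(response_text: str) -> bool:
    """True if the response contains known non-visual, shell-error-like output
    and should be discarded before it reaches the observation log."""
    lowered = response_text.lower()
    return any(sig.lower() in lowered for sig in HALLUCINATION_SIGNALS)
```

Discarding (rather than retrying inline) keeps the capture cadence steady; the next frame arrives in 200 ms anyway.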
Keeping the agent honest. A vague prompt produced charitable interpretations of clearly threatening actions ("hand near camera = adjusting camera"). I rewrote the prompt with explicit threat indicators, hard rules ("a fist near the lens is a threat, not an adjustment"), and an "err on the side of alerting" instruction.
Accomplishments that we're proud of
- The cascade works. Cover the camera with a hand and within ~5 seconds the entire dashboard cascades: background flushes red, watch dial gets a red dot, home pulse spikes, new card slams in with "camera tampering" classification, status line types out the threat. The judges literally watch the room change color in response to action.
- Real Nemotron reasoning is visible. The dashboard's "What I'm seeing" panel shows actual reasoning chains from Nemotron — not pre-canned strings, real multimodal reasoning over real frames.
- PWA on judges' phones. Judges install Clue_zero to their home screen via Safari "Add to Home Screen" and see the dashboard live on their devices, mirroring the laptop in real time.
- Cinematic UI that respects the brand. NVIDIA green for technical chrome, warm amber for emotional soul - a look that is both partnership-credible and genuinely warm.
What we learned
- Agentic tool-use is what separates "impressive demo" from "actually correct." The pipeline isn't about the model — it's about the structure around it: defensive filesystem ops, hallucination filters, explicit threat rules, rolling buffers, race-aware concurrency.
- For continuous-watch products, the choice between "framerate" and "reasoning depth" is a false binary. Capture at 5 fps, reason at model speed (~5 s per frame), display at 30 fps via the browser webcam. Decouple the three.
What's next for Clue_zero
- Event-triggered evidence retention — frames around any flagged anomaly persist as evidence; rest are pruned. Pre-buffer + cooldown architecture so retroactive triggers still capture context.
- Hermes-powered conversational queries — "Hey Clue_zero, what happened last night?" The agent reads its own memory and summarizes.
- ntfy.sh push notifications to the homeowner's phone with rate-limiting and threat-level routing.
- Multi-camera support with shared baseline learning across the home.
- Family member recognition with persistent identity memory ("this is your son, not an intruder").
- Edge deployment to NVIDIA Jetson Orin Nano for fully on-device inference — the privacy-by-architecture future where footage AND reasoning both stay home.
Personal note
I built this solo across 24 hours because my dad's sleep matters. Tonight he'll barricade the front door again. But maybe in a few months, when Clue_zero is running on his actual cameras, he'll sleep a little easier.
That's the project.
Built With
- build.nvidia.com
- css
- fastapi
- hermes-agent
- html
- javascript
- mediapipe
- nvidia-nemotron
- opencv
- pwa
- python
- service-workers
- sqlite
- websockets