Inspiration
Every night, my dad barricades the front door before he sleeps. We have cameras at every entrance. He still doesn't sleep well unless I'm home. Because of that, I've spent more time at home than at university this quarter.
It's not that the cameras don't work. It's that they cry wolf. A leaf blows past and his phone buzzes. A car drives by, another alert. A moth on the lens - three notifications in a row. Either he stays permanently on edge, or he learns to ignore the notifications entirely.
My dad's anxiety isn't paranoia. It's a real response to security tools that don't actually understand what they're seeing. I built Clue_zero because home security shouldn't be a stack of dumb motion sensors. It should be a watchful intelligence that knows the difference between a bug on the screen and a burglar.
What it does
Clue_zero is a long-running autonomous AI agent that watches your home 24/7 through whatever cameras you set up, reasons about what it sees, and only alerts you when something actually matters.
- Perceives — Continuous frame analysis through NVIDIA Nemotron's multimodal vision
- Reasons — Classifies what it sees with confidence scores and full reasoning chains
- Remembers — Persistent local memory builds a baseline of what's normal for your home
- Decides — Alerts only when something actually matters. One signal, not thirty.
How I built it
A continuous five-stage agent loop:
- CAPTURE — USB webcam (Logitech C270) streams to disk at 5 frames per second
- PERCEIVE — NVIDIA Nemotron 3 Nano Omni analyzes each frame via the build.nvidia.com cloud endpoint
- REASON — Threat classified against the home's baseline patterns, with structured JSON output including event type, threat level, confidence, and full reasoning
- DECIDE — Hermes Agent autonomously chooses: log silently, watch closely, or escalate to alert
- ACT — Only the alerts that actually matter reach the homeowner
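The five stages above can be sketched as a single loop. This is an illustrative skeleton, not the actual Clue_zero source: the class names, thresholds, and stub bodies are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Structured output of the REASON stage (fields mirror the JSON schema above)."""
    event: str        # e.g. "person_at_door", "camera_tampering"
    threat: str       # "none" | "watch" | "alert"
    confidence: float
    reasoning: str


def capture_frame() -> bytes:
    """Stub for CAPTURE: the 5 fps webcam-to-disk stream."""
    return b"<jpeg bytes>"


def perceive_and_reason(frame: bytes) -> Assessment:
    """Stub for PERCEIVE + REASON: the cloud Nemotron call returning structured JSON."""
    return Assessment("nothing_unusual", "none", 0.97, "Empty hallway, matches baseline.")


def decide(a: Assessment) -> str:
    """DECIDE: log silently, watch closely, or escalate (thresholds are hypothetical)."""
    if a.threat == "alert" and a.confidence >= 0.8:
        return "escalate"
    if a.threat == "watch" or a.confidence < 0.5:
        return "watch"
    return "log"


def run_once() -> str:
    frame = capture_frame()
    assessment = perceive_and_reason(frame)
    action = decide(assessment)
    # ACT: only "escalate" reaches the homeowner; "log" and "watch" stay local.
    return action
```

In the real agent this body would run forever with a sleep between iterations; the point of the sketch is that each stage hands a typed result to the next, so a failure in one stage can be contained.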
The agent runs on a local Windows laptop, calling Nemotron in the cloud through NVIDIA's hosted API (the Cloud Track architecture). Frames stay local; only the AI inference is offloaded.
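A minimal sketch of what each cloud call looks like, assuming build.nvidia.com's OpenAI-compatible chat endpoint and a base64-inlined JPEG frame. The model id and prompt here are placeholders, not the exact ones Clue_zero ships with:

```python
import base64


def build_vision_request(frame_jpeg: bytes,
                         model: str = "nvidia/nemotron-nano-vl") -> dict:
    """Build an OpenAI-style chat payload with the frame inlined as a data URI.
    The model id is a placeholder; check build.nvidia.com for the real one."""
    b64 = base64.b64encode(frame_jpeg).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Classify this frame. Reply with JSON: "
                         '{"event": ..., "threat_level": ..., '
                         '"confidence": ..., "reasoning": ...}'},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        "temperature": 0.2,
    }

# The payload would then be POSTed to the hosted chat-completions endpoint
# with an "Authorization: Bearer <API key>" header (key elided here).
```

Keeping the request a plain dict makes it easy to log exactly what left the laptop, which matters when the pitch is "frames stay local unless inference needs them."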
The dashboard is a cinematic command center — warm "watchful guardian" palette, NVIDIA green technical chrome on warm amber soul. Live MediaPipe Hands tracking overlay on the camera feed shows the AI tracking your fingers in real time. A radar-style watch dial sweeps continuously. Home pulse EKG line breathes calm green at rest, spikes red at threat. Reactive breathing background shifts color with the agent's current threat assessment. Every observation persists to SQLite with full reasoning attached.
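Persisting each observation with its reasoning attached can be as small as one SQLite table. The schema and column names below are illustrative, not the actual Clue_zero schema:

```python
import sqlite3
import time


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local observation log."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS observations (
        ts REAL, event TEXT, threat TEXT, confidence REAL, reasoning TEXT)""")
    return conn


def record(conn: sqlite3.Connection, event: str, threat: str,
           confidence: float, reasoning: str) -> None:
    """Append one observation with its full reasoning chain."""
    conn.execute("INSERT INTO observations VALUES (?,?,?,?,?)",
                 (time.time(), event, threat, confidence, reasoning))
    conn.commit()
```

Storing the reasoning string alongside the classification is what lets the dashboard's "What I'm seeing" panel replay real model output instead of canned text.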
The PWA install gate lets judges access the dashboard from their phones — open the URL in Safari, "Add to Home Screen," and the full dashboard launches on their device.
Challenges I ran into
Scope honesty. Solo build, 24 hours. The hardest engineering decision was narrowing: one camera, one model, one demo moment. Everything else became "future work."
Camera device contention. On Windows, the C270 can only be held by one process, so when the browser grabs it for MediaPipe tracking, the Python capture loop loses access. I built a graceful fallback: the browser is the primary owner, and the agent falls back to polled frames if MediaPipe fails.
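The ownership rule reduces to a small selection step. This is a sketch with injected callables so the contention case is testable; the function names are illustrative:

```python
def choose_frame_source(try_open_camera, read_polled_frame):
    """Pick a frame provider under single-owner device contention.

    try_open_camera() returns a capture handle, or None when another process
    (e.g. the browser's MediaPipe session) already holds the device.
    read_polled_frame is the fallback: it reads the newest frame the browser
    has written to disk. Both names are illustrative, not the real API.
    """
    cam = try_open_camera()
    if cam is not None:
        return ("live", cam)          # agent owns the device directly
    return ("polled", read_polled_frame)  # browser owns it; poll its frames
```

With OpenCV, `try_open_camera` would wrap `cv2.VideoCapture(0)` and return None when `isOpened()` is false; injecting it keeps the decision logic free of hardware.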
Race conditions under aggressive pruning. The rolling-buffer prune was so aggressive that frames disappeared between the analysis loop's glob() and stat() calls. Fixed with defensive filesystem ops — materialize stat results first, drop missing files, wrap the entire loop body so a single bad frame can't kill the agent.
Nemotron hallucinations. Early version would sometimes return "No running process matching the given name was found" — a hallucinated shell error. Added a post-response validator that catches these specific hallucination signals and discards the response before logging.
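The validator can be a plain substring check against known bad signatures. The first entry is the real hallucination from the source text; the list as a whole is illustrative, not exhaustive:

```python
# Known hallucination signatures seen in early runs, matched case-insensitively.
HALLUCINATION_SIGNALS = [
    "No running process matching the given name was found",  # from actual logs
    "as an AI language model",  # illustrative extra signature
]


def is_hallucinated(response_text: str) -> bool:
    """True if the response contains known non-visual, shell-error-like output
    and should be discarded before it reaches the observation log."""
    lowered = response_text.lower()
    return any(sig.lower() in lowered for sig in HALLUCINATION_SIGNALS)
```

Discarding (rather than retrying inline) keeps the capture cadence steady; the next frame arrives in 200 ms anyway.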
Keeping the agent honest. A vague prompt produced charitable interpretations of clearly threatening actions ("hand near camera = adjusting camera"). I rewrote the prompt with explicit threat indicators, hard rules ("a fist near the lens is a threat, not an adjustment"), and an "err on the side of alerting" instruction.
Accomplishments that we're proud of
- The cascade works. Cover the camera with a hand and within ~5 seconds the entire dashboard cascades: background flushes red, watch dial gets a red dot, home pulse spikes, new card slams in with "camera tampering" classification, status line types out the threat. The judges literally watch the room change color in response to action.
- Real Nemotron reasoning is visible. The dashboard's "What I'm seeing" panel shows actual reasoning chains from Nemotron — not pre-canned strings, real multimodal reasoning over real frames.
- PWA on judges' phones. Judges install Clue_zero to their home screen via Safari "Add to Home Screen" and see the dashboard live on their devices, mirroring the laptop in real time.
- Cinematic UI that respects the brand. NVIDIA green for technical chrome, warm amber for emotional soul - a look that is both partnership-credible and genuinely warm.
What we learned
- Agentic tool-use is what separates "impressive demo" from "actually correct." The pipeline isn't about the model — it's about the structure around it: defensive filesystem ops, hallucination filters, explicit threat rules, rolling buffers, race-aware concurrency.
- For continuous-watch products, the choice between "framerate" and "reasoning depth" is a false binary. Capture at 5 fps, reason at model speed (~5 s per frame), display at 30 fps via the browser webcam. Decouple the three.
What's next for Clue_zero
- Event-triggered evidence retention — frames around any flagged anomaly persist as evidence; rest are pruned. Pre-buffer + cooldown architecture so retroactive triggers still capture context.
- Hermes-powered conversational queries — "Hey Clue_zero, what happened last night?" The agent reads its own memory and summarizes.
- ntfy.sh push notifications to the homeowner's phone with rate-limiting and threat-level routing.
- Multi-camera support with shared baseline learning across the home.
- Family member recognition with persistent identity memory ("this is your son, not an intruder").
- Edge deployment to NVIDIA Jetson Orin Nano for fully on-device inference — the privacy-by-architecture future where footage AND reasoning both stay home.
Personal note
I built this solo across 24 hours because my dad's sleep matters. Tonight he'll barricade the front door again. But maybe in a few months, when Clue_zero is running on his actual cameras, he'll sleep a little easier.
That's the project.
Built With
- build.nvidia.com
- css
- fastapi
- hermes-agent
- html
- javascript
- mediapipe
- nvidia-nemotron
- opencv
- pwa
- python
- service-workers
- sqlite
- websockets