Inspiration
Traditional surveillance systems generate massive amounts of footage but provide very little understanding. Most tools rely on motion detection or object identification, producing constant alerts that humans eventually ignore. While exploring Gemini 3’s long-context and multimodal reasoning capabilities, I was inspired to rethink surveillance as a long-running reasoning problem, not a frame-by-frame detection task. Humans naturally learn what is “normal” for a place over time — Sentinel aims to replicate that capability using AI.
What it does
Sentinel is a context-aware video reasoning agent that continuously analyzes CCTV footage over time. Instead of flagging every movement, it:

- Learns behavioral baselines for each camera
- Detects meaningful deviations, not just objects or motion
- Explains why a clip matters in clear, human-readable terms
- Categorizes footage into: 🟢 Safe to Delete, 🟡 Review Recommended, 🔴 Important – Retain

Sentinel assists human review without making enforcement decisions or speculative judgments.
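The three retention categories can be sketched as a small structured output. This is an illustrative Python sketch, not Sentinel's actual code: the `Retention` enum, `ClipVerdict` class, `categorize` function, and the 0.3/0.7 deviation thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical labels mirroring Sentinel's three retention categories.
class Retention(Enum):
    SAFE_TO_DELETE = "🟢 Safe to Delete"
    REVIEW_RECOMMENDED = "🟡 Review Recommended"
    IMPORTANT_RETAIN = "🔴 Important – Retain"

@dataclass
class ClipVerdict:
    clip_id: str
    label: Retention
    explanation: str  # human-readable evidence, never an enforcement decision

def categorize(deviation_score: float, clip_id: str, evidence: str) -> ClipVerdict:
    """Map a deviation score onto the three retention categories.
    The thresholds (0.3, 0.7) are illustrative, not Sentinel's real values."""
    if deviation_score < 0.3:
        label = Retention.SAFE_TO_DELETE
    elif deviation_score < 0.7:
        label = Retention.REVIEW_RECOMMENDED
    else:
        label = Retention.IMPORTANT_RETAIN
    return ClipVerdict(clip_id, label, evidence)
```

Keeping the verdict structured (label plus explanation) is what lets every flagged clip carry its evidence alongside the category.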
How we built it
Sentinel was built using Google AI Studio to orchestrate Gemini 3 as a long-running, stateful reasoning agent rather than a single-prompt analyzer. Google AI Studio was used to design and iterate on multi-stage agent workflows, where each video upload represents a new temporal slice of the same camera. Gemini 3 processes video inputs using a fixed analytical schema, ensuring consistent reasoning across time.

Instead of relying on raw footage, Sentinel maintains compact memory summaries that capture behavioral baselines such as motion frequency, time-of-day activity, and recurring entities. These summaries are explicitly passed back into Gemini 3 for comparison, enabling deviation-based reasoning rather than frame-level detection.

Outputs are structured and explainable, supporting human-in-the-loop review instead of automated enforcement. This approach goes beyond prompt-only interaction by using Gemini 3 as a persistent reasoning engine with memory and continuity.
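The memory-summary loop described above can be sketched as follows. This is a minimal illustration under assumed names: the `CameraBaseline` class, its fields, the exponential-smoothing constants, and `build_comparison_prompt` are hypothetical stand-ins for Sentinel's real memory schema and prompt construction.

```python
from dataclasses import dataclass, field

@dataclass
class CameraBaseline:
    """Compact per-camera memory summary — illustrative fields only."""
    motion_events_per_hour: float = 0.0
    hourly_activity: dict = field(default_factory=dict)  # hour -> smoothed event count
    clips_seen: int = 0

    def update(self, hour: int, motion_events: int) -> None:
        # Running averages keep the summary compact instead of storing raw footage.
        self.clips_seen += 1
        self.motion_events_per_hour += (
            motion_events - self.motion_events_per_hour
        ) / self.clips_seen
        prev = self.hourly_activity.get(hour, motion_events)
        self.hourly_activity[hour] = 0.8 * prev + 0.2 * motion_events

def build_comparison_prompt(baseline: CameraBaseline, hour: int, motion_events: int) -> str:
    """Fold the memory summary back into the next analysis request, so the
    model reasons about deviation from baseline rather than raw frames."""
    expected = baseline.hourly_activity.get(hour, baseline.motion_events_per_hour)
    return (
        f"Baseline for hour {hour}: ~{expected:.1f} motion events. "
        f"This clip: {motion_events} events. "
        "Explain whether this deviates meaningfully from the baseline."
    )
```

The key design choice is that only the summary, never the footage, travels forward in time, which is what makes long-term observation fit inside a bounded context.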
Challenges we ran into
- Avoiding alert fatigue: Designing logic that suppresses alerts for expected or repetitive activity was harder than detecting anomalies.
- Simulating long-term observation: Demonstrating hours or days of reasoning within a hackathon environment required careful memory abstraction.
- Ethical constraints: Surveillance use cases demand strict boundaries — no facial recognition, no intent inference, and no automated enforcement.
- Explainability: Every flagged event needed a clear, evidence-based explanation to remain trustworthy.
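The alert-fatigue challenge above comes down to remembering what has already been confirmed as routine. A minimal sketch, assuming a hypothetical (event type, hour-of-day) key and a threshold of three confirmations; none of these names or values come from Sentinel itself:

```python
from collections import Counter

def should_alert(event_type: str, hour: int, history: Counter,
                 min_occurrences: int = 3) -> bool:
    """Suppress alerts for activity seen repeatedly at the same time of day.
    The (event_type, hour) keying and threshold are illustrative assumptions."""
    return history[(event_type, hour)] < min_occurrences

def record_routine(event_type: str, hour: int, history: Counter) -> None:
    """After a human confirms an event is routine, record it so that
    future repeats at that hour stay quiet."""
    history[(event_type, hour)] += 1
```

Because suppression is keyed by time of day, the same activity can still alert when it happens at an unusual hour.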
Accomplishments that we're proud of
- Built a surveillance system that reasons over behavioral change, not isolated frames
- Designed a memory-driven agent that improves analysis quality over time
- Reduced unnecessary human review by explicitly labeling footage safe to delete
- Created a clear, repeatable analysis framework suitable for real-world deployment
- Demonstrated a strong non–prompt-wrapper use of Gemini 3 as an orchestrator
What we learned
- Long-context reasoning enables entirely new classes of applications
- Memory and comparison are more valuable than raw perception
- Human-in-the-loop design improves safety, trust, and usability
- Gemini 3 excels when used as a reasoning engine across time, not just a chatbot
What's next for Sentinel — Context-Aware Video Reasoning Agent
- Integrating real-time camera streams using Gemini Live APIs
- Adding cross-camera reasoning for multi-location monitoring
- Improving deviation scoring with adaptive confidence thresholds
- Expanding Sentinel into a general-purpose Marathon Agent for long-running video analysis tasks
Built With
- gemini3
- googleaistudio
- react