Inspiration
Traditional surveillance systems generate massive amounts of footage but provide very little understanding. Most tools rely on motion detection or object identification, producing constant alerts that humans eventually ignore. While exploring Gemini 3’s long-context and multimodal reasoning capabilities, I was inspired to rethink surveillance as a long-running reasoning problem, not a frame-by-frame detection task. Humans naturally learn what is “normal” for a place over time — Sentinel aims to replicate that capability using AI.
What it does
Sentinel is a context-aware video reasoning agent that continuously analyzes CCTV footage over time. Instead of flagging every movement, it:

- Learns behavioral baselines for each camera
- Detects meaningful deviations, not just objects or motion
- Explains why a clip matters in clear, human-readable terms
- Categorizes footage into: 🟢 Safe to Delete, 🟡 Review Recommended, 🔴 Important – Retain

Sentinel assists human review without making enforcement decisions or speculative judgments.
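The three retention categories can be sketched as a small structured output. This is an illustrative Python sketch, not Sentinel's actual code: the `Retention` enum, `ClipVerdict` class, `categorize` function, and the 0.3/0.7 deviation thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical labels mirroring Sentinel's three retention categories.
class Retention(Enum):
    SAFE_TO_DELETE = "🟢 Safe to Delete"
    REVIEW_RECOMMENDED = "🟡 Review Recommended"
    IMPORTANT_RETAIN = "🔴 Important – Retain"

@dataclass
class ClipVerdict:
    clip_id: str
    label: Retention
    explanation: str  # human-readable evidence, never an enforcement decision

def categorize(deviation_score: float, clip_id: str, evidence: str) -> ClipVerdict:
    """Map a deviation score onto the three retention categories.
    The thresholds (0.3, 0.7) are illustrative, not Sentinel's real values."""
    if deviation_score < 0.3:
        label = Retention.SAFE_TO_DELETE
    elif deviation_score < 0.7:
        label = Retention.REVIEW_RECOMMENDED
    else:
        label = Retention.IMPORTANT_RETAIN
    return ClipVerdict(clip_id, label, evidence)
```

Keeping the verdict structured (label plus explanation) is what lets every flagged clip carry its evidence alongside the category.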
How we built it
Sentinel was built using Google AI Studio to orchestrate Gemini 3 as a long-running, stateful reasoning agent rather than a single-prompt analyzer. Google AI Studio was used to design and iterate on multi-stage agent workflows, where each video upload represents a new temporal slice of the same camera. Gemini 3 processes video inputs using a fixed analytical schema, ensuring consistent reasoning across time.

Instead of relying on raw footage, Sentinel maintains compact memory summaries that capture behavioral baselines such as motion frequency, time-of-day activity, and recurring entities. These summaries are explicitly passed back into Gemini 3 for comparison, enabling deviation-based reasoning rather than frame-level detection.

Outputs are structured and explainable, supporting human-in-the-loop review instead of automated enforcement. This approach goes beyond prompt-only interaction by using Gemini 3 as a persistent reasoning engine with memory and continuity.
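The memory-summary loop described above can be sketched as follows. This is a minimal illustration under assumed names: the `CameraBaseline` class, its fields, the exponential-smoothing constants, and `build_comparison_prompt` are hypothetical stand-ins for Sentinel's real memory schema and prompt construction.

```python
from dataclasses import dataclass, field

@dataclass
class CameraBaseline:
    """Compact per-camera memory summary — illustrative fields only."""
    motion_events_per_hour: float = 0.0
    hourly_activity: dict = field(default_factory=dict)  # hour -> smoothed event count
    clips_seen: int = 0

    def update(self, hour: int, motion_events: int) -> None:
        # Running averages keep the summary compact instead of storing raw footage.
        self.clips_seen += 1
        self.motion_events_per_hour += (
            motion_events - self.motion_events_per_hour
        ) / self.clips_seen
        prev = self.hourly_activity.get(hour, motion_events)
        self.hourly_activity[hour] = 0.8 * prev + 0.2 * motion_events

def build_comparison_prompt(baseline: CameraBaseline, hour: int, motion_events: int) -> str:
    """Fold the memory summary back into the next analysis request, so the
    model reasons about deviation from baseline rather than raw frames."""
    expected = baseline.hourly_activity.get(hour, baseline.motion_events_per_hour)
    return (
        f"Baseline for hour {hour}: ~{expected:.1f} motion events. "
        f"This clip: {motion_events} events. "
        "Explain whether this deviates meaningfully from the baseline."
    )
```

The key design choice is that only the summary, never the footage, travels forward in time, which is what makes long-term observation fit inside a bounded context.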
Challenges we ran into
- Avoiding alert fatigue: Designing logic that suppresses alerts for expected or repetitive activity was harder than detecting anomalies.
- Simulating long-term observation: Demonstrating hours or days of reasoning within a hackathon environment required careful memory abstraction.
- Ethical constraints: Surveillance use cases demand strict boundaries — no facial recognition, no intent inference, and no automated enforcement.
- Explainability: Every flagged event needed a clear, evidence-based explanation to remain trustworthy.
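The alert-fatigue challenge above comes down to remembering what has already been confirmed as routine. A minimal sketch, assuming a hypothetical (event type, hour-of-day) key and a threshold of three confirmations; none of these names or values come from Sentinel itself:

```python
from collections import Counter

def should_alert(event_type: str, hour: int, history: Counter,
                 min_occurrences: int = 3) -> bool:
    """Suppress alerts for activity seen repeatedly at the same time of day.
    The (event_type, hour) keying and threshold are illustrative assumptions."""
    return history[(event_type, hour)] < min_occurrences

def record_routine(event_type: str, hour: int, history: Counter) -> None:
    """After a human confirms an event is routine, record it so that
    future repeats at that hour stay quiet."""
    history[(event_type, hour)] += 1
```

Because suppression is keyed by time of day, the same activity can still alert when it happens at an unusual hour.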
Accomplishments that we're proud of
- Built a surveillance system that reasons over behavioral change, not isolated frames
- Designed a memory-driven agent that improves analysis quality over time
- Reduced unnecessary human review by explicitly labeling footage safe to delete
- Created a clear, repeatable analysis framework suitable for real-world deployment
- Demonstrated a strong non–prompt-wrapper use of Gemini 3 as an orchestrator
What we learned
- Long-context reasoning enables entirely new classes of applications
- Memory and comparison are more valuable than raw perception
- Human-in-the-loop design improves safety, trust, and usability
- Gemini 3 excels when used as a reasoning engine across time, not just a chatbot
What's next for Sentinel — Context-Aware Video Reasoning Agent
- Integrating real-time camera streams using Gemini Live APIs
- Adding cross-camera reasoning for multi-location monitoring
- Improving deviation scoring with adaptive confidence thresholds
- Expanding Sentinel into a general-purpose Marathon Agent for long-running video analysis tasks
Built With
- gemini3
- googleaistudio
- react