Inspiration
A recent incident required reviewing days of CCTV footage to locate a five-second event. This inefficient manual process highlighted a critical need: the ability to ask a video a question and instantly locate the exact moment an action occurred.
What it does
Sentinel transforms hours of security camera footage into an intelligent, conversational database. Security personnel use natural language queries (e.g., "Show me when the red sedan entered the parking lot") instead of manually scrubbing recordings.
The system combines Elastic's hybrid search with Google Cloud's generative AI, so it can match both the semantic meaning of a query and specific metadata. Users interact through a chat interface that instantly returns precise, timestamped video segments.
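Because Sentinel indexes overlapping chunks, one moment can match several neighbouring chunks. The sketch below shows one way to collapse such hits into single timestamped segments before display. It assumes a hypothetical `Segment` result type and helper names (`merge_hits`, `fmt`); it is an illustration, not Sentinel's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A timestamped span of one video, in seconds (hypothetical result type)."""
    video_id: str
    start_sec: float
    end_sec: float


def merge_hits(hits: list[Segment]) -> list[Segment]:
    """Merge overlapping or touching hits from the same video into single spans."""
    merged: list[Segment] = []
    # Sort so hits from the same video are adjacent and in time order.
    for seg in sorted(hits, key=lambda s: (s.video_id, s.start_sec)):
        last = merged[-1] if merged else None
        if last and last.video_id == seg.video_id and seg.start_sec <= last.end_sec:
            # Extend the previous span rather than reporting the same moment twice.
            merged[-1] = Segment(last.video_id, last.start_sec, max(last.end_sec, seg.end_sec))
        else:
            merged.append(seg)
    return merged


def fmt(sec: float) -> str:
    """Format seconds as HH:MM:SS for display next to the video player."""
    s = int(sec)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"
```

For example, hits at 120-128 s and 124-132 s on the same camera merge into one 00:02:00 to 00:02:12 segment.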
How we built it
Sentinel leverages the Elastic and Google Cloud AI ecosystem:
Search and Indexing
- Elastic Search AI Platform: Built a hybrid search system combining dense vector similarity search (for semantic understanding) with keyword filtering (for precise metadata). The index stores 8-second video chunks with 1408-dimension multimodal embeddings and rich metadata.
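A minimal sketch of what such an index and hybrid query could look like, built as plain Python dicts. The field names (`embedding`, `labels`, `description`, `start_sec`, `end_sec`), the cosine similarity setting and the `num_candidates` multiplier are assumptions for illustration; only the 1408-dimension vector size comes from the description above.

```python
# Hypothetical index layout; field names are assumptions, not Sentinel's schema.
EMBEDDING_DIMS = 1408  # size of the multimodal embeddings described above

INDEX_MAPPINGS = {
    "properties": {
        "video_id": {"type": "keyword"},
        "start_sec": {"type": "float"},
        "end_sec": {"type": "float"},
        "labels": {"type": "keyword"},    # exact-match metadata, e.g. detected labels
        "description": {"type": "text"},  # free-text metadata for keyword matching
        "embedding": {
            "type": "dense_vector",
            "dims": EMBEDDING_DIMS,
            "index": True,
            "similarity": "cosine",
        },
    }
}


def hybrid_search_body(query_vector, query_text, labels=None, k=10):
    """Build a _search body that combines a kNN clause with a keyword query."""
    if len(query_vector) != EMBEDDING_DIMS:
        raise ValueError(f"expected {EMBEDDING_DIMS}-dim vector, got {len(query_vector)}")
    # Semantic part: approximate nearest neighbours over the chunk embeddings.
    knn = {
        "field": "embedding",
        "query_vector": list(query_vector),
        "k": k,
        "num_candidates": 10 * k,
    }
    # Lexical part: match the query text against the metadata description.
    keyword = {"bool": {"should": [{"match": {"description": query_text}}]}}
    if labels:
        # Optional exact metadata filter, applied to both parts.
        label_filter = {"terms": {"labels": list(labels)}}
        knn["filter"] = label_filter
        keyword["bool"]["filter"] = [label_filter]
    return {"size": k, "knn": knn, "query": keyword}
```

With the elasticsearch-py 8.x client, the mapping could be passed as `es.indices.create(index=..., mappings=INDEX_MAPPINGS)` and the body as `es.search(index=..., **body)`. When a search has both `knn` and `query`, Elasticsearch adds the two scores for documents matched by both.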
AI Services
- Vertex AI: Generates multimodal embeddings for video chunks, enabling semantic search.
- Gemini 2.0 Flash: Powers the conversational interface, synthesizing search results into natural language.
- Video Intelligence API: Extracts comprehensive metadata, including labels and object tracking.
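To be useful for filtering, time-stamped labels have to be attached to the chunks they fall in. The sketch below assumes the annotations have already been flattened into hypothetical `(label, start_sec, end_sec, confidence)` tuples; the real Video Intelligence response objects are richer, and the `min_confidence` cutoff is an illustrative assumption.

```python
from collections import defaultdict


def labels_per_chunk(annotations, chunks, min_confidence=0.5):
    """Map each chunk index to the sorted labels whose time span overlaps it.

    annotations: iterable of (label, start_sec, end_sec, confidence) tuples,
      a hypothetical flattened form of label/object-tracking annotations.
    chunks: list of (start_sec, end_sec) tuples.
    """
    result = defaultdict(set)
    for label, a_start, a_end, conf in annotations:
        if conf < min_confidence:
            continue  # skip low-confidence detections
        for i, (c_start, c_end) in enumerate(chunks):
            # Two intervals overlap if each one starts before the other ends.
            if a_start < c_end and c_start < a_end:
                result[i].add(label)
    return {i: sorted(found) for i, found in result.items()}
```

The per-chunk label lists can then be stored in a keyword field of the index.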
Architecture
The ingestion pipeline processes videos into overlapping 8-second segments for indexing. The FastAPI backend handles user queries, performs hybrid searches, and passes results to Gemini. The frontend provides the chat interface with integrated video playback.
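The window arithmetic behind overlapping 8-second segments can be sketched as follows. The function name and the choice to truncate the final window at the end of the video are assumptions; how each window is then cut and embedded is not shown.

```python
def chunk_boundaries(duration_sec, chunk_sec=8.0, overlap_sec=4.0):
    """Return (start, end) pairs covering a video with fixed-size overlapping windows."""
    if not 0 <= overlap_sec < chunk_sec:
        raise ValueError("overlap must be non-negative and smaller than the chunk length")
    stride = chunk_sec - overlap_sec  # 4 s step for 8 s chunks with 4 s overlap
    bounds = []
    start = 0.0
    while start < duration_sec:
        # The last window is truncated at the end of the video.
        end = min(start + chunk_sec, duration_sec)
        bounds.append((start, end))
        if end >= duration_sec:
            break
        start += stride
    return bounds
```

A 20-second clip yields (0, 8), (4, 12), (8, 16) and (12, 20). With a 4-second stride, an hour of footage produces roughly 900 chunks to embed and index.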
Challenges we ran into
- Video Chunking Strategy: Chunk length trades temporal granularity against computational cost; we settled on 8-second chunks with 4-second overlaps.
- Context Window Management: Passing the full metadata for every search result to Gemini risked exceeding its token limits, so we summarize the metadata before relaying results. This keeps responses accurate while staying within those limits.
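One simple form of this summarization is sketched below: keep only the top labels per hit and stop adding hits once a budget is reached. The hit dictionary keys, the `max_labels` cutoff, and the use of a character budget as a rough stand-in for a token budget are all assumptions, not Sentinel's actual approach.

```python
def build_context(hits, max_chars=2000, max_labels=5):
    """Condense retrieved chunks into compact prompt lines within a size budget.

    hits: list of dicts with hypothetical keys "video_id", "start_sec",
      "end_sec" and "labels", ordered best match first.
    max_chars: a crude stand-in for a token budget (assumption).
    """
    lines = []
    used = 0
    for rank, hit in enumerate(hits, start=1):
        labels = ", ".join(hit["labels"][:max_labels])  # keep only the top labels
        line = (f"[{rank}] {hit['video_id']} "
                f"{hit['start_sec']:.0f}-{hit['end_sec']:.0f}s: {labels}")
        if used + len(line) + 1 > max_chars:
            break  # stop before exceeding the budget; lower-ranked hits are dropped
        lines.append(line)
        used += len(line) + 1  # +1 for the newline that joins the lines
    return "\n".join(lines)
```

Dropping whole low-ranked hits, rather than truncating every hit, keeps the best matches intact when the budget is tight.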
Accomplishments that we're proud of
- True Hybrid Search: Successfully implemented Elastic's hybrid search, achieving both semantic understanding and precise metadata matching.
- Video RAG: Built a complete Retrieval Augmented Generation (RAG) system for the video medium.
- Real-world Impact: Transformed hours of manual security review into seconds of conversational search.
- Seamless Integration: Demonstrated how Elastic's speed and flexibility complement Google Cloud's AI models.
What we learned
- Hybrid Search Power: Combining dense vector search with keyword filtering creates a more robust search experience.
- Multimodal AI Maturity: Working with Google Cloud APIs showed that multimodal models are mature enough to capture visual and temporal information effectively.
- RAG System Design: Building the RAG system showed how closely generation quality depends on search quality.
What's next for Sentinel
- Advanced Agentic Capabilities: Add agent capabilities that reformulate queries and answer complex, multi-step questions (e.g., "Who started the fire and who put it out?").
- Expanded Use Cases: Adapt Sentinel beyond security for retail analytics, manufacturing quality control, and smart city management.