Inspiration

A recent incident required reviewing days of CCTV footage to locate a five-second event. This inefficient manual process highlighted a critical need: the ability to ask a video a question and instantly locate the exact moment an action occurred.

What it does

Sentinel transforms hours of security camera footage into an intelligent, conversational database. Security personnel use natural language queries (e.g., "Show me when the red sedan entered the parking lot") instead of manually scrubbing recordings.

The system combines Elastic's hybrid search with Google Cloud's generative AI. It understands both semantic meaning and specific metadata. Users interact via a chat interface that instantly returns precise, timestamped video segments.

How we built it

Sentinel leverages the Elastic and Google Cloud AI ecosystem:

Search and Indexing

  • Elastic Search AI Platform: Built a hybrid search system combining dense vector similarity search (for semantic understanding) with keyword filtering (for precise metadata). The index stores 8-second video chunks with 1408-dimension multimodal embeddings and rich metadata.
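A query against such an index can be sketched as a kNN clause over the chunk embeddings combined with a keyword filter over the metadata. This is only an illustrative sketch; the index name, field names (`embedding`, `camera_id`), and parameters are assumptions, not the project's actual schema:

```python
def build_hybrid_query(query_vector, camera_id=None, top_k=10):
    """Build an Elasticsearch search body combining dense-vector kNN
    with an optional keyword filter over chunk metadata."""
    query = {
        "knn": {
            "field": "embedding",          # 1408-dim multimodal embedding
            "query_vector": query_vector,
            "k": top_k,
            "num_candidates": top_k * 10,  # widen the candidate pool
        },
        "size": top_k,
    }
    if camera_id is not None:
        # Restrict semantic matches to chunks from a specific camera.
        query["knn"]["filter"] = {"term": {"camera_id": camera_id}}
    return query

# Hypothetical usage with a placeholder query vector:
body = build_hybrid_query([0.0] * 1408, camera_id="lot-cam-3")
```

The filter runs inside the kNN clause, so the vector search only ranks chunks that already satisfy the metadata constraint.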

AI Services

  • Vertex AI: Generates multimodal embeddings for video chunks, enabling semantic search.
  • Gemini 2.0 Flash: Powers the conversational interface, synthesizing search results into natural language.
  • Video Intelligence API: Extracts comprehensive metadata, including labels and object tracking.

Architecture

The ingestion pipeline processes videos into overlapping 8-second segments for indexing. The FastAPI backend handles user queries, performs hybrid searches, and passes results to Gemini. The frontend provides the chat interface with integrated video playback.
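The segmentation step can be sketched as a simple boundary computation. Parameters and names here are illustrative (the project later moved from 8s to 16s chunks, so these would be configurable):

```python
def chunk_boundaries(duration_s, chunk_s=8.0, overlap_s=4.0):
    """Yield (start, end) times in seconds for overlapping video chunks."""
    step = chunk_s - overlap_s
    assert step > 0, "overlap must be shorter than the chunk"
    start = 0.0
    while start < duration_s:
        # The final chunk is clipped to the video's end.
        yield (start, min(start + chunk_s, duration_s))
        start += step

# A 20-second clip with 8s chunks and 4s overlap:
print(list(chunk_boundaries(20)))
# → [(0.0, 8.0), (4.0, 12.0), (8.0, 16.0), (12.0, 20.0), (16.0, 20.0)]
```

The 4-second overlap means every moment (except the very edges) appears in two chunks, so an action straddling a chunk boundary is still captured whole in at least one segment.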

Challenges we ran into

  • Video Chunking Strategy: To balance granularity and computational efficiency, we chose 8-second chunks with 4-second overlaps.
  • Context Window Management: We implemented smart summarization of metadata when relaying search results to Gemini. This ensured accurate responses while staying within token limits.
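The summarization step can be sketched as a pruning pass over each search hit's metadata before it enters the prompt. The field names and budget heuristics below are assumptions for illustration, not the project's actual logic:

```python
def summarize_hit(hit, max_labels=5, max_chars=300):
    """Compress one search hit's metadata to fit a prompt token budget
    by keeping only the fields the LLM needs to ground its answer."""
    return {
        "timestamp": hit.get("timestamp"),
        "labels": hit.get("labels", [])[:max_labels],       # drop long tails
        "description": (hit.get("description") or "")[:max_chars],
    }

# Hypothetical hit from the hybrid search:
hit = {
    "timestamp": "00:14:32",
    "labels": ["car", "sedan", "red", "parking lot", "daytime", "asphalt"],
    "description": "A red sedan enters the lot from the north gate.",
}
print(summarize_hit(hit))
```

Capping labels and descriptions per hit keeps the prompt size roughly linear in the number of results rather than in the total metadata volume.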

Accomplishments that we're proud of

  • True Hybrid Search: Successfully implemented Elastic's hybrid search, achieving both semantic understanding and precise metadata matching.
  • Video RAG: Built a complete Retrieval Augmented Generation (RAG) system for the video medium.
  • Real-world Impact: Transformed hours of manual security review into seconds of conversational search.
  • Seamless Integration: Demonstrated how Elastic's speed and flexibility complement Google Cloud's AI models.

What we learned

  • Hybrid Search Power: Combining dense vector search with keyword filtering creates a more robust search experience.
  • Multimodal AI Maturity: Working with Google Cloud's APIs showed how mature multimodal AI has become; the embeddings effectively capture both visual and temporal information.
  • RAG System Design: Building the RAG system clarified the nuanced interaction between search quality and generation quality.

What's next for Sentinel

  • Advanced Agentic Capabilities: Implement intelligence to reformulate queries and answer complex, multi-step questions (e.g., "Who started the fire and who put it out?").
  • Expanded Use Cases: Adapt Sentinel beyond security for retail analytics, manufacturing quality control, and smart city management.

Updates

Update 27th October

  • Detailed Analysis mode: Introduced a detailed analysis mode that feeds the relevant clips themselves to the LLM for hyper-relevant answers.

  • Significantly improved hybrid search: Now uses the Reciprocal Rank Fusion (RRF) algorithm to better combine vector and keyword results.

  • Richer metadata & longer video chunks: Each video now carries richer metadata from the Video Intelligence API. We also increased chunk length from 8s to 16s and saw significant improvements in answer quality.

  • Improved RAG prompt: Clearer, more reliable answers.

  • Improved user experience: Significant UX updates for the demo.
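Reciprocal Rank Fusion is small enough to sketch directly: given the ranked ID lists from the vector and keyword legs, each document scores the sum of 1/(k + rank) across the lists it appears in (k = 60 is the conventional constant). The chunk IDs below are hypothetical:

```python
def rrf(rankings, k=60):
    """Fuse multiple ranked lists of doc IDs via Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["chunk_7", "chunk_2", "chunk_9"]
keyword_hits = ["chunk_2", "chunk_5", "chunk_7"]
print(rrf([vector_hits, keyword_hits]))
# → ['chunk_2', 'chunk_7', 'chunk_5', 'chunk_9']
```

Chunks that appear in both result lists (`chunk_2`, `chunk_7`) accumulate score from each and rise to the top, which is exactly the behavior that makes RRF a robust way to merge heterogeneous rankings without score normalization.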
