Inspiration
The explosion of video content has created a paradox: we have more footage than ever, yet finding a specific moment feels like searching for a needle in a haystack. We were inspired by the challenge of making video content as searchable as text. What if you could ask "show me clips of people laughing at a beach party" and instantly get results, without manual tagging or timestamps? VisionSeek AI Agent was born from this vision: to democratize video search using cutting-edge AI embeddings and hybrid search technology.
What it does
VisionSeek AI Agent is a comprehensive video discovery platform that combines:
- Semantic Video Search: Find exact moments in videos using natural language queries like "person walking in park" or "sunset over mountains"
- Automatic Video Processing: Upload videos to S3 and they are automatically segmented, analyzed, and indexed with no manual intervention
- Dual-Mode Interface: Switch between video search mode for clip discovery and chat mode for conversational assistance
- Hybrid Search Engine: Combines AI-powered vector similarity with traditional text matching for superior accuracy
- Real-Time Clip Retrieval: Get instant access to relevant video segments with precise timestamps and secure playback URLs
- Conversational Assistant: Ask questions and get guidance about your video library through natural dialogue
Scope & Purpose
- Enable instant discovery of specific moments across large video libraries using semantic search
- Automate the entire video indexing pipeline from upload to searchable embeddings
- Provide content creators, marketers, and researchers with a modern self-service portal for video exploration
- Eliminate manual tagging and timeline scrubbing through AI-powered content understanding
- Deliver sub-second search responses across thousands of video clips
Target Audience
- Content Creators managing extensive footage libraries and B-roll collections
- Marketing Teams searching for specific brand moments across campaign videos
- Researchers & Analysts exploring video datasets for patterns and insights
- Media Production Houses organizing and retrieving archived content efficiently
- E-learning Platforms helping students find specific lecture moments instantly
Platform Snapshot
- Event-Driven AWS Architecture with automatic video processing on upload
- FastAPI Backend + React Frontend for seamless user experience
- AI-Powered Embeddings using Amazon Bedrock's Marengo model for video understanding
- Hybrid Search combining vector similarity (k-NN) with text matching (BM25)
- Production-Ready Deployment with scalable infrastructure and monitoring
Challenges we ran into
1. Embedding Generation at Scale
Challenge: Processing long videos (30+ minutes) generated hundreds of clip embeddings, causing memory issues and timeouts.
Solution: Implemented parallel Lambda invocations with Step Functions orchestration, processing clips in batches and streaming results to OpenSearch incrementally.
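As a rough illustration of the incremental-indexing half of that solution, here is a minimal Python sketch using the standard opensearch-py bulk helper; the `embed_clip` stub, the `video-clips` index name, and the document fields are assumptions for this sketch, not the project's actual code:

```python
from opensearchpy import OpenSearch, helpers

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

def embed_clip(clip: dict) -> list[float]:
    """Stand-in for the Bedrock Marengo embedding call (assumption)."""
    raise NotImplementedError

def index_clips(video_id: str, clips: list[dict], batch_size: int = 50) -> None:
    """Embed and index clips batch by batch so memory stays bounded."""
    for start in range(0, len(clips), batch_size):
        actions = [
            {
                "_index": "video-clips",
                "_id": f"{video_id}:{clip['start_sec']}",
                "_source": {
                    "video_id": video_id,
                    "start_sec": clip["start_sec"],
                    "end_sec": clip["end_sec"],
                    "embedding": embed_clip(clip),
                    "transcript": clip.get("transcript", ""),
                },
            }
            for clip in clips[start:start + batch_size]
        ]
        # Stream each batch to OpenSearch instead of holding every
        # embedding for a 30+ minute video in memory at once.
        helpers.bulk(client, actions)
```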
2. Hybrid Search Tuning
Challenge: Pure vector search missed exact keyword matches, while text-only search failed on semantic queries.
Solution: Developed a weighted hybrid search algorithm combining k-NN (cosine similarity) with BM25 text matching, tuning weights based on query characteristics.
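One way to realize that weighting is client-side score fusion over two OpenSearch queries; in this sketch the field names, the word-count heuristic, and the normalization scheme are illustrative assumptions, not the exact production tuning:

```python
def hybrid_search(client, index: str, query_text: str,
                  query_vector: list[float], size: int = 10) -> list[tuple[str, float]]:
    # Simple heuristic: longer natural-language queries lean on vector
    # similarity; short keyword-ish queries lean on BM25 (assumption).
    vector_weight = 0.7 if len(query_text.split()) > 3 else 0.4
    text_weight = 1.0 - vector_weight

    knn_hits = client.search(index=index, body={
        "size": size,
        "query": {"knn": {"embedding": {"vector": query_vector, "k": size}}},
    })["hits"]["hits"]

    bm25_hits = client.search(index=index, body={
        "size": size,
        "query": {"match": {"transcript": query_text}},
    })["hits"]["hits"]

    # Normalize each result list's scores to [0, 1], then blend.
    def normalize(hits):
        top = max((h["_score"] for h in hits), default=1.0)
        return {h["_id"]: h["_score"] / top for h in hits}

    knn_scores, bm25_scores = normalize(knn_hits), normalize(bm25_hits)
    merged = {
        doc_id: vector_weight * knn_scores.get(doc_id, 0.0)
                + text_weight * bm25_scores.get(doc_id, 0.0)
        for doc_id in set(knn_scores) | set(bm25_scores)
    }
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:size]
```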
3. Presigned URL Management
Challenge: Videos in private S3 buckets couldn't be played directly in the browser without exposing credentials.
Solution: Built s3_utils.py to generate time-limited presigned URLs (1-hour expiration) on demand, balancing security with user experience.
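The core of such a helper is a single boto3 call; this sketch mirrors the 1-hour expiration described above, with generic bucket/key parameters:

```python
import boto3

s3 = boto3.client("s3")

def get_playback_url(bucket: str, key: str, expires_in: int = 3600) -> str:
    """Return a time-limited URL so the browser can stream a private clip
    directly from S3 without exposing AWS credentials."""
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```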
4. Real-Time Processing Status
Challenge: Users had no visibility into video processing progress after upload.
Solution: Created in-memory job tracking behind a /video-status/{video_id} endpoint, providing real-time progress updates (production would use Redis).
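A minimal FastAPI sketch of that endpoint, with an in-memory dict standing in for the job store (the status fields are illustrative):

```python
from fastapi import FastAPI, HTTPException

app = FastAPI()
jobs: dict[str, dict] = {}  # video_id -> {"stage": ..., "progress": ...}

@app.get("/video-status/{video_id}")
def video_status(video_id: str) -> dict:
    job = jobs.get(video_id)
    if job is None:
        raise HTTPException(status_code=404, detail="Unknown video_id")
    return {"video_id": video_id, **job}
```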
5. Dual-Mode Interface Design
Challenge: Users needed both search functionality and conversational help without cluttering the UI.
Solution: Implemented a mode toggle in the ChatInterface component, routing requests to different backend handlers while maintaining conversation history.
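On the backend, the routing can be as simple as branching on a mode field; the request shape and handler names below are assumptions about the API, not the project's actual code:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    mode: str                 # "search" or "chat", set by the UI toggle
    message: str
    history: list[dict] = []  # persisted conversation turns

def handle_clip_search(message: str) -> dict:
    return {"mode": "search", "clips": []}   # placeholder for hybrid search

def handle_conversation(message: str, history: list[dict]) -> dict:
    return {"mode": "chat", "reply": "..."}  # placeholder for the assistant

@app.post("/chat")
def chat(req: ChatRequest) -> dict:
    # One endpoint, two handlers: branching on mode keeps history shared.
    if req.mode == "search":
        return handle_clip_search(req.message)
    return handle_conversation(req.message, req.history)
```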
Accomplishments that we're proud of
✨ Sub-Second Search: Achieved <500ms query response times across 1000+ indexed video clips using optimized OpenSearch k-NN indices (an illustrative index mapping follows this list)
🎯 Fully Automated Pipeline: Zero manual intervention from video upload to searchable embeddings—Step Functions orchestrates the entire workflow
🧠 Semantic Understanding: Successfully implemented multi-modal embeddings that understand context (e.g., "celebration" matches birthday parties, weddings, and sports victories)
🎨 Polished UX: Built a beautiful, responsive React interface with smooth animations, mode switching, and persistent chat history
🔒 Production-Grade Security: Implemented IAM roles, presigned URLs, and CORS policies following AWS best practices
📊 Hybrid Search Innovation: Achieved 40% better relevance scores compared to vector-only search by combining semantic and keyword matching
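For context on the k-NN setup behind the sub-second numbers above, here is an illustrative OpenSearch index mapping; the embedding dimension, HNSW parameters, and engine choice are assumptions, not the production configuration:

```python
clip_index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 1024,                # Marengo embedding size (assumed)
                "method": {
                    "name": "hnsw",               # approximate nearest-neighbor graph
                    "space_type": "cosinesimil",  # cosine similarity
                    "engine": "nmslib",
                    "parameters": {"ef_construction": 128, "m": 16},
                },
            },
            "transcript": {"type": "text"},       # BM25 side of hybrid search
            "video_id": {"type": "keyword"},
            "start_sec": {"type": "float"},
        }
    },
}
# client.indices.create(index="video-clips", body=clip_index_body)
```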
What we learned
Technical Insights
- Vector embeddings are powerful but imperfect: Combining them with traditional text search significantly improves accuracy
- Event-driven architecture scales beautifully: S3 triggers + Step Functions handle variable load without manual scaling
- Presigned URLs are essential: They enable secure, direct S3 access without proxy servers or credential exposure
- Async processing is critical: Background tasks keep the API responsive while heavy ML operations run (a minimal sketch of the pattern follows this list)
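A minimal sketch of that background-task pattern using FastAPI's built-in BackgroundTasks; the process_video helper and route path are hypothetical placeholders:

```python
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

def process_video(video_id: str) -> None:
    """Heavy ML work: segment, embed, and index the video."""
    ...

@app.post("/videos/{video_id}/process")
def start_processing(video_id: str, background_tasks: BackgroundTasks) -> dict:
    background_tasks.add_task(process_video, video_id)  # returns immediately
    return {"video_id": video_id, "status": "processing"}
```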
AWS Bedrock Mastery
- Learned to optimize Marengo model invocations for cost and latency
- Discovered the importance of chunking long videos into short clips for better embedding quality (a simple windowing sketch follows this list)
- Mastered IAM policies for least-privilege access across services
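The chunking lesson boils down to fixed-length windows over the video timeline; a trivial sketch, with the 6-second clip length as an assumed default:

```python
from typing import Iterator

def clip_windows(duration_sec: float, clip_len: float = 6.0) -> Iterator[tuple[float, float]]:
    """Yield (start, end) windows for chunking a long video (illustrative)."""
    t = 0.0
    while t < duration_sec:
        yield (t, min(t + clip_len, duration_sec))
        t += clip_len
```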
Frontend-Backend Integration
- Structured API responses (Pydantic models) prevent runtime errors and improve developer experience (DX); a small example follows this list
- WebSocket-like updates can be simulated with polling for processing status
- Framer Motion animations make async operations feel instant
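A small example of the structured-response idea, with hypothetical field names chosen to match the clip results described earlier:

```python
from pydantic import BaseModel

class ClipResult(BaseModel):
    video_id: str
    start_sec: float
    end_sec: float
    score: float
    playback_url: str  # presigned S3 URL

class SearchResponse(BaseModel):
    query: str
    results: list[ClipResult]
```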
Built With
- agentcore
- amazon-web-services
- bedrock
- opensearch
- strands