Inspiration
We've all experienced the frustration of searching through hours of video content to find a single moment, whether it's a specific scene in a film, a key insight from a conference recording, or a memorable clip from raw footage. Traditional video search relies on file names, tags, and manual scrubbing, making it nearly impossible to search video content the way we search text: naturally and conversationally.
30FRAMES was born from a simple question: What if you could ask your videos questions and get instant, precise answers?
We envisioned a platform where content creators, filmmakers, marketers, and researchers could interact with their video libraries through natural language: finding exact moments, extracting clips, and organizing content as effortlessly as having a conversation.
What it does
30FRAMES is an AI-powered video intelligence platform that transforms how you interact with video content. It combines semantic understanding, natural language processing, and multimodal AI to make videos searchable, queryable, and actionable.
Core Features:
- Natural Language Search - Ask questions like "show me all scenes with people laughing" or "find moments discussing climate change" and get precise timestamp results
- Smart Clip Extraction - Automatically extract and save specific segments with frame-accurate precision
- AI Chat Interface - Have conversations with your videos: ask about content, context, or specific visual and audio elements
- Project Organization - Manage multiple video libraries with project-based workflows
- Directors Mode - Advanced content creation tools for scriptwriting, voiceover generation, and video processing
- Secure Authentication - Enterprise-grade security with Google OAuth and row-level database security
Use Cases:
- Content Creators: Find and repurpose the best moments from hours of raw footage
- Film Students: Analyze scenes, study cinematography, and create reference libraries
- Marketers: Extract testimonials, product shots, and key moments from campaign videos
- Researchers: Search through interviews, lectures, and documentary footage semantically
- Educators: Create clip collections from educational videos for curriculum development
How we built it
30FRAMES is built on a modern serverless architecture combining cutting-edge AI with proven cloud infrastructure:
Tech Stack:
Frontend:
- Next.js 15.5.4 with App Router and Turbopack for blazing-fast development
- React 19 with TypeScript for type-safe component development
- Tailwind CSS 4.0 for responsive, modern UI design
AI & Video Processing:
- TwelveLabs Marengo 2.7 engine for multimodal video understanding (visual, conversation, text-in-video)
- Custom semantic search pipeline with confidence scoring
- Real-time task polling system for video indexing status
Backend Infrastructure:
- Next.js API Routes with Node.js runtime for serverless functions
- AWS S3 with presigned URLs for secure, scalable video storage
- Supabase PostgreSQL with Row-Level Security (RLS) for data protection
- Google OAuth 2.0 via Supabase Auth for seamless authentication
Key Technical Decisions:
- Presigned URL Architecture: Videos are uploaded to S3 and shared with TwelveLabs via time-limited presigned URLs (48-hour expiry), ensuring security while enabling AI processing
- Direct API Integration: We bypassed the TwelveLabs SDK in favor of direct HTTP requests to resolve Next.js 15 fetch compatibility issues
- Task Polling Pattern: Implemented 5-second polling intervals with 5-minute timeouts to track video indexing progress
- FormData for Multipart Uploads: Used FormData instead of JSON for TwelveLabs API compliance
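The last two decisions can be sketched together: a minimal, hedged example of building a multipart task-creation request with native fetch() primitives instead of the SDK. The endpoint path and field names below are illustrative assumptions based on our integration, not official API documentation.

```typescript
const TWELVELABS_BASE = "https://api.twelvelabs.io/v1.3"; // assumed base URL

// Build (but don't send) a multipart indexing-task request. Using FormData
// instead of a JSON body means fetch() will send multipart/form-data and
// set the boundary header itself, which the API expects.
function buildIndexTaskRequest(apiKey: string, indexId: string, videoUrl: string) {
  const form = new FormData();
  form.append("index_id", indexId);   // field names are assumptions
  form.append("video_url", videoUrl); // presigned S3 URL goes here

  return {
    url: `${TWELVELABS_BASE}/tasks`,
    method: "POST",
    headers: { "x-api-key": apiKey }, // no Content-Type: fetch adds the boundary
    body: form,
  };
}
```

Returning a plain request description (rather than calling fetch directly) keeps the builder easy to unit-test without hitting the network.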
Challenges we ran into
- Next.js 15 SDK Compatibility Crisis
Our biggest challenge was incompatibility between the TwelveLabs SDK and Next.js 15's new fetch implementation. We encountered "Response body object should not be disturbed or locked" errors.
Solution: We engineered a workaround by bypassing the SDK entirely and implementing direct HTTP requests to the TwelveLabs v1.3 API using native fetch(), ensuring full Next.js 15 compatibility.
- S3 Access Control Complexity
Initially, we attempted to use public ACLs for S3 objects, but encountered "AccessControlListNotSupported" errors due to modern S3 bucket policies.
Solution: Redesigned our architecture to use presigned URLs with configurable expiration times (48 hours for processing, 7 days for permanent storage), enhancing both security and flexibility.
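A tiny sketch of the two-tier expiry policy described above: 48 hours for URLs handed to the AI indexer, 7 days for longer-lived storage links. The helper only computes the `expiresIn` value (in seconds) that would be passed to an S3 presigner; the presigning call itself is omitted.

```typescript
type UrlPurpose = "processing" | "storage";

// Map a URL's purpose to its lifetime in seconds, per our expiry policy.
function presignExpirySeconds(purpose: UrlPurpose): number {
  const HOUR = 3600;
  return purpose === "processing" ? 48 * HOUR : 7 * 24 * HOUR;
}
```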
- Asynchronous Video Processing
Video indexing can take 2-5 minutes, requiring a robust polling mechanism without blocking the UI or timing out.
Solution: Implemented a sophisticated polling system with 5-second intervals, graceful timeout handling, and status feedback to users, ensuring reliable indexing tracking.
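The polling loop above can be sketched as follows. `checkStatus` is a stand-in for whatever call fetches the indexing task's state; the interval and timeout are parameterized (defaulting to the 5-second / 5-minute values we used) so the loop can be tested with tiny delays.

```typescript
// Poll until the task reports a terminal status, or throw on timeout.
async function pollUntilReady(
  checkStatus: () => Promise<string>,
  intervalMs = 5_000,
  timeoutMs = 300_000
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const status = await checkStatus();
    if (status === "ready" || status === "failed") return status;
    if (Date.now() >= deadline) throw new Error("Video indexing timed out");
    // Sleep between checks without blocking the event loop.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Surfacing the intermediate status strings to the UI on each iteration is what gives users the progress feedback mentioned above.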
- Multimodal Search Optimization
Balancing search accuracy across visual content, spoken dialogue, and on-screen text required careful tuning of TwelveLabs search options.
Solution: Combined visual, conversation, and text_in_video search modes with an OR operator, achieving comprehensive semantic search across all modalities.
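An illustrative payload for that combined query, with all three modalities joined by an OR operator. Field names mirror the TwelveLabs v1.3 search API as we used it; treat them as assumptions rather than a spec.

```typescript
interface SearchPayload {
  index_id: string;
  query_text: string;
  search_options: string[];
  operator: "or" | "and";
}

// Build a search payload that matches a segment if ANY modality
// (visuals, spoken dialogue, or on-screen text) finds it relevant.
function buildSearchPayload(indexId: string, query: string): SearchPayload {
  return {
    index_id: indexId,
    query_text: query,
    search_options: ["visual", "conversation", "text_in_video"],
    operator: "or",
  };
}
```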
- State Management Across Components
Managing project state, video metadata, chat history, and clip libraries across a complex dashboard proved challenging.
Solution: Designed a clean component hierarchy with props drilling and local state management, avoiding unnecessary global state complexity while maintaining reactivity.
Accomplishments that we're proud of
Solved Next.js 15 Compatibility - Successfully architected a solution to integrate TwelveLabs AI with the latest Next.js framework when the official SDK failed
Enterprise-Grade Security - Implemented a complete security stack with presigned URLs, JWT sessions, RLS policies, and OAuth 2.0, all in a hackathon timeline
Sub-Second Search Speeds - Achieved near-instantaneous semantic search results across hours of video content using TwelveLabs' Marengo engine
Intuitive UX - Built a clean, professional interface that makes complex AI operations feel simple and natural
Comprehensive Documentation - Created 27KB of technical architecture documentation, including system diagrams, API specs, and troubleshooting guides
Modular Architecture - Designed a scalable, maintainable codebase with clear separation of concerns and reusable patterns
Performance Optimization - Implemented video streaming, lazy loading, presigned URL caching, and efficient database indexing for production-ready performance
What we learned
Technical Insights:
- Modern fetch() API nuances in Next.js 15 and how they differ from traditional SDKs
- Presigned URL architecture patterns for secure cloud storage access control
- Multimodal AI integration strategies for combining visual, audio, and text understanding
- Asynchronous task management patterns for long-running background processes
- TypeScript advanced patterns for type-safe API integrations and form handling
AI & Video Processing:
- How semantic video search works under the hood (embedding models, vector similarity)
- The importance of confidence scoring in AI-powered search results
- Optimizing search queries for multimodal video understanding
- Balancing indexing time vs. search accuracy trade-offs
Product & Design:
- The critical importance of user feedback during long-running operations (polling status)
- How natural language interfaces can make complex features accessible
- The value of comprehensive error handling and graceful degradation
- Why documentation is essential for team collaboration and future maintenance
Collaboration & Process:
- The power of iterative problem-solving when facing blocking technical issues
- How to debug production issues with proper logging and error tracking
- The importance of git workflows and conflict resolution in fast-paced development
What's next for 30FRAMES
Immediate Roadmap (Next 3 Months):
Enhanced Directors Mode
- AI-powered automatic clip compilation based on themes
- Scene detection and shot analysis
- Automated B-roll suggestion and matching
Advanced Audio Features
- Speaker diarization (identify who's speaking when)
- Music and sound effect detection
- Transcript editing with auto-sync to video
Analytics & Insights
- Video engagement heatmaps
- Sentiment analysis across video content
- Keyword trending and topic extraction
Medium-Term Vision (6-12 Months):
AI-Powered Editing Suite
- Automated video montage generation
- Style transfer for consistent brand aesthetics
- AI-generated transitions and effects
Collaboration Features
- Team workspaces with shared projects
- Comment threads on specific video timestamps
- Version control for edited clips
Mobile Experience
- Native iOS/Android apps
- On-device clip previewing
- Offline search through cached indices
Long-Term Vision (12+ Months):
Enterprise Features
- Custom model training for domain-specific content
- API access for platform integration
- White-label solutions for media companies
Multilingual Support
- Automatic translation of video dialogue
- Subtitle generation in 50+ languages
- Cross-language semantic search
Next-Gen AI
- Real-time video analysis during upload
- Predictive content recommendations
- Automated video summarization and highlight reels
Industry Partnerships
- Integration with Adobe Premiere, Final Cut Pro, DaVinci Resolve
- Stock footage platforms (Shutterstock, Getty Images)
- Social media management tools (Hootsuite, Buffer)
30FRAMES isn't just a tool; it's a paradigm shift in how we interact with video content. We're building the future where every video is searchable, every moment is discoverable, and every creator has AI-powered superpowers.
Built With
- amazon-web-services
- elevenlabs
- gemini
- nextjs
- node.js
- s3
- twelvelabs
- typescript