Inspiration

We've all experienced the frustration of searching through hours of video content to find a single moment: whether it's a specific scene in a film, a key insight from a conference recording, or a memorable clip from raw footage. Traditional video search relies on file names, tags, and manual scrubbing, making it nearly impossible to search video content the way we search text: naturally and conversationally.

30FRAMES was born from a simple question: What if you could ask your videos questions and get instant, precise answers?

We envisioned a platform where content creators, filmmakers, marketers, and researchers could interact with their video libraries through natural language: finding exact moments, extracting clips, and organizing content as effortlessly as having a conversation.

What it does

30FRAMES is an AI-powered video intelligence platform that transforms how you interact with video content. It combines semantic understanding, natural language processing, and multimodal AI to make videos searchable, queryable, and actionable.

Core Features:

  • ๐Ÿ” Natural Language Search - Ask questions like "show me all scenes with people laughing" or "find moments discussing climate change" and get precise timestamp results
  • ๐ŸŽฌ Smart Clip Extraction - Automatically extract and save specific segments with frame-accurate precision
  • ๐Ÿ’ฌ AI Chat Interface - Have conversations with your videosโ€”ask about content, context, or specific visual/audio elements
  • ๐Ÿ“ Project Organization - Manage multiple video libraries with project-based workflows
  • โœ‚๏ธ Directors Mode - Advanced content creation tools for scriptwriting, voiceover generation, and video processing
  • ๐Ÿ” Secure Authentication - Enterprise-grade security with Google OAuth and row-level database security

Use Cases:

  • Content Creators: Find and repurpose the best moments from hours of raw footage
  • Film Students: Analyze scenes, study cinematography, and create reference libraries
  • Marketers: Extract testimonials, product shots, and key moments from campaign videos
  • Researchers: Search through interviews, lectures, and documentary footage semantically
  • Educators: Create clip collections from educational videos for curriculum development

How we built it

30FRAMES is built on a modern serverless architecture combining cutting-edge AI with proven cloud infrastructure:

Tech Stack:

Frontend:

  • Next.js 15.5.4 with App Router and Turbopack for blazing-fast development
  • React 19 with TypeScript for type-safe component development
  • Tailwind CSS 4.0 for responsive, modern UI design

AI & Video Processing:

  • TwelveLabs Marengo 2.7 engine for multimodal video understanding (visual, conversation, text-in-video)
  • Custom semantic search pipeline with confidence scoring
  • Real-time task polling system for video indexing status

Backend Infrastructure:

  • Next.js API Routes with Node.js runtime for serverless functions
  • AWS S3 with presigned URLs for secure, scalable video storage
  • Supabase PostgreSQL with Row-Level Security (RLS) for data protection
  • Google OAuth 2.0 via Supabase Auth for seamless authentication
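As a concrete shape for the serverless layer, here is a minimal App Router route handler on the Node runtime; the `/api/search` path and the request/response payloads are illustrative assumptions, not our production code:

```typescript
// Minimal sketch of a Next.js App Router API route on the Node runtime.
// The /api/search purpose and the payload shapes are illustrative.
export const runtime = "nodejs"; // opt in to the Node.js runtime, not Edge

export async function POST(req: Request): Promise<Response> {
  const { query } = (await req.json()) as { query?: string };
  if (!query) {
    return new Response(JSON.stringify({ error: "query is required" }), {
      status: 400,
      headers: { "content-type": "application/json" },
    });
  }
  // ...here the real handler would call the video search pipeline...
  return new Response(JSON.stringify({ query, results: [] }), {
    status: 200,
    headers: { "content-type": "application/json" },
  });
}
```

Exporting a `POST` function from a `route.ts` file is all Next.js needs to wire up the endpoint; the handler works directly with the standard `Request`/`Response` types.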

Key Technical Decisions:

  • Presigned URL Architecture: Videos are uploaded to S3 and shared with TwelveLabs via time-limited presigned URLs (48-hour expiry), ensuring security while enabling AI processing
  • Direct API Integration: We bypassed the TwelveLabs SDK in favor of direct HTTP requests to resolve Next.js 15 fetch compatibility issues
  • Task Polling Pattern: Implemented 5-second polling intervals with 5-minute timeouts to track video indexing progress
  • FormData for Multipart Uploads: Used FormData instead of JSON for TwelveLabs API compliance
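The last two decisions can be sketched together as one function: create an indexing task by POSTing a FormData body to the TwelveLabs API with native `fetch()`. The endpoint path, field names, and response shape here are our assumptions, not quoted documentation:

```typescript
// Sketch of creating a TwelveLabs indexing task with native fetch() and
// FormData rather than the SDK. Endpoint, field names, and response shape
// are assumptions; check the TwelveLabs v1.3 docs before relying on them.
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

async function createIndexingTask(
  apiKey: string,
  indexId: string,
  videoUrl: string, // e.g. a 48-hour presigned S3 URL
  fetchImpl: FetchLike = fetch,
): Promise<string> {
  const form = new FormData(); // multipart/form-data, as the API expects
  form.append("index_id", indexId);
  form.append("video_url", videoUrl);

  const res = await fetchImpl("https://api.twelvelabs.io/v1.3/tasks", {
    method: "POST",
    headers: { "x-api-key": apiKey }, // fetch adds the multipart boundary itself
    body: form,
  });
  if (!res.ok) throw new Error(`Task creation failed: ${res.status}`);
  const data = (await res.json()) as { _id: string };
  return data._id; // task id to poll while indexing runs
}
```

Injecting the fetch implementation keeps the function testable without touching the network; in production the default native `fetch` is used.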

Challenges we ran into

  1. Next.js 15 SDK Compatibility Crisis

Our biggest challenge was incompatibility between the TwelveLabs SDK and Next.js 15's new fetch implementation. We encountered "Response body object should not be disturbed or locked" errors.

Solution: We bypassed the SDK entirely and implemented direct HTTP requests to the TwelveLabs v1.3 API using native fetch(), restoring full Next.js 15 compatibility.

  2. S3 Access Control Complexity

Initially, we attempted to use public ACLs for S3 objects, but encountered "AccessControlListNotSupported" errors due to modern S3 bucket policies.

Solution: Redesigned our architecture to use presigned URLs with configurable expiration times (48 hours for processing, 7 days for permanent storage), enhancing both security and flexibility.
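A tiny helper pins down the two expiry windows mentioned above; the purpose names are ours, and the actual signing call (for example `getSignedUrl` from `@aws-sdk/s3-request-presigner`) is deliberately left out of this sketch:

```typescript
// Maps a URL's purpose to the expiry window described above:
// 48 hours for AI-processing links, 7 days for longer-lived storage links.
// The purpose names are illustrative, not part of any real API.
type UrlPurpose = "processing" | "storage";

function presignExpirySeconds(purpose: UrlPurpose): number {
  const HOUR = 3600;
  switch (purpose) {
    case "processing":
      return 48 * HOUR; // long enough for TwelveLabs to fetch and index the video
    case "storage":
      return 7 * 24 * HOUR; // SigV4 presigned URLs cap out at 7 days
  }
}
```

The 7-day ceiling is not arbitrary: S3 presigned URLs signed with Signature Version 4 cannot live longer than that, so "permanent" links have to be re-signed on demand.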

  3. Asynchronous Video Processing

Video indexing can take 2-5 minutes, requiring a robust polling mechanism without blocking the UI or timing out.

Solution: Implemented a sophisticated polling system with 5-second intervals, graceful timeout handling, and status feedback to users, ensuring reliable indexing tracking.
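Stripped to its core, that polling pattern looks like this; the status values and the checker are injected, so this is a generic sketch rather than our exact implementation:

```typescript
// Minimal version of the polling pattern described above: check task status
// on an interval until it settles, and give up after a timeout.
type TaskStatus = "pending" | "indexing" | "ready" | "failed";

async function pollTask(
  getStatus: () => Promise<TaskStatus>,
  intervalMs = 5_000,  // 5-second polling interval
  timeoutMs = 300_000, // give up after 5 minutes
): Promise<TaskStatus> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const status = await getStatus();
    if (status === "ready" || status === "failed") return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for video indexing");
}
```

Because the loop awaits between checks instead of blocking, the UI stays responsive while indexing runs in the background.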

  4. Multimodal Search Optimization

Balancing search accuracy across visual content, spoken dialogue, and on-screen text required careful tuning of TwelveLabs search options.

Solution: Combined visual, conversation, and text_in_video search modes with an OR operator, achieving comprehensive semantic search across all modalities.
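That combination can be expressed as a small request builder. The field names (`query_text`, `search_options`, `operator`) reflect our reading of the TwelveLabs search API and should be treated as assumptions:

```typescript
// Builds the multimodal search request described above: visual, conversation,
// and text-in-video modes joined with OR, so a clip matches if ANY modality
// matches. Field names are assumptions about the TwelveLabs search API.
interface SearchRequest {
  index_id: string;
  query_text: string;
  search_options: string[];
  operator: "or" | "and";
}

function buildSearchRequest(indexId: string, query: string): SearchRequest {
  return {
    index_id: indexId,
    query_text: query,
    search_options: ["visual", "conversation", "text_in_video"],
    operator: "or", // OR across modalities for maximum recall
  };
}
```

Switching the operator to "and" would trade recall for precision, only returning clips where every modality agrees.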

  5. State Management Across Components

Managing project state, video metadata, chat history, and clip libraries across a complex dashboard proved challenging.

Solution: Designed a clean component hierarchy with prop drilling and local state management, avoiding unnecessary global state complexity while maintaining reactivity.

Accomplishments that we're proud of

🎯 Solved Next.js 15 Compatibility - Successfully architected a solution to integrate TwelveLabs AI with the latest Next.js framework when the official SDK failed

🔒 Enterprise-Grade Security - Implemented a complete security stack with presigned URLs, JWT sessions, RLS policies, and OAuth 2.0, all within a hackathon timeline

🚀 Sub-Second Search Speeds - Achieved near-instantaneous semantic search results across hours of video content using TwelveLabs' Marengo engine

🎨 Intuitive UX - Built a clean, professional interface that makes complex AI operations feel simple and natural

📚 Comprehensive Documentation - Created 27KB of technical architecture documentation, including system diagrams, API specs, and troubleshooting guides

🧩 Modular Architecture - Designed a scalable, maintainable codebase with clear separation of concerns and reusable patterns

⚡ Performance Optimization - Implemented video streaming, lazy loading, presigned URL caching, and efficient database indexing for production-ready performance

What we learned

Technical Insights:

  • Modern fetch() API nuances in Next.js 15 and how they differ from traditional SDKs
  • Presigned URL architecture patterns for secure cloud storage access control
  • Multimodal AI integration strategies for combining visual, audio, and text understanding
  • Asynchronous task management patterns for long-running background processes
  • TypeScript advanced patterns for type-safe API integrations and form handling

AI & Video Processing:

  • How semantic video search works under the hood (embedding models, vector similarity)
  • The importance of confidence scoring in AI-powered search results
  • Optimizing search queries for multimodal video understanding
  • Balancing indexing time vs. search accuracy trade-offs
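The embedding idea in the first bullet can be illustrated with a toy example: clips and queries become vectors, and relevance is cosine similarity (hand-made 3-D vectors here, nothing like a real multimodal embedding):

```typescript
// Toy illustration of semantic search: clips and queries are embedded as
// vectors, and relevance is their cosine similarity. Real systems use learned
// multimodal embeddings; these 3-D vectors are stand-ins.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// A query vector is scored against every clip; the highest score wins.
const query = [0.9, 0.1, 0.2];
const clips: Record<string, number[]> = {
  "laughing scene": [0.8, 0.2, 0.1],
  "climate debate": [0.1, 0.9, 0.3],
};
const best = Object.entries(clips)
  .map(([name, vec]) => [name, cosineSimilarity(query, vec)] as const)
  .sort((x, y) => y[1] - x[1])[0][0];
// best → "laughing scene"
```

Confidence scoring in a real pipeline amounts to thresholding and ranking these similarity values before they reach the user.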

Product & Design:

  • The critical importance of user feedback during long-running operations (polling status)
  • How natural language interfaces can make complex features accessible
  • The value of comprehensive error handling and graceful degradation
  • Why documentation is essential for team collaboration and future maintenance

Collaboration & Process:

  • The power of iterative problem-solving when facing blocking technical issues
  • How to debug production issues with proper logging and error tracking
  • The importance of git workflows and conflict resolution in fast-paced development

What's next for 30FRAMES

Immediate Roadmap (Next 3 Months):

🎬 Enhanced Director's Mode

  • AI-powered automatic clip compilation based on themes
  • Scene detection and shot analysis
  • Automated B-roll suggestion and matching

🔊 Advanced Audio Features

  • Speaker diarization (identify who's speaking when)
  • Music and sound effect detection
  • Transcript editing with auto-sync to video

📊 Analytics & Insights

  • Video engagement heatmaps
  • Sentiment analysis across video content
  • Keyword trending and topic extraction

Medium-Term Vision (6-12 Months):

🤖 AI-Powered Editing Suite

  • Automated video montage generation
  • Style transfer for consistent brand aesthetics
  • AI-generated transitions and effects

๐ŸŒ Collaboration Features

  • Team workspaces with shared projects
  • Comment threads on specific video timestamps
  • Version control for edited clips

📱 Mobile Experience

  • Native iOS/Android apps
  • On-device clip previewing
  • Offline search through cached indices

Long-Term Vision (12+ Months):

🎯 Enterprise Features

  • Custom model training for domain-specific content
  • API access for platform integration
  • White-label solutions for media companies

๐ŸŒ Multilingual Support

  • Automatic translation of video dialogue
  • Subtitle generation in 50+ languages
  • Cross-language semantic search

🔮 Next-Gen AI

  • Real-time video analysis during upload
  • Predictive content recommendations
  • Automated video summarization and highlight reels

๐Ÿข Industry Partnerships

  • Integration with Adobe Premiere, Final Cut Pro, DaVinci Resolve
  • Stock footage platforms (Shutterstock, Getty Images)
  • Social media management tools (Hootsuite, Buffer)

30FRAMES isn't just a tool; it's a paradigm shift in how we interact with video content. We're building the future where every video is searchable, every moment is discoverable, and every creator has AI-powered superpowers. 🚀

Built With
