Inspiration

We've all experienced the frustration of searching through hours of video content to find a single moment: whether it's a specific scene in a film, a key insight from a conference recording, or a memorable clip from raw footage. Traditional video search relies on file names, tags, and manual scrubbing, making it nearly impossible to search video content the way we search text: naturally and conversationally.

30FRAMES was born from a simple question: What if you could ask your videos questions and get instant, precise answers?

We envisioned a platform where content creators, filmmakers, marketers, and researchers could interact with their video libraries through natural language: finding exact moments, extracting clips, and organizing content as effortlessly as having a conversation.

What it does

30FRAMES is an AI-powered video intelligence platform that transforms how you interact with video content. It combines semantic understanding, natural language processing, and multimodal AI to make videos searchable, queryable, and actionable.

Core Features:

  • ๐Ÿ” Natural Language Search - Ask questions like "show me all scenes with people laughing" or "find moments discussing climate change" and get precise timestamp results
  • ๐ŸŽฌ Smart Clip Extraction - Automatically extract and save specific segments with frame-accurate precision
  • ๐Ÿ’ฌ AI Chat Interface - Have conversations with your videosโ€”ask about content, context, or specific visual/audio elements
  • ๐Ÿ“ Project Organization - Manage multiple video libraries with project-based workflows
  • โœ‚๏ธ Directors Mode - Advanced content creation tools for scriptwriting, voiceover generation, and video processing
  • ๐Ÿ” Secure Authentication - Enterprise-grade security with Google OAuth and row-level database security

Use Cases:

  • Content Creators: Find and repurpose the best moments from hours of raw footage
  • Film Students: Analyze scenes, study cinematography, and create reference libraries
  • Marketers: Extract testimonials, product shots, and key moments from campaign videos
  • Researchers: Search through interviews, lectures, and documentary footage semantically
  • Educators: Create clip collections from educational videos for curriculum development

How we built it

30FRAMES is built on a modern serverless architecture combining cutting-edge AI with proven cloud infrastructure:

Tech Stack:

Frontend:

  • Next.js 15.5.4 with App Router and Turbopack for blazing-fast development
  • React 19 with TypeScript for type-safe component development
  • Tailwind CSS 4.0 for responsive, modern UI design

AI & Video Processing:

  • TwelveLabs Marengo 2.7 engine for multimodal video understanding (visual, conversation, text-in-video)
  • Custom semantic search pipeline with confidence scoring
  • Real-time task polling system for video indexing status

Backend Infrastructure:

  • Next.js API Routes with Node.js runtime for serverless functions
  • AWS S3 with presigned URLs for secure, scalable video storage
  • Supabase PostgreSQL with Row-Level Security (RLS) for data protection
  • Google OAuth 2.0 via Supabase Auth for seamless authentication
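As a concrete shape for the serverless layer, here is a minimal App Router route handler on the Node runtime; the `/api/search` path and the request/response payloads are illustrative assumptions, not our production code:

```typescript
// Minimal sketch of a Next.js App Router API route on the Node runtime.
// The /api/search purpose and the payload shapes are illustrative.
export const runtime = "nodejs"; // opt in to the Node.js runtime, not Edge

export async function POST(req: Request): Promise<Response> {
  const { query } = (await req.json()) as { query?: string };
  if (!query) {
    return new Response(JSON.stringify({ error: "query is required" }), {
      status: 400,
      headers: { "content-type": "application/json" },
    });
  }
  // ...here the real handler would call the video search pipeline...
  return new Response(JSON.stringify({ query, results: [] }), {
    status: 200,
    headers: { "content-type": "application/json" },
  });
}
```

Exporting a `POST` function from a `route.ts` file is all Next.js needs to wire up the endpoint; the handler works directly with the standard `Request`/`Response` types.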

Key Technical Decisions:

  • Presigned URL Architecture: Videos are uploaded to S3 and shared with TwelveLabs via time-limited presigned URLs (48-hour expiry), ensuring security while enabling AI processing
  • Direct API Integration: We bypassed the TwelveLabs SDK in favor of direct HTTP requests to resolve Next.js 15 fetch compatibility issues
  • Task Polling Pattern: Implemented 5-second polling intervals with 5-minute timeouts to track video indexing progress
  • FormData for Multipart Uploads: Used FormData instead of JSON for TwelveLabs API compliance
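The last two decisions can be sketched together as one function: create an indexing task by POSTing a FormData body to the TwelveLabs API with native `fetch()`. The endpoint path, field names, and response shape here are our assumptions, not quoted documentation:

```typescript
// Sketch of creating a TwelveLabs indexing task with native fetch() and
// FormData rather than the SDK. Endpoint, field names, and response shape
// are assumptions; check the TwelveLabs v1.3 docs before relying on them.
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

async function createIndexingTask(
  apiKey: string,
  indexId: string,
  videoUrl: string, // e.g. a 48-hour presigned S3 URL
  fetchImpl: FetchLike = fetch,
): Promise<string> {
  const form = new FormData(); // multipart/form-data, as the API expects
  form.append("index_id", indexId);
  form.append("video_url", videoUrl);

  const res = await fetchImpl("https://api.twelvelabs.io/v1.3/tasks", {
    method: "POST",
    headers: { "x-api-key": apiKey }, // fetch adds the multipart boundary itself
    body: form,
  });
  if (!res.ok) throw new Error(`Task creation failed: ${res.status}`);
  const data = (await res.json()) as { _id: string };
  return data._id; // task id to poll while indexing runs
}
```

Injecting the fetch implementation keeps the function testable without touching the network; in production the default native `fetch` is used.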

Challenges we ran into

  1. Next.js 15 SDK Compatibility Crisis

Our biggest challenge was incompatibility between the TwelveLabs SDK and Next.js 15's new fetch implementation. We encountered "Response body object should not be disturbed or locked" errors.

Solution: We bypassed the SDK entirely and implemented direct HTTP requests to the TwelveLabs v1.3 API using native fetch(), restoring full Next.js 15 compatibility.

  2. S3 Access Control Complexity

Initially, we attempted to use public ACLs for S3 objects, but encountered "AccessControlListNotSupported" errors due to modern S3 bucket policies.

Solution: Redesigned our architecture to use presigned URLs with configurable expiration times (48 hours for processing, 7 days for permanent storage), enhancing both security and flexibility.
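A tiny helper pins down the two expiry windows mentioned above; the purpose names are ours, and the actual signing call (for example `getSignedUrl` from `@aws-sdk/s3-request-presigner`) is deliberately left out of this sketch:

```typescript
// Maps a URL's purpose to the expiry window described above:
// 48 hours for AI-processing links, 7 days for longer-lived storage links.
// The purpose names are illustrative, not part of any real API.
type UrlPurpose = "processing" | "storage";

function presignExpirySeconds(purpose: UrlPurpose): number {
  const HOUR = 3600;
  switch (purpose) {
    case "processing":
      return 48 * HOUR; // long enough for TwelveLabs to fetch and index the video
    case "storage":
      return 7 * 24 * HOUR; // SigV4 presigned URLs cap out at 7 days
  }
}
```

The 7-day ceiling is not arbitrary: S3 presigned URLs signed with Signature Version 4 cannot live longer than that, so "permanent" links have to be re-signed on demand.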

  3. Asynchronous Video Processing

Video indexing can take 2-5 minutes, requiring a robust polling mechanism without blocking the UI or timing out.

Solution: Implemented a sophisticated polling system with 5-second intervals, graceful timeout handling, and status feedback to users, ensuring reliable indexing tracking.
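Stripped to its core, that polling pattern looks like this; the status values and the checker are injected, so this is a generic sketch rather than our exact implementation:

```typescript
// Minimal version of the polling pattern described above: check task status
// on an interval until it settles, and give up after a timeout.
type TaskStatus = "pending" | "indexing" | "ready" | "failed";

async function pollTask(
  getStatus: () => Promise<TaskStatus>,
  intervalMs = 5_000,  // 5-second polling interval
  timeoutMs = 300_000, // give up after 5 minutes
): Promise<TaskStatus> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const status = await getStatus();
    if (status === "ready" || status === "failed") return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for video indexing");
}
```

Because the loop awaits between checks instead of blocking, the UI stays responsive while indexing runs in the background.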

  4. Multimodal Search Optimization

Balancing search accuracy across visual content, spoken dialogue, and on-screen text required careful tuning of TwelveLabs search options.

Solution: Combined visual, conversation, and text_in_video search modes with an OR operator, achieving comprehensive semantic search across all modalities.
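That combination can be expressed as a small request builder. The field names (`query_text`, `search_options`, `operator`) reflect our reading of the TwelveLabs search API and should be treated as assumptions:

```typescript
// Builds the multimodal search request described above: visual, conversation,
// and text-in-video modes joined with OR, so a clip matches if ANY modality
// matches. Field names are assumptions about the TwelveLabs search API.
interface SearchRequest {
  index_id: string;
  query_text: string;
  search_options: string[];
  operator: "or" | "and";
}

function buildSearchRequest(indexId: string, query: string): SearchRequest {
  return {
    index_id: indexId,
    query_text: query,
    search_options: ["visual", "conversation", "text_in_video"],
    operator: "or", // OR across modalities for maximum recall
  };
}
```

Switching the operator to "and" would trade recall for precision, only returning clips where every modality agrees.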

  5. State Management Across Components

Managing project state, video metadata, chat history, and clip libraries across a complex dashboard proved challenging.

Solution: Designed a clean component hierarchy with prop drilling and local state management, avoiding unnecessary global state complexity while maintaining reactivity.

Accomplishments that we're proud of

🎯 Solved Next.js 15 Compatibility - Successfully architected a solution to integrate TwelveLabs AI with the latest Next.js framework when the official SDK failed

🔒 Enterprise-Grade Security - Implemented a complete security stack with presigned URLs, JWT sessions, RLS policies, and OAuth 2.0, all within a hackathon timeline

🚀 Sub-Second Search Speeds - Achieved near-instantaneous semantic search results across hours of video content using TwelveLabs' Marengo engine

🎨 Intuitive UX - Built a clean, professional interface that makes complex AI operations feel simple and natural

📚 Comprehensive Documentation - Created 27KB of technical architecture documentation, including system diagrams, API specs, and troubleshooting guides

🧩 Modular Architecture - Designed a scalable, maintainable codebase with clear separation of concerns and reusable patterns

⚡ Performance Optimization - Implemented video streaming, lazy loading, presigned URL caching, and efficient database indexing for production-ready performance

What we learned

Technical Insights:

  • Modern fetch() API nuances in Next.js 15 and how they differ from traditional SDKs
  • Presigned URL architecture patterns for secure cloud storage access control
  • Multimodal AI integration strategies for combining visual, audio, and text understanding
  • Asynchronous task management patterns for long-running background processes
  • TypeScript advanced patterns for type-safe API integrations and form handling

AI & Video Processing:

  • How semantic video search works under the hood (embedding models, vector similarity)
  • The importance of confidence scoring in AI-powered search results
  • Optimizing search queries for multimodal video understanding
  • Balancing indexing time vs. search accuracy trade-offs
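The embedding idea in the first bullet can be illustrated with a toy example: clips and queries become vectors, and relevance is cosine similarity (hand-made 3-D vectors here, nothing like a real multimodal embedding):

```typescript
// Toy illustration of semantic search: clips and queries are embedded as
// vectors, and relevance is their cosine similarity. Real systems use learned
// multimodal embeddings; these 3-D vectors are stand-ins.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// A query vector is scored against every clip; the highest score wins.
const query = [0.9, 0.1, 0.2];
const clips: Record<string, number[]> = {
  "laughing scene": [0.8, 0.2, 0.1],
  "climate debate": [0.1, 0.9, 0.3],
};
const best = Object.entries(clips)
  .map(([name, vec]) => [name, cosineSimilarity(query, vec)] as const)
  .sort((x, y) => y[1] - x[1])[0][0];
// best → "laughing scene"
```

Confidence scoring in a real pipeline amounts to thresholding and ranking these similarity values before they reach the user.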

Product & Design:

  • The critical importance of user feedback during long-running operations (polling status)
  • How natural language interfaces can make complex features accessible
  • The value of comprehensive error handling and graceful degradation
  • Why documentation is essential for team collaboration and future maintenance

Collaboration & Process:

  • The power of iterative problem-solving when facing blocking technical issues
  • How to debug production issues with proper logging and error tracking
  • The importance of git workflows and conflict resolution in fast-paced development

What's next for 30FRAMES

Immediate Roadmap (Next 3 Months):

🎬 Enhanced Director's Mode

  • AI-powered automatic clip compilation based on themes
  • Scene detection and shot analysis
  • Automated B-roll suggestion and matching

🔊 Advanced Audio Features

  • Speaker diarization (identify who's speaking when)
  • Music and sound effect detection
  • Transcript editing with auto-sync to video

📊 Analytics & Insights

  • Video engagement heatmaps
  • Sentiment analysis across video content
  • Keyword trending and topic extraction

Medium-Term Vision (6-12 Months):

🤖 AI-Powered Editing Suite

  • Automated video montage generation
  • Style transfer for consistent brand aesthetics
  • AI-generated transitions and effects

๐ŸŒ Collaboration Features

  • Team workspaces with shared projects
  • Comment threads on specific video timestamps
  • Version control for edited clips

📱 Mobile Experience

  • Native iOS/Android apps
  • On-device clip previewing
  • Offline search through cached indices

Long-Term Vision (12+ Months):

🎯 Enterprise Features

  • Custom model training for domain-specific content
  • API access for platform integration
  • White-label solutions for media companies

๐ŸŒ Multilingual Support

  • Automatic translation of video dialogue
  • Subtitle generation in 50+ languages
  • Cross-language semantic search

🔮 Next-Gen AI

  • Real-time video analysis during upload
  • Predictive content recommendations
  • Automated video summarization and highlight reels

๐Ÿข Industry Partnerships

  • Integration with Adobe Premiere, Final Cut Pro, DaVinci Resolve
  • Stock footage platforms (Shutterstock, Getty Images)
  • Social media management tools (Hootsuite, Buffer)

30FRAMES isn't just a tool; it's a paradigm shift in how we interact with video content. We're building the future where every video is searchable, every moment is discoverable, and every creator has AI-powered superpowers. 🚀

Built With
