Watch our demo here
https://www.loom.com/share/2090f89a248446de8664ff06b365a806?sid=fb2a3999-83d5-4e7c-8d47-74d12bec8877
Inspiration
The Problem: Many YouTube videos have unclear audio, accents that are hard to understand, or voices that don't match viewer preferences. Language barriers and audio quality issues prevent people from fully enjoying educational content, entertainment, or tutorials.
The Vision: What if you could watch any YouTube video with your own voice, your favorite character's voice, or crystal-clear pronunciation? We wanted to democratize content consumption by letting users personalize how they experience YouTube videos.
What it does
TwelveLab transforms any YouTube video into a personalized audio experience using AI voice cloning. Users can:
- Clone their own voice and hear YouTube videos in their own voice
- Use favorite character voices from the ElevenLabs library (Disney characters, celebrities, etc.)
- Improve pronunciation with crystal-clear AI voices for better comprehension
- Rate and discover popular voice combinations through our community system
- Store and manage their voice preferences and generated audio files
The Chrome extension seamlessly integrates with YouTube, allowing users to transform videos with just one click.
How we built it
Frontend (Chrome Extension):
- JavaScript-based Chrome extension with popup UI
- YouTube integration for video ID extraction and transcript processing
- Audio playback controls and voice selection interface
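As a rough illustration of the video ID extraction step, here is a minimal sketch. The extension itself is written in JavaScript; this version uses Python only to stay consistent with the backend, and the function name `extract_video_id` and the set of URL shapes it handles are assumptions, not the project's actual code.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch, short-link, embed, or shorts URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Standard watch pages carry the ID in the "v" query parameter.
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith(("/embed/", "/shorts/")):
            candidate = parsed.path.split("/")[2]
        else:
            candidate = ""
    else:
        candidate = ""

    return candidate or None
```

Returning `None` for unrecognized URLs lets the caller show a clear "not a YouTube video" message instead of sending a bad ID to the backend.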
Backend (FastAPI + Supabase):
- FastAPI for RESTful API endpoints
- Supabase for database (PostgreSQL) and file storage
- ElevenLabs API for voice cloning and text-to-speech generation
- yt-dlp for YouTube transcript extraction with word-level timestamps
Database Schema:
- Simplified 4-column design: youtube_id, voice_id, generated_audio, likes
- Composite primary key for efficient lookups
- Real-time like/unlike functionality
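The schema above can be sketched as follows. This is an illustration under stated assumptions: the table name `voice_audio`, the column types, and the helper names are hypothetical, and the sketch uses Python's built-in `sqlite3` so it runs without a server, whereas the project stores its data in Supabase (PostgreSQL).

```python
import sqlite3

# Hypothetical table name and column types; only the four column names
# and the composite primary key come from the write-up.
SCHEMA = """
CREATE TABLE voice_audio (
    youtube_id      TEXT    NOT NULL,
    voice_id        TEXT    NOT NULL,
    generated_audio TEXT    NOT NULL,  -- assumed: storage path of the audio file
    likes           INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (youtube_id, voice_id)
)
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the 4-column table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def save_audio(conn, youtube_id, voice_id, audio_path):
    """Insert a row; a duplicate (youtube_id, voice_id) pair is ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO voice_audio (youtube_id, voice_id, generated_audio) "
        "VALUES (?, ?, ?)",
        (youtube_id, voice_id, audio_path),
    )


def change_likes(conn, youtube_id, voice_id, delta):
    """Apply +1 (like) or -1 (unlike) and return the new count, or None if no row."""
    key = (youtube_id, voice_id)
    # MAX(..., 0) keeps the counter from going negative on a stray unlike.
    conn.execute(
        "UPDATE voice_audio SET likes = MAX(likes + ?, 0) "
        "WHERE youtube_id = ? AND voice_id = ?",
        (delta, *key),
    )
    row = conn.execute(
        "SELECT likes FROM voice_audio WHERE youtube_id = ? AND voice_id = ?", key
    ).fetchone()
    return row[0] if row else None
```

Because each (video, voice) pair identifies exactly one generated audio file, the pair itself serves as the key and no separate ID column is needed.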
Key Features:
- Transcript extraction with precise timing
- Audio generation with character-level alignment
- File storage in Supabase with presigned URLs
- Community rating system for voice discovery
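To show what character-level alignment makes possible, here is a minimal sketch that folds per-character timings into per-word timings, which are easier to line up with a transcript. It assumes the alignment arrives as a dictionary with `characters`, `character_start_times_seconds`, and `character_end_times_seconds` lists; the function name `words_from_alignment` is hypothetical, and this is not necessarily how the project processes its audio.

```python
from typing import Dict, List


def words_from_alignment(alignment: Dict[str, list]) -> List[dict]:
    """Group character timings into words, splitting on whitespace."""
    chars = alignment["characters"]
    starts = alignment["character_start_times_seconds"]
    ends = alignment["character_end_times_seconds"]

    words: List[dict] = []
    current = ""
    word_start = 0.0
    word_end = 0.0

    for ch, start, end in zip(chars, starts, ends):
        if ch.isspace():
            # Whitespace closes the word being built, if any.
            if current:
                words.append({"word": current, "start": word_start, "end": word_end})
                current = ""
            continue
        if not current:
            word_start = start  # first character of a new word
        current += ch
        word_end = end  # extend the word to this character's end time

    if current:
        words.append({"word": current, "start": word_start, "end": word_end})
    return words
```

Each word's start and end time can then be compared with the matching transcript timestamp to decide where the generated audio needs to be padded or sped up.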
Challenges we ran into
We exhausted our ElevenLabs API request quota quickly, so we had to switch API keys frequently.
Accomplishments that we're proud of
- ✅ Complete End-to-End System: From YouTube URL to personalized audio playback
- ✅ Clean Architecture: Simplified database design with composite primary keys
- ✅ Real-time Features: Instant like/unlike functionality with live updates
- ✅ Scalable Backend: RESTful API that can handle multiple concurrent users
- ✅ Docker Containerization: Able to deploy anywhere (e.g., AWS)
- 🎯 Unique Value Proposition: First tool to combine YouTube + AI voice cloning + community features
- 🚀 Seamless Integration: One-click voice transformation directly from YouTube
- 🏆 Community Features: Rating system that helps users discover the best voice combinations
- 📱 Intuitive Interface: Clean, simple Chrome extension that doesn't overwhelm users
- ⚡ Fast Performance: Optimized audio generation and delivery pipeline
- 🎵 High Audio Quality: Crystal-clear voice synthesis with proper timing
What we learned
Technical Insights:
- Composite Primary Keys: How to design efficient databases without UUID dependencies
- Audio Processing: The complexity of synchronizing AI-generated speech with video content
- Chrome Extension Development: Best practices for YouTube integration and content script injection
Product Development:
- User Needs: People really want to personalize their content consumption experience
- Community Value: Rating systems significantly improve voice discovery and user engagement
- Simplicity Wins: 4-column database design is much more maintainable than complex schemas
API Integration:
- ElevenLabs Capabilities: The power and limitations of current voice cloning technology
- Supabase Ecosystem: How to effectively combine database and storage in a single platform
- Rate Limiting: Importance of proper API call management and user feedback
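One common way to act on the rate-limiting lesson is to retry with exponential backoff. The sketch below is an illustration, not the project's code: `RateLimitError` is a hypothetical exception that the caller's own API wrapper would raise on an HTTP 429 response, and `call_with_backoff` and its parameters are likewise assumed names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised by the caller's API wrapper on an HTTP 429."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run request(), waiting base_delay * 2**attempt after each rate-limit error."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller tell the user
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper easy to test, and re-raising on the final attempt gives the UI a clear point at which to show the user a "try again later" message.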
What's next for TwelveLab
We built TwelveLab to be a Chrome extension for ElevenLabs that provides a seamless user experience. Soon we hope to support multi-language transcription, so that users can learn concepts on YouTube in languages and accents that are familiar to them. This will help more people use YouTube as the learning tool it can be. We are excited to improve the user experience and get feedback from real users.
Built With
- css
- docker
- elevenlabs
- fastapi
- html
- javascript
- python
- supabase

