Watch our demo here

https://www.loom.com/share/2090f89a248446de8664ff06b365a806?sid=fb2a3999-83d5-4e7c-8d47-74d12bec8877

Inspiration

The Problem: Many YouTube videos have unclear audio, accents that are hard to understand, or voices that don't match viewer preferences. Language barriers and audio quality issues prevent people from fully enjoying educational content, entertainment, or tutorials.

The Vision: What if you could watch any YouTube video with your own voice, your favorite character's voice, or crystal-clear pronunciation? We wanted to democratize content consumption by letting users personalize how they experience YouTube videos.

What it does

TwelveLab transforms any YouTube video into a personalized audio experience using AI voice cloning. Users can:

  • Clone their own voice and hear YouTube videos narrated in it
  • Use favorite character voices from the ElevenLabs voice library (Disney characters, celebrities, etc.)
  • Improve pronunciation with crystal-clear AI voices for better comprehension
  • Rate and discover popular voice combinations through our community system
  • Store and manage their voice preferences and generated audio files

The Chrome extension seamlessly integrates with YouTube, allowing users to transform videos with just one click.

How we built it

Frontend (Chrome Extension):

  • JavaScript-based Chrome extension with popup UI
  • YouTube integration for video ID extraction and transcript processing
  • Audio playback controls and voice selection interface
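The extension must turn whatever YouTube URL the user is on into a video ID before it can fetch a transcript. The sketch below shows one minimal way to do this. It is written in Python to match the other examples here, while the real extension is JavaScript; the same logic ports directly to the browser's `URL` and `URLSearchParams` APIs. The function name `extract_video_id` and the set of URL shapes it handles are our assumptions, not the extension's actual code.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters drawn from letters, digits, "-" and "_"
VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL shapes, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    candidate: Optional[str] = None
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Watch pages: https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith(("/shorts/", "/embed/")):
            # Shorts and embeds: /shorts/<id>, /embed/<id>
            candidate = parsed.path.split("/")[2]
    # Reject anything that does not look like a real video ID
    if candidate and VIDEO_ID.fullmatch(candidate):
        return candidate
    return None
```

For example, `extract_video_id("https://youtu.be/dQw4w9WgXcQ?si=abc")` returns `"dQw4w9WgXcQ"`, and a non-YouTube URL returns `None`.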

Backend (FastAPI + Supabase):

  • FastAPI for RESTful API endpoints
  • Supabase for database (PostgreSQL) and file storage
  • ElevenLabs API for voice cloning and text-to-speech generation
  • yt-dlp for YouTube transcript extraction with word-level timestamps
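Word-level timestamps can come from YouTube's auto-generated captions in the `json3` format, fetched with something like `yt-dlp --skip-download --write-auto-subs --sub-langs en --sub-format json3 <url>`. The sketch below flattens such a file into timed words. It assumes the usual `json3` layout: an `events` list in which each event has `tStartMs`, `dDurationMs` and `segs`, and each segment has `utf8` text and an optional `tOffsetMs`. Verify those field names against a real file before relying on them. `TimedWord` and `parse_json3_words` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class TimedWord:
    text: str
    start_ms: int
    end_ms: int


def parse_json3_words(raw: str) -> List[TimedWord]:
    """Flatten a YouTube json3 caption track into words with start/end times in ms."""
    data = json.loads(raw)
    words: List[TimedWord] = []
    for event in data.get("events", []):
        segs = event.get("segs")
        if not segs or "tStartMs" not in event:
            continue  # window/styling events carry no text
        start = event["tStartMs"]
        end = start + event.get("dDurationMs", 0)
        for seg in segs:
            text = seg.get("utf8", "").strip()
            if not text:
                continue  # skip "\n" line-break segments
            words.append(TimedWord(text, start + seg.get("tOffsetMs", 0), end))
    # Rolling captions overlap, so a word ends no later than the next word starts
    for current, nxt in zip(words, words[1:]):
        current.end_ms = max(current.start_ms, min(current.end_ms, nxt.start_ms))
    return words
```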

Database Schema:

  • Simplified 4-column design: youtube_id, voice_id, generated_audio, likes
  • Composite primary key for efficient lookups
  • Real-time like/unlike functionality
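The sketch below models the 4-column table. It uses an in-memory SQLite database so it runs without a Supabase project. Several details are our assumptions, not taken from the actual schema: the table name `voice_videos`, a composite primary key of `(youtube_id, voice_id)`, and `generated_audio` holding a storage path rather than the audio bytes. In PostgreSQL the two-argument `MAX(a, b)` used here would be `GREATEST(a, b)`.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS voice_videos (
    youtube_id      TEXT    NOT NULL,
    voice_id        TEXT    NOT NULL,
    generated_audio TEXT,               -- storage path of the generated file
    likes           INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (youtube_id, voice_id)  -- one row per video/voice pair, no UUID
)
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def save_audio(conn: sqlite3.Connection, youtube_id: str, voice_id: str, path: str) -> None:
    # Upsert on the composite key; an existing row keeps its like count
    conn.execute(
        "INSERT INTO voice_videos (youtube_id, voice_id, generated_audio) VALUES (?, ?, ?) "
        "ON CONFLICT (youtube_id, voice_id) DO UPDATE SET generated_audio = excluded.generated_audio",
        (youtube_id, voice_id, path),
    )


def set_like(conn: sqlite3.Connection, youtube_id: str, voice_id: str, liked: bool) -> int:
    # Like adds 1, unlike subtracts 1; MAX(..., 0) keeps the counter from going negative
    conn.execute(
        "UPDATE voice_videos SET likes = MAX(likes + ?, 0) WHERE youtube_id = ? AND voice_id = ?",
        (1 if liked else -1, youtube_id, voice_id),
    )
    row = conn.execute(
        "SELECT likes FROM voice_videos WHERE youtube_id = ? AND voice_id = ?",
        (youtube_id, voice_id),
    ).fetchone()
    return row[0] if row else 0


def top_voices(conn: sqlite3.Connection, youtube_id: str, limit: int = 5) -> List[Tuple[str, int]]:
    # Community ranking: most-liked voices for a given video first
    return conn.execute(
        "SELECT voice_id, likes FROM voice_videos WHERE youtube_id = ? "
        "ORDER BY likes DESC, voice_id LIMIT ?",
        (youtube_id, limit),
    ).fetchall()
```

Because the pair of IDs is the key, looking up "this video in this voice" is a single primary-key lookup, and no surrogate ID is needed.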

Key Features:

  • Transcript extraction with precise timing
  • Audio generation with character-level alignment
  • File storage in Supabase with presigned URLs
  • Community rating system for voice discovery
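ElevenLabs' timestamped text-to-speech responses include an `alignment` object with parallel `characters`, `character_start_times_seconds` and `character_end_times_seconds` lists. Check these field names against the current ElevenLabs API reference. The sketch below shows one way such character-level alignment could be grouped into words, so generated speech can be lined up with transcript words. `words_from_alignment` is a hypothetical helper, not part of any SDK.

```python
from typing import Dict, List, Tuple


def words_from_alignment(alignment: Dict[str, list]) -> List[Tuple[str, float, float]]:
    """Group character-level timestamps into (word, start_s, end_s) tuples."""
    chars = alignment["characters"]
    starts = alignment["character_start_times_seconds"]
    ends = alignment["character_end_times_seconds"]
    words: List[Tuple[str, float, float]] = []
    buf: List[str] = []
    word_start = word_end = 0.0
    for ch, start, end in zip(chars, starts, ends):
        if ch.isspace():
            # Whitespace closes the current word, if any
            if buf:
                words.append(("".join(buf), word_start, word_end))
                buf = []
            continue
        if not buf:
            word_start = start  # first character of a new word
        buf.append(ch)
        word_end = end  # extend the word to this character's end time
    if buf:
        words.append(("".join(buf), word_start, word_end))
    return words
```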

Challenges we ran into

We quickly exhausted our ElevenLabs API quota, so we had to switch API keys frequently.
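Switching keys by hand could be automated along the lines of the sketch below. It is hypothetical throughout: `KeyRotator` and `QuotaExceeded` are our own names. The caller would have to translate whatever error its HTTP client raises, when ElevenLabs reports an exhausted quota, into `QuotaExceeded`.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Raised by the caller-supplied request function when a key is out of quota."""


class KeyRotator:
    """Try each API key in turn, moving past keys that report an exhausted quota."""

    def __init__(self, keys: List[str]):
        if not keys:
            raise ValueError("at least one API key is required")
        self._keys = list(keys)
        self._index = 0

    @property
    def current(self) -> str:
        return self._keys[self._index]

    def call(self, request: Callable[[str], T]) -> T:
        # Each key gets at most one attempt per call, starting from the last key that worked
        for _ in range(len(self._keys)):
            try:
                return request(self.current)
            except QuotaExceeded:
                self._index = (self._index + 1) % len(self._keys)
        raise QuotaExceeded("all API keys are out of quota")
```

Remembering the last working key avoids retrying exhausted keys on every request. Once every key fails, the error surfaces so the extension can tell the user instead of silently hanging.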

Accomplishments that we're proud of

✅ Complete End-to-End System: From YouTube URL to personalized audio playback
✅ Clean Architecture: Simplified database design with composite primary keys
✅ Real-time Features: Instant like/unlike functionality with live updates
✅ Scalable Backend: RESTful API that can handle multiple concurrent users
✅ Docker Containerization: Able to deploy anywhere (e.g., AWS)

🎯 Unique Value Proposition: First tool to combine YouTube + AI voice cloning + community features
🚀 Seamless Integration: One-click voice transformation directly from YouTube
🏆 Community Features: Rating system that helps users discover the best voice combinations
📱 Intuitive Interface: Clean, simple Chrome extension that doesn't overwhelm users
⚡ Fast Performance: Optimized audio generation and delivery pipeline
🎵 High Audio Quality: Crystal-clear voice synthesis with proper timing

What we learned

Technical Insights:

  • Composite Primary Keys: How to design efficient databases without UUID dependencies
  • Audio Processing: The complexity of synchronizing AI-generated speech with video content
  • Chrome Extension Development: Best practices for YouTube integration and content script injection
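One simple way to handle part of the synchronization problem with AI-generated speech is to play each generated clip at a rate that makes it fill the time slot of the original transcript segment. The sketch below computes that rate. It is an illustration, not necessarily what TwelveLab does. The function name and the 0.75 to 1.5 clamp are assumptions, chosen so speech never sounds unnaturally slow or fast. The extension could apply such a factor through the audio element's `playbackRate` property.

```python
def playback_rate(generated_s: float, target_s: float,
                  min_rate: float = 0.75, max_rate: float = 1.5) -> float:
    """Rate at which to play a generated clip so it fits the original segment's duration."""
    if generated_s <= 0 or target_s <= 0:
        return 1.0  # nothing sensible to stretch; play at normal speed
    rate = generated_s / target_s  # > 1 speeds up a clip that runs long
    # Clamp so speech stays intelligible; residual drift is left to the next segment
    return max(min_rate, min(max_rate, rate))
```

For example, a 3-second clip that must fit a 2-second segment plays at 1.5x. A clip that would need more than the clamp allows simply overruns or underruns slightly.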

Product Development:

  • User Needs: People really want to personalize their content consumption experience
  • Community Value: Rating systems significantly improve voice discovery and user engagement
  • Simplicity Wins: 4-column database design is much more maintainable than complex schemas

API Integration:

  • ElevenLabs Capabilities: The power and limitations of current voice cloning technology
  • Supabase Ecosystem: How to effectively combine database and storage in a single platform
  • Rate Limiting: Importance of proper API call management and user feedback

What's next for TwelveLab

We envisioned this as a Chrome extension for ElevenLabs, with a seamless user experience at its core. Next, we plan to support multi-language transcription so that users can learn concepts on YouTube in languages and accents familiar to them, helping more people use YouTube as the learning tool it can be. We are excited to keep improving the user experience and to gather feedback from real users.
