VoiceBridge

Inspiration

Ever tried watching a foreign-language tech talk or joining an international meeting? Language barriers are frustrating. I built VoiceBridge to break those barriers using AI - making global content and conversations accessible to everyone.

What it does

VoiceBridge provides real-time translation with natural AI voice dubbing for:

  • YouTube videos: Watch any video with synchronized translated audio (ElevenLabs voices)
  • Live meetings: Speak your language, they hear theirs - bidirectional real-time translation
  • Smart Q&A: Ask questions about any video using RAG-powered AI (Gemini)

How I built it

  • Frontend: Next.js + TypeScript for responsive UI
  • Backend: Node.js + Express deployed on Google Cloud Run
  • AI Stack:
    • Vertex AI for translation
    • ElevenLabs for natural voice synthesis
    • Gemini for RAG-based Q&A
    • Google Speech-to-Text for live transcription
  • Features: Gender detection, audio caching, sentence batching for smooth playback
  • Database: SQLite for transcript storage and history
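
To illustrate the sentence batching listed under Features, here is a minimal sketch, not the project's actual code. It assumes transcript cues arrive as short fragments with a start time and duration, and it merges them into sentence-sized batches so each translation and voice request covers a complete thought. The `Cue` and `SentenceBatch` shapes, the `batchSentences` name, and the 200-character cap are all assumptions made for this example.

```typescript
// Hypothetical shape of a transcript cue; field names are assumptions.
interface Cue {
  text: string;
  start: number; // seconds from the start of the video
  duration: number; // seconds
}

// A group of cues that will be translated and voiced as one unit.
interface SentenceBatch {
  text: string;
  start: number;
  end: number;
}

// Merge consecutive cues until one ends with sentence punctuation
// or the accumulated text reaches maxChars.
function batchSentences(cues: Cue[], maxChars = 200): SentenceBatch[] {
  const batches: SentenceBatch[] = [];
  let current: SentenceBatch | null = null;

  for (const cue of cues) {
    const text = cue.text.trim();
    if (text.length === 0) continue; // skip empty cues

    const end = cue.start + cue.duration;
    if (current === null) {
      current = { text, start: cue.start, end };
    } else {
      current.text += " " + text;
      current.end = end;
    }

    if (/[.!?]$/.test(text) || current.text.length >= maxChars) {
      batches.push(current);
      current = null;
    }
  }

  // Flush a trailing batch that never reached sentence punctuation.
  if (current !== null) batches.push(current);
  return batches;
}
```

Each batch keeps the start time of its first cue and the end time of its last cue, so the dubbed audio can still be lined up against the original video timeline.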

Challenges

YouTube's IP blocking: YouTube blocks all cloud provider IPs from fetching transcripts. Solution: When running locally, the backend fetches transcripts directly through the Innertube API; in production, transcripts are fetched client-side instead.

Audio sync: Keeping translated audio perfectly synced with video playback. Solution: Preloading upcoming audio, managing playback through an audio queue, and running gender detection upfront.
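
One way to picture the queue logic behind this is the sketch below. It is an illustration under assumptions, not the actual VoiceBridge implementation: it assumes dubbed clips are stored as segments sorted by start time and non-overlapping, and the `AudioSegment` shape and both function names are hypothetical. Given the current video time, it finds the segment that should be playing and picks the next few to preload.

```typescript
// Hypothetical dubbed-audio segment; times are video seconds.
interface AudioSegment {
  start: number; // inclusive
  end: number; // exclusive
  url: string;
}

// Binary search for the segment covering time t.
// Assumes segments are sorted by start and do not overlap.
// Returns -1 when t falls in a gap or outside all segments.
function findSegmentIndex(segments: AudioSegment[], t: number): number {
  let lo = 0;
  let hi = segments.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const seg = segments[mid]!;
    if (t < seg.start) hi = mid - 1;
    else if (t >= seg.end) lo = mid + 1;
    else return mid;
  }
  return -1;
}

// Indices worth preloading at time t: the first segment that has not
// finished yet, plus up to `lookahead` segments after it.
function segmentsToPreload(segments: AudioSegment[], t: number, lookahead = 2): number[] {
  let lo = 0;
  let hi = segments.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (segments[mid]!.end <= t) lo = mid + 1;
    else hi = mid;
  }
  const indices: number[] = [];
  for (let i = lo; i < segments.length && indices.length <= lookahead; i++) {
    indices.push(i);
  }
  return indices;
}
```

A player could call `findSegmentIndex` on every seek or time update to decide which clip to play, and `segmentsToPreload` to keep the queue a few clips ahead of the playhead.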

Real-time latency: Minimizing delay in live translation. Solution: Batching requests, audio preloading, and an optimized pipeline (1.5-2s total latency).
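
The caching and preloading side of this can be sketched as follows. This is a hedged example rather than the project's real code: the `AudioCache` class and the `Synthesize` function type are hypothetical, and the actual text-to-speech call is injected instead of shown, so nothing here depends on a particular voice API. The idea is to store the in-flight promise for each (voice, text) pair so that repeated or preloaded requests share one synthesis call.

```typescript
// Hypothetical signature for whatever function produces audio bytes.
type Synthesize = (text: string, voiceId: string) => Promise<Uint8Array>;

class AudioCache {
  private readonly entries = new Map<string, Promise<Uint8Array>>();
  private readonly synthesize: Synthesize;

  constructor(synthesize: Synthesize) {
    this.synthesize = synthesize;
  }

  // Return cached audio, or start synthesis and cache the pending promise
  // so concurrent callers for the same key share one request.
  get(text: string, voiceId: string): Promise<Uint8Array> {
    const key = `${voiceId}\u0000${text}`;
    let entry = this.entries.get(key);
    if (entry === undefined) {
      entry = this.synthesize(text, voiceId);
      // Drop failed requests so a later call can retry.
      entry.catch(() => this.entries.delete(key));
      this.entries.set(key, entry);
    }
    return entry;
  }

  // Kick off synthesis for upcoming sentences without waiting for results.
  preload(texts: string[], voiceId: string): void {
    for (const text of texts) {
      void this.get(text, voiceId);
    }
  }
}
```

Caching the promise instead of the finished audio means a clip that was preloaded a moment ago, but has not finished generating, is still reused rather than requested twice.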

What I learned

  • Building production-ready real-time systems requires smart caching strategies
  • Working around platform restrictions (YouTube's bot detection) needs creative solutions
  • AI voice quality matters - gender detection significantly improves user experience
  • Deploying to serverless (Cloud Run) requires thinking about cold starts and scaling

What's next

  • Mobile app for on-the-go translation
  • Zoom/Teams integration for business meetings
  • Multi-speaker detection in live conversations
  • Dialect-specific voice options
