VoiceBridge

Inspiration

Ever tried watching a foreign-language tech talk or joining an international meeting? Language barriers are frustrating. I built VoiceBridge to break down those barriers with AI and make global content and conversations accessible to everyone.

What it does

VoiceBridge provides real-time translation with natural AI voice dubbing for:

YouTube videos: Watch any video with synchronized translated audio (ElevenLabs voices)
Live meetings: Speak your language, they hear theirs - bidirectional real-time translation
Smart Q&A: Ask questions about any video using RAG-powered AI (Gemini)

How I built it

Frontend: Next.js + TypeScript for responsive UI
Backend: Node.js + Express deployed on Google Cloud Run
AI Stack:
- Vertex AI for translation
- ElevenLabs for natural voice synthesis
- Gemini for RAG-based Q&A
- Google Speech-to-Text for live transcription
Features: Gender detection, audio caching, sentence batching for smooth playback
Database: SQLite for transcript storage and history
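
To illustrate the sentence batching mentioned above, here is a minimal sketch, not the project's actual code. It assumes transcript segments arrive as short caption fragments with start and end times; the `Segment` shape, the `batchSentences` name, and the 200-character cap are all hypothetical. Fragments are merged until the text ends in sentence punctuation, so each translation and voice request covers a complete sentence.

```typescript
// Hypothetical shape of one caption fragment; the field names are assumptions.
interface Segment {
  text: string;
  startMs: number;
  endMs: number;
}

// One batch covers a full sentence (or as much as fits under maxChars).
interface SentenceBatch {
  text: string;
  startMs: number;
  endMs: number;
}

// Merge consecutive fragments until the accumulated text ends with sentence
// punctuation, or until adding the next fragment would exceed maxChars.
function batchSentences(segments: Segment[], maxChars = 200): SentenceBatch[] {
  const batches: SentenceBatch[] = [];
  let current: SentenceBatch | null = null;

  for (const seg of segments) {
    const text = seg.text.trim();
    if (text.length === 0) continue; // skip empty caption fragments

    // Flush first if this fragment would push the batch over the cap.
    if (current !== null && current.text.length + 1 + text.length > maxChars) {
      batches.push(current);
      current = null;
    }

    if (current === null) {
      current = { text, startMs: seg.startMs, endMs: seg.endMs };
    } else {
      current.text += " " + text;
      current.endMs = seg.endMs;
    }

    // A trailing ., ! or ? (optionally followed by a closing quote or bracket)
    // marks the end of a sentence.
    if (/[.!?]["')\]]?$/.test(current.text)) {
      batches.push(current);
      current = null;
    }
  }

  // Emit whatever is left, even if it never reached sentence punctuation.
  if (current !== null) batches.push(current);
  return batches;
}
```

Each batch keeps the start time of its first fragment and the end time of its last, so the synthesized audio can still be scheduled against the video timeline.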

Challenges

YouTube's IP blocking: YouTube blocks transcript requests coming from cloud provider IPs. Solution: When running locally, the backend fetches transcripts directly through the Innertube API; in production, transcripts are fetched client-side instead.

Audio sync: Keeping translated audio in sync with video playback. Solution: Preloading audio, managing an audio queue, and running gender detection up front.
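
The preloading and queue management described above could look roughly like the following sketch. It is an illustration under assumptions, not the project's implementation: `PreloadQueue`, its `load` callback (standing in for whatever fetches or synthesizes one translated clip), and the lookahead of two clips are all hypothetical. The idea is that while one clip plays, the next few are already being requested.

```typescript
// Lookahead queue: while clip i is playing, clips i+1..i+lookahead are
// already loading. `load` is a hypothetical loader, e.g. a voice request.
class PreloadQueue<T> {
  private readonly pending = new Map<number, Promise<T>>();
  private readonly count: number;
  private readonly load: (index: number) => Promise<T>;
  private readonly lookahead: number;
  private next = 0;

  constructor(count: number, load: (index: number) => Promise<T>, lookahead = 2) {
    this.count = count;
    this.load = load;
    this.lookahead = lookahead;
  }

  // Start loading `index` unless it is out of range or already in flight.
  private ensure(index: number): void {
    if (index >= this.count || this.pending.has(index)) return;
    const request = this.load(index);
    // Mark failures as observed so a preload that is never awaited (for
    // example after a seek) does not become an unhandled rejection.
    request.catch(() => {});
    this.pending.set(index, request);
  }

  // Return the next clip in order and top up the lookahead window.
  async take(): Promise<T | undefined> {
    if (this.next >= this.count) return undefined;
    const index = this.next++;
    for (let i = index; i <= index + this.lookahead; i++) this.ensure(i);
    const clip = await this.pending.get(index);
    this.pending.delete(index);
    return clip;
  }

  // After the viewer seeks, drop stale requests and restart from `index`.
  seek(index: number): void {
    this.pending.clear();
    this.next = Math.max(0, Math.min(index, this.count));
  }
}
```

A player loop would call `take()` each time the current clip finishes and call `seek()` when the viewer jumps in the video. A larger lookahead trades more speculative requests for fewer playback gaps.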

Real-time latency: Minimizing delay in live translation. Solution: Batching requests, preloading audio, and optimizing the pipeline (1.5-2s total latency).
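
Audio caching, listed among the features, is one way to avoid repeated voice requests on the latency-critical path. The sketch below is a minimal example under assumptions: `AudioCache`, the `Synthesize` callback signature, and the 500-entry limit are hypothetical and do not reflect the ElevenLabs API or the project's real code. It caches the in-flight promise rather than the finished audio, so two simultaneous requests for the same sentence and voice share one synthesis call.

```typescript
// Hypothetical synthesis callback: returns encoded audio for a sentence.
type Synthesize = (text: string, voiceId: string) => Promise<Uint8Array>;

// Small LRU cache keyed by voice and text. Values are promises, so
// concurrent requests for the same key are deduplicated.
class AudioCache {
  private readonly entries = new Map<string, Promise<Uint8Array>>();
  private readonly synthesize: Synthesize;
  private readonly maxEntries: number;

  constructor(synthesize: Synthesize, maxEntries = 500) {
    this.synthesize = synthesize;
    this.maxEntries = maxEntries;
  }

  get(text: string, voiceId: string): Promise<Uint8Array> {
    // The NUL separator keeps ("ab", "c") distinct from ("a", "bc").
    const key = voiceId + "\u0000" + text;

    const hit = this.entries.get(key);
    if (hit !== undefined) {
      // Map preserves insertion order, so re-inserting marks this entry
      // as the most recently used.
      this.entries.delete(key);
      this.entries.set(key, hit);
      return hit;
    }

    const created = this.synthesize(text, voiceId);
    // Do not keep failed requests in the cache; the caller still sees
    // the rejection through the returned promise.
    created.catch(() => {
      if (this.entries.get(key) === created) this.entries.delete(key);
    });
    this.entries.set(key, created);

    // Evict the least recently used entry once over capacity.
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    return created;
  }
}
```

An in-memory map like this is lost on every restart or new serverless instance; persisting audio elsewhere would be a separate design decision that this sketch does not cover.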

What I learned

Building production-ready real-time systems requires smart caching strategies
Working around platform restrictions (YouTube's bot detection) needs creative solutions
AI voice quality matters - gender detection significantly improves user experience
Deploying to serverless (Cloud Run) requires thinking about cold starts and scaling

What's next

Mobile app for on-the-go translation
Zoom/Teams integration for business meetings
Multi-speaker detection in live conversations
Dialect-specific voice options

Built With

ai
api
build
cloud
docker
elevenlabs
express.js
gemini
git/github
google
javascript/typescript
next.js
node.js
python
react
run
speech-to-text
sqlite
vertex
websocket
youtube

Updates

Panam Dodia started this project — Dec 29, 2025 06:47 PM EST
