Inspiration
We were inspired by the limitations of current translation tools, which often miss the cultural nuances and emotional tone embedded in speech and do nothing to keep the speaker's lips in sync. We asked: what if your voice could speak another language with the right words, tone, and emotions, and still look like it’s coming from your lips? With AI advancements in speech synthesis, translation, and lip-syncing, we saw an opportunity to create a playful but powerful app that makes cross-cultural video communication seamless.
What it does
LoL: Lord of the Lings lets you:
- Record a short video
- Automatically extract the spoken audio
- Transcribe and translate it
- Synthesize speech in the target language
- Lip-sync the original video to match the new voice
In the end, users get a new video of themselves — speaking fluently and naturally in another language, like a real-time dubbing studio.
How we built it
We used:
- Gradio to build an intuitive browser interface for video recording and previewing results
- A pipeline of modular functions:
  - speech_to_text model – to transcribe the audio, given the input language
  - translator llm – for contextual and culturally-aware translation
  - text_to_speech model – for generating realistic audio in the new language
  - lip-sync model – to sync the new voice to the original speaker's lips

Everything is wrapped in a Gradio interface, making it easy and interactive for end users.
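The modular pipeline described above could be wired together roughly as follows. This is a minimal sketch, not our actual code: the `DubbingPipeline` class, the stage signatures, and the stub stages in `make_stub_pipeline` are all hypothetical names chosen for illustration, and the stubs only return placeholder strings instead of calling real models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DubbingPipeline:
    """Chains the stages in order. Each stage is a plain callable, so a model
    can be swapped without touching the others. All signatures are assumptions."""
    extract_audio: Callable[[str], str]         # video path -> audio path
    speech_to_text: Callable[[str, str], str]   # (audio path, source lang) -> transcript
    translate: Callable[[str, str, str], str]   # (text, source lang, target lang) -> translation
    text_to_speech: Callable[[str, str], str]   # (text, target lang) -> synthesized audio path
    lip_sync: Callable[[str, str], str]         # (video path, audio path) -> dubbed video path

    def run(self, video_path: str, source_lang: str, target_lang: str) -> str:
        audio_path = self.extract_audio(video_path)
        transcript = self.speech_to_text(audio_path, source_lang)
        translated = self.translate(transcript, source_lang, target_lang)
        dubbed_audio = self.text_to_speech(translated, target_lang)
        # The original video is reused; only the lips and the audio track change.
        return self.lip_sync(video_path, dubbed_audio)


def make_stub_pipeline(log: List[str]) -> DubbingPipeline:
    """Builds a pipeline from placeholder stages that record the call order in `log`."""

    def extract_audio(video_path: str) -> str:
        log.append("extract_audio")
        return video_path + ".wav"

    def speech_to_text(audio_path: str, source_lang: str) -> str:
        log.append("speech_to_text")
        return "hello"

    def translate(text: str, source_lang: str, target_lang: str) -> str:
        log.append("translate")
        return f"[{source_lang}->{target_lang}] {text}"

    def text_to_speech(text: str, target_lang: str) -> str:
        log.append("text_to_speech")
        return f"speech_{target_lang}.wav"

    def lip_sync(video_path: str, audio_path: str) -> str:
        log.append("lip_sync")
        return f"{video_path}+{audio_path}"

    return DubbingPipeline(extract_audio, speech_to_text, translate,
                           text_to_speech, lip_sync)


if __name__ == "__main__":
    calls: List[str] = []
    print(make_stub_pipeline(calls).run("clip.mp4", "en", "fr"))
    print(calls)
```

In a Gradio app, `run` could then be passed as the `fn` of a `gr.Interface` with a video input, two language inputs, and a video output; that wiring is left out here so the sketch runs without Gradio or any models installed.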
Challenges we ran into
- Accurate voice cloning
- Real-time inference
- Synchronisation across multiple complex libraries with different environments
- Culturally-aware translation
Accomplishments that we're proud of
- Created a fully functional prototype that feels magical: speaking another language with your own face and voice!
- Integrated multiple complex AI components into a clean, easy-to-use interface.
- Made it fun, usable, and personal, true to the “Lord of the Lings” spirit \o/
What we learned
- User experience design is just as important as model performance when it comes to deploying AI tools.
- Combining multiple modalities (vision, audio, text, translation) requires careful synchronization and format handling.
- Building with tools like Gradio makes prototyping complex AI applications both fast and friendly.
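As one illustration of the format handling mentioned above, here is a small sketch of how the audio-extraction step could normalize its output before transcription. It rests on assumptions not stated in this write-up: that `ffmpeg` is used for extraction, and that the speech-to-text model expects 16 kHz mono WAV (a common but not universal input format). The helper name `ffmpeg_extract_audio_cmd` is hypothetical; it only builds the command and does not run it.

```python
from pathlib import Path
from typing import List


def ffmpeg_extract_audio_cmd(video_path: str, sample_rate: int = 16000) -> List[str]:
    """Builds an ffmpeg command that extracts a mono WAV track from a video.

    Assumption: the downstream speech-to-text model wants mono audio at
    `sample_rate` Hz. Adjust both to whatever the chosen model expects.
    """
    wav_path = str(Path(video_path).with_suffix(".wav"))
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", video_path,         # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to one audio channel
        "-ar", str(sample_rate),  # resample to the target rate
        wav_path,
    ]


if __name__ == "__main__":
    print(" ".join(ffmpeg_extract_audio_cmd("clip.mp4")))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed.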
What's next for LoL: Lord of the Lings
- Support for more languages and dialects, including code-switching scenarios.
- Adding culturally aware translation (e.g., idioms).
- Better voice cloning for consistent vocal identity across languages.
- Emotion-aware translation so tone and mood are preserved or adapted cross-culturally.
- Deploying a mobile-friendly version to expand accessibility.