Inspiration
Mel-AI was born from a vision to demolish language barriers and democratize access to global content. In our increasingly connected world, language should never be an obstacle to enjoying and understanding diverse media. Consider the passionate sports fan in rural China who dreams of understanding the nuanced commentary of an NFL game, or the Brazilian basketball enthusiast yearning to grasp the strategic insights shared during NBA playoffs. These scenarios inspired us to create a solution that goes beyond mere translation: we're crafting an immersive experience.
What it does
It transcribes, translates, and generates voiceovers for uploaded videos on the fly, allowing non-English speakers to enjoy global content in their preferred language.
How we built it
We use Twelve Labs to index the uploaded videos and generate transcriptions via the Transcription endpoint. We then pass the transcription text to an OpenAI model, which translates it into the user's chosen language. Next, we feed the translated captions into OpenAI's Text-to-Speech model to generate an audio file. Finally, we use ffmpeg to overlay that audio onto the original uploaded video, getting close to a dubbed experience.
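As a rough sketch of the final step, here is how the ffmpeg overlay could be driven from Python. The file names are hypothetical, and the choice to replace the original audio track outright (rather than mix the voiceover over it) is our assumption, not necessarily Mel-AI's exact invocation:

```python
import subprocess

def build_overlay_cmd(video_path: str, dub_audio_path: str, out_path: str) -> list[str]:
    """Build an ffmpeg command that swaps the video's audio track
    for the generated voiceover (paths are hypothetical examples)."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,      # original uploaded video
        "-i", dub_audio_path,  # TTS-generated voiceover
        "-map", "0:v:0",       # keep the video stream from input 0
        "-map", "1:a:0",       # take the audio stream from input 1
        "-c:v", "copy",        # don't re-encode the video
        "-shortest",           # stop when the shorter stream ends
        out_path,
    ]

def overlay_audio(video_path: str, dub_audio_path: str, out_path: str) -> None:
    subprocess.run(build_overlay_cmd(video_path, dub_audio_path, out_path), check=True)
```

Copying the video stream (`-c:v copy`) keeps the overlay fast, since only the container is rewritten rather than the frames.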
Challenges we ran into
Plenty. We were originally working on a different idea and had to pivot because of a roadblock we couldn't bypass for security reasons, which left us little time to explore Twelve Labs, brainstorm, and land on this idea. We also ran into token limits with the text-to-speech model when working with large video files and long captions, spent time figuring out proxies and getting around CORS errors, and FFMPEG!! - we're still figuring out how to seamlessly trigger it from our application.
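One way to work around the TTS input limit is to split the captions into chunks before synthesis and stitch the resulting audio together afterwards. A minimal sketch, assuming a per-request character budget (the 4000-character default is a placeholder, not a verified limit for any particular model):

```python
import re

def chunk_captions(text: str, max_chars: int = 4000) -> list[str]:
    """Split text into chunks no longer than max_chars,
    breaking on sentence boundaries where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself too long.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if len(current) + len(sentence) + 1 > max_chars:
            if current:
                chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to the TTS endpoint separately, and the audio files concatenated in order.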
Accomplishments that we're proud of
Considering we only started at 11 PM, we are immensely proud of what we have accomplished.
What we learned
We learned a ton about the Twelve Labs API. Wow, it's insanely powerful! We also learned how to use text-to-speech models, and picked up plenty of new things about working with video and audio.
What's next for Mel-AI
We definitely want to make the experience more seamless: weed out the speech-free parts of a video (such as music-only portions) and map the voiceover onto the remaining segments properly, for a much smoother experience.
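As a first step toward skipping speech-free segments, ffmpeg's `silencedetect` filter can report low-volume intervals (it only finds silence, not music, so music-only portions would need a separate detector). A sketch of parsing its log output; the log format matches what `silencedetect` prints to stderr, but treat the surrounding wiring as an assumption:

```python
import re

# Silent intervals can be produced with something like:
#   ffmpeg -i video.mp4 -af silencedetect=noise=-30dB:d=1 -f null -
# and then parsed from the captured stderr:
def parse_silences(ffmpeg_stderr: str) -> list[tuple[float, float]]:
    """Extract (start, end) pairs from ffmpeg silencedetect log output."""
    starts = [float(m) for m in re.findall(r"silence_start: ([\d.]+)", ffmpeg_stderr)]
    ends = [float(m) for m in re.findall(r"silence_end: ([\d.]+)", ffmpeg_stderr)]
    return list(zip(starts, ends))
```

The resulting intervals could then be used to shift or suppress voiceover segments so the dub lines up with the actual speech.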
Built With
- flask
- kindo
- openai-models
- python
- react
- text-to-speech
- twelvelabs-api
- typescript