Inspiration
Just like Prince Zuko, I had lost my honor. Over the summer, I was very excited to share “Avatar: The Last Airbender” with my grandma. I had found a Hindi dub, and we were enjoying ourselves — until we got to season 3 and couldn’t find a Hindi dub anywhere! My grandma really enjoyed the show (even though she never admitted it), and I'm sure she felt just as frustrated about never finding out what would happen next. I had lost my honor, and the only way to regain it would be to create AI-based dubbing software. I am glad to say that my honor has been restored!
What it does
We built a website where users can upload a video file along with its captions. The site sends these files to the server, which communicates with the Google Cloud APIs to translate the captions and convert them into speech. The captions are parsed by a conversion algorithm that creates text chunks to be translated. The caption parser also uses timing calculations to add pauses to the speech using SSML. The SSML built from the Google Translate API output is sent to the Google Text-to-Speech API, which converts the text into an audio file. The audio file's sampling rate is then adjusted to match the length of the original video, and the adjusted audio is sent back to the browser, where it can be played alongside the video.
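The actual parser isn't shown in the write-up, but the chunking step it describes can be sketched like this, assuming standard SRT-style captions (`srt_time_to_seconds` and `parse_srt` are illustrative names, not the project's real functions):

```python
import re

# Hypothetical helper: parse an SRT timestamp ("HH:MM:SS,mmm") into seconds.
def srt_time_to_seconds(ts: str) -> float:
    h, m, rest = ts.split(":")
    s, ms = rest.split(",")
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

# Split an SRT file into (start, end, text) chunks ready for translation.
def parse_srt(srt_text: str):
    chunks = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks
        start, end = (t.strip() for t in lines[1].split("-->"))
        text = " ".join(lines[2:])
        chunks.append((srt_time_to_seconds(start), srt_time_to_seconds(end), text))
    return chunks
```

Each resulting `(start, end, text)` tuple carries the timing information the SSML step needs later.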
How we built it
Our program works in three phases. First, we translate the caption file into the desired language using Google Cloud’s Translation API. Then we add pauses to the translated speech text using SSML notation; the conversion algorithm that accomplishes this was written from scratch. Finally, the SSML text is sent to Google Cloud’s Text-to-Speech API to create the dubbing. The audio file's sampling rate is then adjusted to match the length of the original video, and the adjusted audio file is sent back to the browser, where it can be watched.
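The pause-insertion phase can be sketched as follows — a simplified version under the assumption that `chunks` are `(start, end, translated_text)` tuples from the caption parser, with a `<break>` tag covering the silent gap between consecutive captions:

```python
from html import escape

# Build one SSML document from timed, translated caption chunks.
# The <break> between captions is what keeps the dub aligned with
# the silent stretches of the original video.
def chunks_to_ssml(chunks):
    parts = ["<speak>"]
    prev_end = 0.0
    for start, end, text in chunks:
        gap_ms = int((start - prev_end) * 1000)
        if gap_ms > 0:
            parts.append(f'<break time="{gap_ms}ms"/>')
        parts.append(escape(text))  # escape &, <, > so the SSML stays valid XML
        prev_end = end
    parts.append("</speak>")
    return "".join(parts)
```

The resulting `<speak>…</speak>` string is what would be handed to the Text-to-Speech API in the third phase.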
Challenges we ran into
While creating an automatic dub, the first problem that presented itself was timing. Specifically, how do we take statements whose lengths vary from language to language and make them match? And how do we use the Google Cloud Platform and markup such as SSML to accomplish this? By combining SSML with our timing algorithms, we were able to speed up and slow down the dubbing without major changes to pitch or control. This, combined with well-timed breaks and pauses, creates a high-quality dub of any subtitled video presented.
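One way to picture the speed-up/slow-down step: compare how long a synthesized line runs against the caption slot it must fit, and express the ratio as an SSML `<prosody rate>` percentage. The function and its clamping bounds here are illustrative, not the project's exact algorithm:

```python
# Estimate a prosody rate so a synthesized line fits its caption slot.
# synthesized_s: how long the TTS audio for this line runs (or an estimate);
# slot_s: the caption's (end - start) window in the original video.
def fit_rate(synthesized_s: float, slot_s: float, lo: int = 50, hi: int = 200) -> str:
    pct = round(100 * synthesized_s / slot_s)
    # Clamp to a sane range so extreme ratios don't wreck intelligibility.
    pct = max(lo, min(hi, pct))
    return f'<prosody rate="{pct}%">'
```

For example, a 3-second synthesized line that must fit a 2.4-second slot gets `rate="125%"`, i.e. spoken 25% faster, which SSML engines apply without the pitch shift a naive resample would cause.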
Accomplishments that we're proud of
Tackling automatic speech dubbing was a daunting task. It was the most exciting idea we had, but it came with a steep learning curve: learning Google Cloud API calls, making a Flask backend interact with a React frontend, and, above all, making the audio sync with the video. We are all very proud that we not only learned about these topics but also got a working product.
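The final audio/video sync boils down to the sample-rate adjustment mentioned earlier, which the standard-library `wave` module (listed under Built With) can do by rewriting the WAV header. A minimal sketch — the function name is ours, and note that this coarse trick shifts pitch proportionally, so it only makes sense after SSML pacing has done most of the work:

```python
import io
import wave

# Stretch or compress a WAV to a target duration by rewriting its
# sample rate: duration = nframes / framerate, so picking
# framerate = nframes / target_s makes playback last exactly target_s.
def fit_wav_to_duration(wav_bytes: bytes, target_s: float) -> bytes:
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
        new_rate = round(src.getnframes() / target_s)
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(new_rate)
        dst.writeframes(frames)
    return out.getvalue()
```

A 2-second clip at 8 kHz passed through with `target_s=4.0` comes back at 4 kHz and plays for 4 seconds.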
What we learned
While creating our automatic dubbing software, we learned how to use Google Cloud's APIs to translate text and convert it to speech. We also learned about SSML and how to use it to create precisely timed text-to-speech. Finally, we learned how to integrate all of these technologies in a Flask and React application.
What's next for EasyDubz
We want to find ways to quantify the emotional qualities of the original voice acting. By finding and tuning different variables, we can then apply the same characteristics to the translated speech waveforms. Furthermore, we want to be able to distinguish between different voices. Once we have the basics completed, we want to turn our sights toward creating and training machine learning models. We would train our models on native-language movies to find out how certain words are generally spoken, and how tone and emphasis are affected by context. We can also use computer vision techniques and background-score analysis to further understand the atmosphere of a scene.
Built With
- bootstrap
- flask
- gcp
- google-translate
- python
- react
- text-to-speech
- wave
