The Oscar-winning Korean film 'Parasite' brought the spotlight back to the old debate of subtitles vs. dubbing. Machine learning has given this a new twist: it is now possible to generate seamless dubs automatically, putting entirely new words in the same mouths. Our focus was on the entertainment track, and building something that addresses one of its largest issues seemed like the right thing to do.

What it does

BetterDub lets users crop video segments and, given an audio file with speech, generates lip movements on each segment that match the new speech in just a few clicks. These segments are then overlaid onto the final video, effectively creating a dubbed version without the need for human animators. It uses machine learning models and image processing to achieve a seamless dub, so the dubbed video looks as if the person actually spoke the translated words.
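The overlay step described above can be illustrated with a minimal sketch. It assumes the lip-synced segment comes back as a list of cropped frames covering a known rectangle and frame range. The names `Segment` and `overlay_segment` are hypothetical, and frames are modeled as nested lists to keep the example dependency-free; a real pipeline would more likely work on image arrays.

```python
from dataclasses import dataclass
from typing import List

Frame = List[List[int]]  # one frame as rows of pixel values (grayscale for brevity)


@dataclass
class Segment:
    """Hypothetical description of a cropped, re-synced video segment."""
    start_frame: int      # index of the first frame the segment replaces
    x: int                # left edge of the crop rectangle
    y: int                # top edge of the crop rectangle
    frames: List[Frame]   # lip-synced crops, one per replaced frame


def overlay_segment(video: List[Frame], seg: Segment) -> List[Frame]:
    """Return a copy of `video` with the segment's crops pasted in place."""
    out = [[row[:] for row in frame] for frame in video]  # copy, leave input intact
    for offset, patch in enumerate(seg.frames):
        idx = seg.start_frame + offset
        if idx >= len(out):
            break  # segment runs past the end of the video
        target = out[idx]
        # Assumption: the crop rectangle must fit inside the frame.
        if seg.y + len(patch) > len(target) or any(
            seg.x + len(patch_row) > len(target[0]) for patch_row in patch
        ):
            raise ValueError("patch does not fit inside the frame")
        for dy, patch_row in enumerate(patch):
            row = target[seg.y + dy]
            row[seg.x:seg.x + len(patch_row)] = patch_row
    return out
```

Pasting the generated crop back at its original coordinates means only the selected region changes, while the rest of each frame is left untouched.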

How we built it

The desktop client was built with PyQt. The Python backend used open-source machine learning libraries to process the video and ran as a REST server deployed on a GCP virtual machine.
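As a rough illustration of serving a model behind a REST endpoint, here is a minimal sketch. The write-up does not name the web framework, so this uses Python's standard-library `http.server` as a stand-in. The `/dub` route, the `video_id` and `audio_id` fields, and the `run_lip_sync` placeholder are all assumptions made for the example, not the project's actual API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_lip_sync(video_id: str, audio_id: str) -> dict:
    """Placeholder for the model call; a real backend would run inference here."""
    return {"video_id": video_id, "audio_id": audio_id, "status": "done"}


def handle_dub_request(raw_body: bytes):
    """Parse a JSON request body and return (HTTP status, response dict)."""
    try:
        payload = json.loads(raw_body or b"{}")
        return 200, run_lip_sync(payload["video_id"], payload["audio_id"])
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected JSON with video_id and audio_id"}


class DubHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/dub":  # hypothetical route
            status, result = 404, {"error": "unknown route"}
        else:
            length = int(self.headers.get("Content-Length", 0))
            status, result = handle_dub_request(self.rfile.read(length))
        body = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8080) -> HTTPServer:
    """Build the server; bound to localhost here, a VM would use a public interface."""
    return HTTPServer(("127.0.0.1", port), DubHandler)
```

Calling `make_server().serve_forever()` would start the endpoint. Keeping the request parsing in `handle_dub_request` separates it from the HTTP plumbing, so the logic can be exercised without opening a socket.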

Challenges we ran into

We faced a lot of challenges. We formed our three-member team just two days before the hackathon and remained a trio throughout, which meant extra work for each person. On top of this, the idea we chose to implement led us to an unfamiliar tech stack.

We were new to the Google Cloud Platform, and it took quite a while to get things going. We experimented with multiple services to find the most efficient implementation, trying Compute Engine, App Engine, and Cloud Run. This was also the first time we had deployed a machine learning model as a REST API. The learning curve was steep and tedious, but useful.

Accomplishments that we're proud of

Despite being a three-member team, we were able to deliver an MVP well within the deadline. We faced many issues with GCP but were able to work around them and move forward. We also adapted our plan several times as the hackathon progressed.

What we learned

We learnt that enough brainstorming can uncover new use cases. We also learnt that flexibility is really important when trying to deliver a result under tight constraints.

What's next for BetterDub

  • Built-in translation and text-to-speech feature for automated dubbing
  • Improving render quality by using GFPGAN
  • Making a full-fledged desktop interface
  • Robust cloud infrastructure
