Inspiration

Deepfakes can be a concerning technology, but why not look for a good use for them?

What it does

Translates a video into any language and lip-syncs the speaker to the translated audio.

How we built it

Multus lingua chains several pre-trained models together: one for translation, one for voice generation, and one for lip-syncing. The glue code is written in Python, with ffmpeg handling audio extraction and muxing.
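For the curious, here is a rough sketch of how the CLI chains the stages together. File names, checkpoint paths, and the transcription/translation step are assumptions for illustration; the TTS call is left as a placeholder because the exact invocation depends on the scripts and checkpoints from tomiinek/multilingual-text-to-speech and tomiinek/wavernn.

```python
import subprocess

# Hypothetical file names -- adjust to your own inputs and checkpoints.
VIDEO = "input.mp4"
ORIGINAL_WAV = "original.wav"
TRANSLATED_WAV = "translated.wav"
OUTPUT = "output.mp4"

def extract_audio(video: str, wav: str) -> None:
    """Pull the original audio track out of the video with ffmpeg."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video, "-vn",
         "-acodec", "pcm_s16le", "-ar", "16000", wav],
        check=True,
    )

def synthesize_translation(text: str, wav: str) -> None:
    """Placeholder: synthesize the translated transcript with the
    multilingual TTS model and the WaveRNN vocoder. The real call
    depends on the upstream repos' scripts, so it is stubbed here."""
    raise NotImplementedError

def lipsync(video: str, wav: str, outfile: str) -> None:
    """Drive Wav2Lip's inference script so the speaker's mouth
    matches the generated audio (run from a Wav2Lip checkout)."""
    subprocess.run(
        ["python", "inference.py",
         "--checkpoint_path", "checkpoints/wav2lip_gan.pth",
         "--face", video, "--audio", wav, "--outfile", outfile],
        check=True, cwd="Wav2Lip",
    )

if __name__ == "__main__":
    extract_audio(VIDEO, ORIGINAL_WAV)
    # Transcribing and translating the original speech happens here
    # (e.g. with a pre-trained translation model); the transcript
    # source is an assumption in this sketch.
    transcript = "..."  # translated text in the target language
    synthesize_translation(transcript, TRANSLATED_WAV)
    lipsync(VIDEO, TRANSLATED_WAV, OUTPUT)
```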

Challenges we ran into

I initially wanted to offer this as a service with a web interface and to Dockerize each stage of the pipeline; however, with the limited time available, I built a CLI instead.

Accomplishments that we're proud of

The lip-syncing with a provided audio source looks shockingly real :O

What we learned

Creating a deepfake is surprisingly easy; there is a wealth of open-source research on the topic, which makes it simple to get started.

What's next for Multus lingua

Ideally, the voice generation model would be trained on the original speaker's vocal data rather than relying on a generic pre-trained model. The best future for Multus lingua would be partnering with a public figure who has a large dataset of recorded speeches. The speaker could then upload videos to our service, and we could process them into any language for distribution.

Built With

  • ffmpeg
  • https://github.com/rudrabha/wav2lip
  • https://github.com/tomiinek/multilingual-text-to-speech
  • https://github.com/tomiinek/wavernn
  • python