Readme: The video only needs to be judged up to the 2-minute mark. The remaining time is for additional bloopers :)
More than 3 in every 4 people in the world do not speak a single word of English. Yet over 50% of the content on the internet is in English, over 80% of online educational courses are in English, and the majority of online meetup events such as hackathons and conferences are held in English.
This digital language divide leaves behind the more than 75% of people who do not speak, read, or write English.
What it does
Audio and visual translation for videos and video calls, with lip and expression sync, so that anyone can understand and enjoy video content on the internet in their mother tongue, the language they are most familiar and comfortable with.
How I built it
We used IBM Cloud's translation API for speech-to-speech translation from one language to another. On top of that, we do visual translation: deepfakes re-render the speaker's face so that lips and expressions stay in sync with the translated audio.
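The flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: it assumes the speech-to-speech step breaks down into transcription, text translation, and speech synthesis, and every name in it (`DubbingPipeline`, `transcribe`, `translate`, `synthesize`, `lip_sync`, and the `fake_*` stand-ins) is hypothetical. Each stage is passed in as a plain function, so real cloud calls and a real lip-sync model could be plugged in without changing the pipeline itself.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DubbingPipeline:
    """Hypothetical four-stage pipeline; each stage is an injected callable."""
    transcribe: Callable[[bytes, str], str]    # (audio, source_lang) -> text
    translate: Callable[[str, str, str], str]  # (text, source_lang, target_lang) -> text
    synthesize: Callable[[str, str], bytes]    # (text, target_lang) -> audio
    lip_sync: Callable[[bytes, bytes], bytes]  # (video, new_audio) -> video

    def run(self, video: bytes, audio: bytes,
            source_lang: str, target_lang: str) -> bytes:
        # Audio translation: speech -> text -> translated text -> speech.
        text = self.transcribe(audio, source_lang)
        translated = self.translate(text, source_lang, target_lang)
        dubbed_audio = self.synthesize(translated, target_lang)
        # Visual translation: re-render the face to match the new audio.
        return self.lip_sync(video, dubbed_audio)


# Stand-in stages so the sketch runs without any external service.
# They only shuffle bytes around; they do no real speech or video work.
def fake_transcribe(audio: bytes, source_lang: str) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str, source_lang: str, target_lang: str) -> str:
    return f"[{source_lang}->{target_lang}] {text}"


def fake_synthesize(text: str, target_lang: str) -> bytes:
    return text.encode("utf-8")


def fake_lip_sync(video: bytes, new_audio: bytes) -> bytes:
    return video + b"|" + new_audio


pipeline = DubbingPipeline(fake_transcribe, fake_translate,
                           fake_synthesize, fake_lip_sync)
result = pipeline.run(b"frames", b"hello", "en", "es")
```

Keeping the stages as separate callables means the audio path and the visual path can be tested and swapped independently, which helps when the lip-sync model is still being tuned.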
Challenges I ran into
Lip and expression synchronisation using GANs still produces a few visible artifacts. It needs much more hyperparameter tuning before the output looks realistic in high definition.
Accomplishments that I'm proud of
Spreading Andrew Ng's machine learning course to everyone in any language :)