I talked with friends who come from rural areas. Technology has not yet reached deep into my country (India), so it is tough for students there to attend lectures. The teachers are often not familiar enough with the tools to set up online lectures either, so students end up struggling through tediously long texts! Everyone I spoke to also agreed that video lectures help a lot, but teachers do not have enough free time to record them. I thought I should do something about it, and here we are :D
What it does
It takes an audio or text file and generates a video of that content being delivered, based on a provided image or video of the person who wants to convey it. To put it simply, I am using deep-fakes to bridge educational gaps. Notably, it works very well even with a single image, so it can easily be used in rural areas.
How we built it
- I started by implementing a deep-fake research paper, using TensorFlow.
- Then I created a Flask app that could take in file uploads.
- The next step was to build a good UI around it! UI is important after all :P
- The next logical step was to use a text-to-speech (TTS) service to first convert text to audio, and then use that audio to create the deep-fake. I used Google's TTS service due to time constraints.
- The next step was to connect the Flask app to the deep-fake model.
- The Flask app was hosted using ngrok's HTTP tunnels.
- The whole thing can be run and hosted in Google Colab, so end users do not need to be tech-savvy to use it!
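The steps above can be sketched roughly as follows. This is only an illustration of the flow (text goes through TTS first, audio is used directly, then the face image is animated), not the actual project code. The names `classify_upload` and `make_lecture`, the accepted extensions, and the `tts` / `animate` callables are all assumptions made for the example; in the real app they would be backed by the TTS service and the deep-fake model.

```python
from pathlib import Path
from typing import Callable

# Hypothetical extension sets; the accepted formats are an assumption.
AUDIO_EXTS = {".wav", ".mp3"}
TEXT_EXTS = {".txt"}


def classify_upload(filename: str) -> str:
    """Return "audio" or "text" based on the uploaded file's extension."""
    ext = Path(filename).suffix.lower()
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in TEXT_EXTS:
        return "text"
    raise ValueError(f"unsupported upload type: {filename!r}")


def make_lecture(
    content_path: str,
    face_path: str,
    tts: Callable[[str], str],
    animate: Callable[[str, str], str],
) -> str:
    """Turn an uploaded text/audio file plus a face image into a video path.

    tts(text_path) -> audio_path and animate(face_path, audio_path) -> video_path
    are placeholders for the TTS service and the deep-fake model.
    """
    kind = classify_upload(content_path)
    # Text is converted to speech first; audio uploads skip the TTS step.
    audio_path = tts(content_path) if kind == "text" else content_path
    return animate(face_path, audio_path)
```

Passing the TTS and deep-fake steps in as plain functions keeps the upload handling easy to try out without a GPU or a network connection, and a different TTS backend can be swapped in later without touching the rest of the flow.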
Challenges we ran into
- One of the major challenges was finding a deep-fake system that works well on minimal inputs such as a single image or a piece of text. Rural internet restrictions were kept in mind while building it!
- It was hard to host the app completely on Colab, because an actual web app was involved! I went through some blogs and figured it out. Now it can be run end to end in Colab.
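One common way to run a web app from inside a notebook is to start the server on a background thread, so the cell returns immediately and a tunnel can then be pointed at the port. The sketch below shows that idea with Python's built-in `http.server` as a stand-in for the real app. It is an assumption about the general approach, not a record of how this project did it; `HelloHandler` and `start_in_background` are made-up names for the example.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Stand-in for the real web app; answers every GET with a short message."""

    def do_GET(self):
        body = b"app is running"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the notebook output quiet


def start_in_background(port: int = 0) -> HTTPServer:
    """Start the server on a daemon thread and return it immediately.

    port=0 asks the OS for any free port; read it back from server.server_port.
    """
    server = HTTPServer(("127.0.0.1", port), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `server = start_in_background()` leaves the notebook free for the next cell, `server.server_port` gives the port a tunnel would forward to, and `server.shutdown()` stops the server again.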
Accomplishments that we're proud of
- Getting awesome results from just a single image and an audio clip. In general, getting results where the deep-fake is indistinguishable from a real person giving the lecture!
What we learned
- I learnt a lot about handling file uploads. Coming from an ML/AI background, I already knew deep-fakes, but I discovered some great features of Colab tbh :P
- I learnt to work in a structured way and in a timely fashion.
What's next for FAKEducation
- With deep-fake models becoming more and more sophisticated, a model that also handles facial expressions is worth looking into. It can be done, but time constraints made it hard to do this time.
- A better text-to-speech model should be looked into, one that supports voice modulation, accents and gender. This is currently available as a paid service, so the upgrade needs only a small investment.
- Since 3D models can actually be deep-faked, AR can be looked at as a future feature!
- The app should be made more lightweight so that it is easier to use for mobile users...