Print media is boring!
Is print media still a thing?
Who would read walls of text when the information fits into a 10-second TikTok video?
Videos reportedly get 1,200% more shares than text and images combined.
The print media industry is still researching how to bring its printed newspapers into the digital century. We think a step ahead and try to take them directly into a world where even digitized text is already old-fashioned.
Today is the era of the video!
What it does
We use state-of-the-art AI technology to understand news articles and automatically transform them into short video clips. These clips can be posted on social networks like TikTok or Instagram to make news attractive to younger generations again.
How we built it
We use Hugging Face Transformers to summarize the articles before they enter the audio-visual processing. Tacotron2 then synthesizes speech from the summarized text in the voice of one of the speakers our models currently support. Finally, Wav2Lip synchronizes the speaker's lips with the synthesized audio.
The processing pipeline is integrated into a FastAPI backend that is triggered from a React frontend. We set up the stack with Docker Compose, which allows us to integrate it easily into existing ecosystems. Unfortunately, despite having Azure credits, we did not have the time to deploy the stack and make the demo accessible to everyone.
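The three stages above can be sketched as a simple chain of function calls. This is a minimal illustration, not our actual code: `article_to_clip`, `ClipResult` and the three callable parameters are hypothetical names, and the real model wrappers (summarizer, Tacotron2, Wav2Lip) are assumed to be passed in from outside.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures, chosen for illustration only.
Summarizer = Callable[[str], str]           # article text -> short summary
Synthesizer = Callable[[str], bytes]        # summary text -> audio bytes
LipSyncer = Callable[[str, bytes], bytes]   # (speaker video path, audio) -> video bytes


@dataclass
class ClipResult:
    summary: str
    audio: bytes
    video: bytes


def article_to_clip(
    article: str,
    speaker_video: str,
    summarize: Summarizer,
    synthesize: Synthesizer,
    lip_sync: LipSyncer,
) -> ClipResult:
    """Run the stages in the order described: summarize, speak, lip-sync."""
    summary = summarize(article)             # 1. shorten the article
    audio = synthesize(summary)              # 2. text-to-speech on the summary
    video = lip_sync(speaker_video, audio)   # 3. match the lips to the audio
    return ClipResult(summary=summary, audio=audio, video=video)


if __name__ == "__main__":
    # Stub stages so the sketch runs without any model or GPU.
    result = article_to_clip(
        "A very long news article ...",
        "speaker.mp4",
        summarize=lambda text: text[:20],
        synthesize=lambda text: text.encode("utf-8"),
        lip_sync=lambda face, audio: face.encode("utf-8") + b"|" + audio,
    )
    print(result.summary)
```

Passing the stages in as callables keeps the heavy models out of the orchestration code, so the chain can be exercised with stubs on a machine without CUDA.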
Challenges we ran into
- Our models only work well with CUDA installed. Unfortunately, only one of our team members had a GPU-ready laptop, so the others had to switch to the cloud for training.
- Our FastAPI app runs multi-threaded. Torch creates directories when it is instantiated, so in a multi-threaded setting several threads tried to create the same directories at once, which resulted in an error.
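The directory race from the last point can be avoided in general with two common patterns, sketched below. This is not the fix inside Torch and not necessarily what we shipped: `ensure_dir` and `get_model` are hypothetical helpers, and `loader` stands in for whatever call instantiates the model.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor

_init_lock = threading.Lock()
_model = None


def ensure_dir(path: str) -> str:
    """Create a directory so that concurrent callers do not fail."""
    # exist_ok=True turns "directory already exists" into a no-op,
    # so a thread that loses the race does not raise FileExistsError.
    os.makedirs(path, exist_ok=True)
    return path


def get_model(loader):
    """Instantiate the model at most once, even under concurrent requests."""
    global _model
    with _init_lock:          # only one thread may run the loader
        if _model is None:
            _model = loader()
    return _model


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "cache", "checkpoints")
        # Many threads request the same directory at the same time.
        with ThreadPoolExecutor(max_workers=8) as pool:
            list(pool.map(ensure_dir, [target] * 32))
        print(os.path.isdir(target))
```

The lock matters when the directory creation happens inside a third-party constructor that cannot be changed: serializing the first instantiation means only one thread ever triggers it.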
Accomplishments that we're proud of
- Within a single weekend, we implemented our own deepfake model that can create a realistic-looking video with synthesized audio and synchronized lips based solely on the provided text.
- We built an end-to-end system using our knowledge of frontend and backend development as well as machine learning.
What we learned
- Established libraries aren't necessarily safe from bugs.
- Package version management in Python can be a true nightmare (even though we weren't sleeping).
What's next for RealFakeNews
- Provide additional speaker models, also in other languages
- Integrate text translation
- Deploy the application
- Make the code more efficient (smells like some late-night memory leaks here :D)
- Allow users to create their own models within minutes (we implemented a fine-tuning script that still has to be integrated)