Inspiration
When coming up with a project idea, we saw two main problems:
- Healthcare is not user-friendly, especially for certain groups: people with dyslexia, the visually impaired, the elderly, and others.
- A lack of COVID-19 testing and limited availability of tests.
These problems compelled us to build an alternative, novel risk assessment tool that addresses both the accessibility/personalization problem and the testing problem.
What it does
As part of a more general tool that can read any webpage aloud in a custom voice, we built a seamless, fast, easy-to-use interface around a COVID-19 risk assessment test that follows WHO guidelines and is personalized and user-friendly. To demonstrate its multifaceted capabilities, we have Donald Trump administer your diagnostic test and Barack Obama report your results. While this component of the project is purely for comedic and demonstration purposes, any voice could be used, which brings us one step closer to a more personalized healthcare experience for patients.
How we built it
We used cutting-edge deep learning techniques, building on a deepfake library (https://github.com/CorentinJ/Real-Time-Voice-Cloning) for voice generation. We also streamlined a data pipeline for fast detection of on-screen text and speech generation. Our web server functionality was developed using JavaScript, HTML, and CSS. Finally, we ensured that our test complied with WHO guidelines by using the publicly available diagnostic test workflow provided by Infermedica's (https://infermedica.com/) APIs.
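To give a rough idea of how the diagnostic workflow is driven, the server repeatedly sends the user's answers to Infermedica's diagnosis endpoint and gets back either the next question to ask or a ranked list of likely conditions. The snippet below is a minimal sketch based on Infermedica's public documentation; the endpoint version, credential placeholders, and symptom ID are illustrative, not our exact integration.

```python
import requests

# Placeholder credentials; real values come from an Infermedica developer account.
HEADERS = {
    "App-Id": "YOUR_APP_ID",
    "App-Key": "YOUR_APP_KEY",
    "Content-Type": "application/json",
}

def next_step(age, sex, evidence):
    """Send the current interview state to Infermedica's /diagnosis endpoint.

    `evidence` is a list like [{"id": "s_21", "choice_id": "present"}] built up
    from the user's previous answers (the symptom ID here is illustrative).
    The response contains the next question to ask, the current condition
    ranking, and a flag indicating when the interview should stop.
    """
    payload = {
        "sex": sex,                # "male" or "female"
        "age": {"value": age},
        "evidence": evidence,
    }
    resp = requests.post("https://api.infermedica.com/v3/diagnosis",
                         json=payload, headers=HEADERS, timeout=10)
    resp.raise_for_status()
    return resp.json()
```

Each returned question is then passed to the voice pipeline described below, so the interview is read aloud rather than displayed as plain text.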
For deepfakes, we utilize a multi-network model trained with transfer learning on only hours of voice data. First, the embedding network generates a low-dimensional, fixed-size embedding from a short voice waveform sample; this embedding is essentially a “voice profile.” Next, a seq2seq synthesis network conditioned on the text (obtained via our screen-reading code) uses this embedding to generate a mel spectrogram. Finally, a WaveRNN vocoder converts the mel spectrogram into the time domain, producing a complete deepfake waveform, and the resulting .wav file is played in real time to dictate the screen text.
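In code, the pipeline looks roughly like the sketch below, modeled on the Real-Time-Voice-Cloning demo scripts. The model paths and the reference clip are placeholders, and the exact constructor arguments can differ between versions of the library.

```python
from pathlib import Path
import numpy as np
import soundfile as sf

# Modules from CorentinJ/Real-Time-Voice-Cloning.
from encoder import inference as encoder
from synthesizer.inference import Synthesizer
from vocoder import inference as vocoder

# Load the pretrained models (paths are placeholders).
encoder.load_model(Path("encoder/saved_models/pretrained.pt"))
synthesizer = Synthesizer(Path("synthesizer/saved_models/pretrained/"))
vocoder.load_model(Path("vocoder/saved_models/pretrained.pt"))

# 1. Embed a short reference clip of the target voice -- the "voice profile".
reference_wav = encoder.preprocess_wav(Path("trump_sample.wav"))
embedding = encoder.embed_utterance(reference_wav)

# 2. Condition the seq2seq synthesizer on the screen text to get a mel spectrogram.
text = "Do you currently have a fever or a dry cough?"
specs = synthesizer.synthesize_spectrograms([text], [embedding])

# 3. Vocode the spectrogram back into the time domain and save the waveform,
#    which the web server then plays back to the user.
generated_wav = vocoder.infer_waveform(specs[0])
sf.write("output.wav", generated_wav.astype(np.float32), synthesizer.sample_rate)
```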
Challenges we ran into
We ran into many technical challenges. One of the most difficult was simply connecting all of the components: the Infermedica APIs, the deepfake library, and a web server to handle text reading and processing.
Accomplishments that we're proud of
We're proud of having built a piece of software that could potentially be used in many other contexts, providing a more user-friendly interface in the healthcare space.
What we learned
We learned about web development and machine learning, and about persevering toward a final product.
What's next for CovidFlow
The vision of CovidFlow is to extend our initial project into a company focused on user-friendly healthcare. The last thing a patient experiencing intense pain wants is to struggle through an unpleasant, confusing process. We seek to apply our technical and biological backgrounds to build user-friendly solutions to currently inefficient processes, and this is our first step.
In the shorter term, we would like to enhance the technology to handle large webpages and to improve speed through parallel processing and server-side GPU processing.
Built With
- css
- html
- javascript
- python
- pytorch
- tensorflow