Inspiration

I have always loved messing around with machine learning projects, and I wanted to challenge myself for this hackathon. Since I hadn't worked with sound generation in a while, I figured I'd give it another go.

What it does

This project takes a silent video of someone speaking, focuses on the lips of the person talking, and attempts to reconstruct the sound that person was making. This way the program acts as a lip reader, but instead of producing text, it directly generates the sound.

How we built it

I started by collecting data to train the machine learning model. I used an existing AI model to segment a person's face in real time, then preprocessed each video frame and sent it to a virtual camera. Next, I used OBS to record from the virtual camera and collected training data of myself talking. Finally, I processed the training data for the model: both the video and the audio were split into fixed-length segments, and the audio segments were converted to spectrograms using an open-source program called ARSS.
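The audio half of that last step can be sketched roughly as follows. This is a minimal illustration, not the pipeline we actually ran: it swaps in a plain NumPy short-time Fourier transform in place of ARSS, and the function names, sample rate, segment length, and frame sizes are all assumptions chosen for the example.

```python
import numpy as np


def split_into_segments(samples, segment_len):
    """Split a 1-D array into equal, non-overlapping segments, dropping any remainder."""
    n_segments = len(samples) // segment_len
    return samples[: n_segments * segment_len].reshape(n_segments, segment_len)


def magnitude_spectrogram(segment, frame_len=256, hop=128):
    """Short-time Fourier magnitude of one segment, shaped (frames, frequency bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(segment) - frame_len) // hop
    frames = np.stack(
        [segment[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


# Stand-in for recorded audio: 2 seconds of a 440 Hz tone at an assumed 8 kHz sample rate
sample_rate = 8000
t = np.arange(2 * sample_rate) / sample_rate
audio = np.sin(2 * np.pi * 440 * t)

# Assumed segment length of 0.5 s; the real value is not stated in this write-up
segments = split_into_segments(audio, segment_len=sample_rate // 2)
spectrograms = [magnitude_spectrogram(s) for s in segments]
```

Each video segment would then be paired with the spectrogram of the audio segment covering the same time span, which gives the model an image-like target to predict.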

My partner created a frontend for the product using HTML and CSS. With Flask as the framework, the website takes in a video and sends it to the backend, where it is processed and fed to the ML model, which attempts to recreate the sound the person made.
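A minimal sketch of what such an upload route could look like in Flask is shown below. The `/upload` path, the `video` form field, and the `reconstruct_audio` helper are hypothetical names made up for this example; the helper is only a stub standing in for the preprocessing and model code.

```python
import io

from flask import Flask, jsonify, request

app = Flask(__name__)


def reconstruct_audio(video_bytes):
    """Hypothetical placeholder for the ML backend; only reports the upload size."""
    return {"received_bytes": len(video_bytes)}


@app.route("/upload", methods=["POST"])
def upload():
    # "video" is an assumed form field name for the uploaded file
    video = request.files.get("video")
    if video is None or video.filename == "":
        return jsonify(error="no video file provided"), 400
    return jsonify(reconstruct_audio(video.read())), 200


# Exercise the route with Flask's built-in test client (no running server needed)
client = app.test_client()
response = client.post(
    "/upload",
    data={"video": (io.BytesIO(b"fake video bytes"), "clip.mp4")},
    content_type="multipart/form-data",
)
```

In a real deployment the stub would be replaced by the code that crops the lips, runs the model, and returns the generated audio.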

Challenges we ran into

Training the model was extremely difficult. I have some experience training models from scratch, but the size of the data made this one much harder to train. We ended up having to use the cloud to train our model.

Accomplishments that we're proud of

I am extremely proud of the data collection for this project. Although the model didn't end up working as expected, figuring out how to collect, process, and store the data was especially fun, and I'm happy with the results.

What we learned

We learned the benefits of keeping things simple, along with how to use the cloud to accomplish ML tasks.

What's next for Hermes

I'm definitely going to continue working on this project outside of the hackathon, as it's been sitting on my to-do list for a while now.
