Inspiration
All of us wanted to explore Google Cloud Platform and machine learning tools, since we had never worked extensively with either. We decided to apply machine learning to video and see what we could build with the available resources.
What it does
As you livestream (or maybe even FaceTime) from your computer, your words are captioned in real time. This improves accessibility for hard-of-hearing users and for users with different native languages (where we could apply a quick translation to the captions). Such a tool serves to enhance virtual communication.
How we built it
We used the OpenCV package in Python to capture video directly from the user's camera. From there, we used the sounddevice package to extract audio in real time. Finally, we applied the Google Cloud Platform Speech-to-Text API to every two-to-five-second window of audio and displayed the transcriptions on the video.
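The windowing step above can be sketched as a small helper that groups captured audio samples into fixed-length chunks before each chunk is sent for transcription. The function name, sample rate, and three-second chunk length are our illustrative choices, not the project's exact code:

```python
# Hypothetical helper (names and constants are ours, not the project's):
# group a flat stream of audio samples into fixed-duration chunks, one
# chunk per speech-to-text request.

CHUNK_SECONDS = 3      # anywhere in the two-to-five-second range works
SAMPLE_RATE = 16000    # a common sample rate for speech recognition

def chunk_samples(samples, sample_rate=SAMPLE_RATE, seconds=CHUNK_SECONDS):
    """Split a flat sequence of audio samples into fixed-duration chunks."""
    size = sample_rate * seconds
    return [samples[i:i + size] for i in range(0, len(samples), size)]
```

Each chunk would then be wrapped in a recognition request; the last chunk may be shorter than the rest, which the API tolerates.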
Challenges we ran into
Our biggest challenge was figuring out how to extract audio in real time; we went through numerous Python audio packages, but many of them either didn't install properly or produced underflow/overflow errors. We finally found success with the sounddevice package.
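The capture pattern that avoids those underflow/overflow errors is sounddevice's callback style, where each incoming block is copied into a queue that the captioning loop drains at its own pace. This is a minimal sketch under our own naming, not the project's exact code:

```python
# Sketch of callback-based capture (illustrative names, not the project's
# exact code). The callback copies each audio block into a queue so the
# main loop never blocks the audio driver.
import queue

audio_q = queue.Queue()

def audio_callback(indata, frames, time, status):
    """Invoked by the audio stream for every captured block."""
    if status:
        print(status)           # surface any over/underflow instead of crashing
    audio_q.put(bytes(indata))  # copy the block; the driver reuses its buffer

# In the real app, this callback would be attached to an input stream, e.g.:
# import sounddevice as sd
# with sd.RawInputStream(samplerate=16000, channels=1, dtype="int16",
#                        callback=audio_callback):
#     ...drain audio_q and send chunks to the speech API...
```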
Our other big challenge was setting up Google Cloud Platform: authorization failed for a really long time.
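For reference, the standard way to authorize the Google Cloud client libraries is to point them at a service-account key file via an environment variable (the key path below is an example, not ours):

```shell
# Point the Google Cloud client libraries at a service-account key.
# The path is illustrative; use wherever your downloaded JSON key lives.
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/my-project-key.json"
```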
Accomplishments that we're proud of
We're proud that we actually delivered a working application (albeit a laggy one). For a while, we were worried we wouldn't get ANYTHING working.
What we learned
We learned how to use and authorize Google Cloud Platform, and how to request permissions for capturing video and microphone audio. It was really valuable for us, because we had never worked with these kinds of tools before.
What's next for LiveStreamCap
We hope to reduce the lag and to display the captions in a slightly prettier fashion.