Speech Detection Using Google Cloud

Inspiration

When brainstorming ideas for the hackathon, we were interested in language models and in languages in general. We wanted a project that was neither too easy nor too hard, and one that we could keep building on after the hackathon. One of our members was interested in traveling, which led us to the idea of using a translation service to compose appropriate responses in the local language, so that travelers can speak fluently with people from the country they are visiting. We were inspired by the feature in Google's browser that lets you speak into the microphone instead of typing your query on the keyboard. We hoped to emulate that, with added translation support.

What it does

This project detects speech using Google Cloud (Speech-to-Text) and includes a system that translates the transcribed text into a target language.
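The writeup does not show how the app calls Google Cloud, so the following is only a minimal sketch of one way to do it, assuming the Speech-to-Text v1 REST endpoint (`speech:recognize`), API-key authentication, and a 16 kHz, 16-bit mono WAV recording (LINEAR16). It builds the JSON request without sending it and pulls the transcript out of a response. `API_KEY`, `build_recognize_request`, and `extract_transcript` are hypothetical names, not part of the original project.

```python
import base64
import json
import urllib.request

# Hypothetical placeholder: load a real key (e.g. from an environment variable) before sending.
API_KEY = "YOUR_API_KEY"
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(wav_bytes: bytes, language_code: str = "en-US",
                            sample_rate_hz: int = 16000) -> urllib.request.Request:
    """Build (but do not send) a Speech-to-Text v1 recognize request.

    Assumes the audio is uncompressed 16-bit PCM (LINEAR16), as stored in a WAV file.
    """
    body = {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
        },
        # Inline audio has to be base64-encoded text inside the JSON body.
        "audio": {"content": base64.b64encode(wav_bytes).decode("ascii")},
    }
    return urllib.request.Request(
        f"{RECOGNIZE_URL}?key={API_KEY}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_transcript(response_json: dict) -> str:
    """Join the top alternative of every result into a single transcript string."""
    return " ".join(
        result["alternatives"][0]["transcript"]
        for result in response_json.get("results", [])
        if result.get("alternatives")
    )
```

Passing the request to `urllib.request.urlopen` would send it, and `json.load` on the response yields the dictionary that `extract_transcript` expects. A Python wrapper library can hide this request behind a single call; the raw request is shown here only to make the data flow visible.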

How we built it

We first looked into Python libraries that could help us recognize and interpret speech. We then built a simple web app with HTML and Bootstrap, using our Python script and the Flask web framework as the backend. We learned a lot about internet protocols and asynchronous functions along the way. We also implemented a translate function that translates the text into Spanish, and we plan to add multilingual support in the future.
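The translate function itself is not shown. As an illustration only, here is a minimal sketch assuming the Google Cloud Translation v2 REST endpoint with API-key authentication; `API_KEY`, `build_translate_request`, and `extract_translation` are hypothetical names. Making the target language a parameter (defaulting to Spanish, `"es"`) is one simple route to the planned multilingual support.

```python
import json
import urllib.request

# Hypothetical placeholder: load a real key before sending.
API_KEY = "YOUR_API_KEY"
TRANSLATE_URL = "https://translation.googleapis.com/language/translate/v2"


def build_translate_request(text: str, target: str = "es",
                            source: str = "en") -> urllib.request.Request:
    """Build (but do not send) a Translation v2 request for `text`."""
    body = {
        "q": text,
        "source": source,
        "target": target,   # ISO-639 code, e.g. "es" for Spanish
        "format": "text",   # plain text rather than HTML
    }
    return urllib.request.Request(
        f"{TRANSLATE_URL}?key={API_KEY}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_translation(response_json: dict) -> str:
    """v2 responses look like {"data": {"translations": [{"translatedText": ...}]}}."""
    return response_json["data"]["translations"][0]["translatedText"]
```

Chaining this after the transcript step gives the speech-to-Spanish path described above; supporting another language only requires a different `target` value.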

Challenges we ran into

The main problem we faced was HTTP 404 (Not Found) errors. They appeared when we tried to convert a WebM file to a WAV file: the browser's MediaRecorder produced WebM files, and the speech-to-text step converted those WebM files into WAV files, which were then transcribed into text. We traced the errors to two causes: a miscommunication between our JavaScript file and our Python file, which meant the file was never recorded in the first place, and a separate bug in our file-reading code.
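The writeup does not say which tool performed the WebM-to-WAV conversion. A minimal sketch, assuming the `ffmpeg` command-line tool is installed and on the PATH, might check that the recording actually exists before converting it, so a missing upload fails with a clear error instead of surfacing later as a 404. It then reads the WAV header back with the standard `wave` module. `build_ffmpeg_command`, `convert_webm_to_wav`, and `describe_wav` are hypothetical helpers.

```python
import subprocess
import wave
from pathlib import Path


def build_ffmpeg_command(src: Path, dst: Path, sample_rate_hz: int = 16000) -> list[str]:
    """ffmpeg arguments that turn a browser WebM recording into mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",              # overwrite dst if it already exists
        "-i", str(src),
        "-ac", "1",                  # downmix to mono
        "-ar", str(sample_rate_hz),  # resample (16 kHz is a common speech rate)
        "-acodec", "pcm_s16le",      # 16-bit little-endian PCM, i.e. LINEAR16
        str(dst),
    ]


def convert_webm_to_wav(src: Path, dst: Path) -> Path:
    """Convert src to dst, refusing to run if the recording was never saved."""
    if not src.is_file() or src.stat().st_size == 0:
        raise FileNotFoundError(f"recording was not saved: {src}")
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
    return dst


def describe_wav(path: Path) -> tuple[int, int, int]:
    """Return (channels, sample width in bytes, frame rate) to sanity-check the output."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth(), wav.getframerate()
```

Checking the file on disk before conversion separates the two failure modes described above: a recording that never arrived from the JavaScript side raises immediately, while a bad conversion shows up in `describe_wav` or as an ffmpeg error.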

Accomplishments that we're proud of

We are proud to have set up recording of voice audio, along with a system that interprets that audio as text.

What we learned

We learned the basics of the speech recognition library and how to set up a Flask server for local development. We also learned about reading files and how a speaker's audio is captured and converted on a computer.

What's next for Speech Detection Using Google Cloud

We want to create a translation system that takes a recording in one language, translates it into your native language, and then uses AI to compose a response in the local language, so that you can communicate properly without knowing that language. We feel this would be helpful for traveling, for connecting cultures all over the world, and for making the language barrier nonexistent.

Updates

Bryson Matsuda started this project — Mar 24, 2024 11:53 AM EDT
