Inspiration

Inspired by the struggles of our online classes last year, we wanted to solve an embarrassing problem that most of us have faced during Zoom calls and Google Meets. We aimed to build a bot that could detect questions directed at you, find an appropriate response, and deliver it in a way that your employer or teacher would never be the wiser about your much-needed slumber during the meeting.

What it does

Helpline uses Google's Speech-to-Text API to listen for your name and the question or phrase that follows it. It then sends that phrase to Google in an HTTP GET request, scrapes Google's recommended answer from the search results page, and plays it back through the call microphone.
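To illustrate the "listen for your name, then take what follows" step, here is a minimal sketch that works on an already-transcribed string. The function name `extract_prompt` and the punctuation-stripping rule are our assumptions for illustration, not the project's actual code.

```python
import re
from typing import Optional


def extract_prompt(transcript: str, name: str) -> Optional[str]:
    """Return the text spoken after `name`, or None if there is nothing to answer.

    Hypothetical helper: assumes the transcript is plain text and that the
    question immediately follows the first mention of the name.
    """
    # Match the name as a whole word, ignoring case ("alex" matches "Alex",
    # but not "Alexander").
    match = re.search(rf"\b{re.escape(name)}\b", transcript, flags=re.IGNORECASE)
    if match is None:
        return None
    # Keep everything after the name, dropping leading punctuation and spaces.
    remainder = transcript[match.end():].lstrip(" ,.:;!?-").strip()
    return remainder or None


if __name__ == "__main__":
    print(extract_prompt("Okay, Alex, what is the capital of France?", "alex"))
```

Anything returned by this helper would be the phrase passed on to the search step; a `None` result means the bot stays silent.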

How we built it

Using Google Cloud Speech-to-Text, we transcribed .mp3 audio files from the call and passed the resulting text to our Flask server. The server sends an HTTP GET request to Google and parses the returned HTML with BeautifulSoup to extract the answer text that Google displays on the search page. That answer text is sent back to our local computer and played through a speaker as a simulated voice response, so that it is heard on the call.
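The search-and-scrape step can be sketched roughly as follows. We used BeautifulSoup in the project; this sketch swaps in the standard library's `html.parser` so it runs without extra dependencies, and it parses a sample HTML string rather than making a live request. The class name `answer-box` is a hypothetical placeholder, since the markup Google actually serves differs and changes over time.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urlencode

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr"}


def build_search_url(question: str) -> str:
    """Build the URL for the HTTP GET request (query is URL-encoded)."""
    return "https://www.google.com/search?" + urlencode({"q": question})


class AnswerExtractor(HTMLParser):
    """Collect the text of the first element carrying `answer_class`."""

    def __init__(self, answer_class: str) -> None:
        super().__init__()
        self.answer_class = answer_class  # hypothetical class name
        self.depth = 0                    # > 0 while inside the answer element
        self.parts = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif self.answer_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True  # stop after the first matching element

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def extract_answer(html: str, answer_class: str = "answer-box") -> Optional[str]:
    """Return the answer text with whitespace collapsed, or None if absent."""
    parser = AnswerExtractor(answer_class)
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    return text or None


if __name__ == "__main__":
    sample = '<div class="answer-box">Paris is the <b>capital</b> of France.</div>'
    print(build_search_url("capital of France"))
    print(extract_answer(sample))
```

In a setup like ours, the Flask route would fetch the page at `build_search_url(...)` and hand the response body to `extract_answer`; the returned string is what gets read aloud.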

Challenges we ran into

Voice recognition was far harder than we initially expected. However, through extensive Googling and trial and error, we managed to find a method and API that worked.

Accomplishments that we're proud of

We're proud of setting up a functioning local server that scrapes response text from Google and returns it to our local machine, something we had no idea how to do coming in. We're also proud of getting speech-to-text conversion working after many failed attempts.

What we learned

We learned how to use Google Cloud Speech-to-Text and picked up a lot of HTML-related concepts. We also learned elements of OpenCV from parts of the project that were later discarded.

What's next for Helpline

We want to make the process more practical by finding a way to continuously read audio from Zoom or other conference calls into our script. Right now it can only use pre-made .mp3 files, which is not realistic for a live meeting. In the future, we would also like to add other features, such as mute/unmute functionality and an AI-generated human image that simulates your camera being on while the response is spoken.
