Bumble-B

Inspiration

Our main inspiration came from how Bumblebee speaks in the Transformers movies, hence "Bumble-B". In the movies, Bumblebee talks through his car radio, splicing audio segments together to form sentences.

What it does

We built an application that does exactly that: it pulls audio from many different online sources and splices the clips together to form a sentence. A user types in a desired phrase, and Bumble-B searches a database word index, finds an audio clip for each word, and splices the clips together in order.
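The lookup-and-splice step can be sketched roughly as follows. This is a minimal illustration, not our actual code: splice_phrase and word_index are hypothetical names, the index is a plain in-memory dict rather than the real database, and every clip is assumed to be raw audio bytes in the same sample format so that simple concatenation is valid.

```python
import re


def splice_phrase(phrase, word_index):
    """Look up a clip for each word in the phrase and join them in order.

    word_index is a hypothetical stand-in for the database word index:
    a dict mapping a lowercase word to that word's raw audio bytes.
    """
    # Lowercase the phrase and drop punctuation so "Hi," matches "hi".
    words = re.findall(r"[a-z']+", phrase.lower())

    # Fail early and name every word that has no indexed clip.
    missing = [w for w in words if w not in word_index]
    if missing:
        raise KeyError("no clip indexed for: " + ", ".join(missing))

    # Assumes all clips share one sample format, so bytes can be joined.
    return b"".join(word_index[w] for w in words)
```

Calling splice_phrase("Hi, welcome!", index) with an index that holds "hi" and "welcome" returns the two clips back to back. A phrase containing an unindexed word raises KeyError.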

How we built it

Our main focus was using Google's Speech-to-Text engine to quickly index audio files. Setting one of Speech-to-Text's optional parameters, enable_word_time_offsets, makes the response include the start and end times of each word that Speech-to-Text recognized. This let us automate the splicing of large audio files, such as an Obama speech: Speech-to-Text analyzes each word and finds its start and end times, and simple Python code splits out each individual word and uploads it to Google Cloud Storage. Every word is then indexed in a Django RESTful API so the front end can easily query it and pull the relevant audio snippets.
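The splitting step might look something like the sketch below. It assumes the word timings have already been pulled out of the Speech-to-Text response and converted to (word, start_seconds, end_seconds) tuples, and that the source audio is an uncompressed WAV file. The function name slice_words is hypothetical, and the Speech-to-Text request and the upload to Google Cloud Storage are left out.

```python
import io
import wave


def slice_words(wav_bytes, word_timings):
    """Cut one WAV clip per recognized word.

    word_timings: iterable of (word, start_seconds, end_seconds) tuples,
    assumed to come from the recognizer's word time offsets.
    Returns a dict mapping each lowercase word to a list of WAV clips.
    """
    clips = {}
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for word, start_s, end_s in word_timings:
            # Convert seconds to frame positions in the source audio.
            start_frame = int(round(start_s * rate))
            n_frames = max(0, int(round(end_s * rate)) - start_frame)
            src.setpos(start_frame)
            frames = src.readframes(n_frames)

            # Write the slice as a standalone WAV with the same format.
            out = io.BytesIO()
            with wave.open(out, "wb") as dst:
                dst.setparams(params)
                dst.writeframes(frames)
            clips.setdefault(word.lower(), []).append(out.getvalue())
    return clips
```

Each value in the returned dict is a list because the same word can appear several times in one recording, and each occurrence becomes its own clip.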

Challenges we ran into

The main challenge we faced, and didn't really foresee, was that Speech-to-Text's word time offsets are fairly inaccurate. The start and end times for each word are rounded to the nearest tenth of a second, which is actually a very long time when speaking at a normal speed. This means that instead of a single, cleanly cut word, Speech-to-Text would often return the correct words but cause our app to cut clips that were too long. For instance, we indexed the "Hi, welcome to Chili's!" Vine, and while Speech-to-Text accurately recognized all of the words, the rounded start and end times caused the clip for "welcome" to also include "hi" and "to".
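To see why tenth-of-a-second rounding hurts, the sketch below rounds a word's boundaries to the nearest 100 ms and measures how much of each neighboring word ends up inside the resulting clip. The "true" word boundaries here are made-up numbers chosen for illustration, not measurements from the Vine, and the function names are hypothetical.

```python
def round_to_100ms(ms):
    """Round a non-negative time in milliseconds to the nearest 100 ms."""
    return ((ms + 50) // 100) * 100


def leaked_ms(true_spans, target):
    """Return how many ms of each other word fall inside the target's clip.

    true_spans maps word -> (start_ms, end_ms) of the actual speech.
    The clip window is the target's span with both ends rounded to 100 ms.
    """
    start, end = true_spans[target]
    win_start, win_end = round_to_100ms(start), round_to_100ms(end)
    leaks = {}
    for word, (s, e) in true_spans.items():
        if word == target:
            continue
        overlap = min(e, win_end) - max(s, win_start)
        if overlap > 0:
            leaks[word] = overlap
    return leaks


# Hypothetical boundaries in milliseconds, for illustration only.
spans = {"hi": (100, 340), "welcome": (340, 660), "to": (660, 780)}
print(leaked_ms(spans, "welcome"))  # {'hi': 40, 'to': 40}
```

With these numbers the "welcome" window becomes 300 ms to 700 ms, so 40 ms of each neighbor is audible in the clip. With other boundaries the same rounding can instead cut off the start or end of the word itself.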

Accomplishments that we're proud of

While our auto-indexing system using Google's Speech-to-Text doesn't produce incredible results because of the inaccurate start and end times, we're proud as a team that we ventured into unknown territory and used a framework that none of us had experience with. Likewise, we're quite proud of how quickly and effectively the front-end and back-end frameworks came together.

Built With

angular-2
django
docker
google-cloud
google-cloud-speech
kubernetes
postgresql
python

Submitted to

HackMIT 2018

Created by

Helped build the RESTful API in Django, built the system that auto-indexes the generated audio snippets in the Django database, and worked on optimizing the start and end times of audio snippets

Andrew Parsons
I containerized the apps and deployed them through Kubernetes Engine on Google Cloud Platform. I also created the Python script that continually listens to a GCS bucket for new audio files, submits them to the Speech API, and breaks them into word-sized sound bites to be stored in GCS and indexed in PostgreSQL for querying.

Mark Hudson
Built the back-end REST API and audio upload scripts, and designed and built the Angular front end

Conner Chyung

Updates

Andrew Parsons started this project — Sep 16, 2018 09:59 AM EDT
