As people who love being involved in our communities, we know that the ability to give a speech is a valuable skill. For activities such as Model UN, and even for everyday conversation, you have to be able to say what you mean clearly. In other words, you have to be a strong orator. This inspired us to build something that helps you become a better speaker in real time; it's a skill we all need.
What it does
The experience begins when the user opens our webpage. They are prompted to record their speech, and as they speak they see a live preview of what they are saying. We take this input, fed live through a Google server connection, and send it to two places: first, to an API endpoint we created today, backed by a machine-learning model we also trained today, to figure out which speaker's mannerisms are most similar to yours; and second, to IBM Watson for emotion and sentiment analysis. The user then receives an analysis of their speech, including the amount of sentiment (good speeches, of course, have high sentiment) and the person they are most similar to. They can also see a transcript of what they said come up on-screen.
How we built it
The front end is a React app, with data tunneled through ngrok. The machine-learning algorithm was written in Python in a Jupyter Notebook. It generates spectrograms of the speakers (both the ones in our dataset and the user's input) and then uses ML to compare the two. The generated transcript is stored in a database after it is created. That data is exposed through an API endpoint on one of the two Python servers we access (the other handles the machine learning).
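To illustrate the spectrogram step described above, here is a minimal sketch using only NumPy. It is not the code from our notebook: the function name `spectrogram`, the frame length, the hop size, and the Hann window are all assumptions chosen for the example.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Return a log-magnitude spectrogram of shape (frame_len // 2 + 1, n_frames).

    frame_len and hop are illustrative defaults, not tuned values.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        # Pad very short clips so that at least one full frame exists.
        signal = np.pad(signal, (0, frame_len - signal.size))
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One FFT per frame; keep only the non-negative frequencies.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Compress the dynamic range and put frequency on the first axis.
    return np.log1p(magnitude).T

# Example: one second of a 1 kHz tone sampled at 8 kHz (synthetic data).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
spec = spectrogram(tone)
```

Each column of `spec` is a snapshot of the frequencies present in one short slice of audio, so the whole array can be treated like a grayscale image and compared with other speakers' spectrograms.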
Challenges we ran into
Because this was such a technically complex project, one of the largest problems was integrating the different parts. The spectrograms generated for the machine-learning algorithm also had to be normalized, which made it hard to get a highly accurate model. Our dataset was the VoxCeleb database, with speakers ranging from Aaron Rodgers to Donald Trump Jr. Each clip was a different length, so the spectrograms did not scale correctly at first. Another large issue was the Keras functional API: because our model takes multiple inputs, we had to use the functional API to wire up the four inputs that we needed.
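The length problem above can be sketched as follows. This is one possible way to force spectrograms of different lengths into a common shape, not necessarily the scheme we ended up with: the helper name `normalize_spectrogram`, the target of 128 frames, center-cropping, zero-padding, and min-max scaling are all assumptions made for illustration.

```python
import numpy as np

def normalize_spectrogram(spec, target_frames=128):
    """Return a (n_freq_bins, target_frames) array with values in [0, 1]."""
    spec = np.asarray(spec, dtype=np.float64)

    # Center-crop clips that are longer than the target.
    extra = spec.shape[1] - target_frames
    if extra > 0:
        start = extra // 2
        spec = spec[:, start : start + target_frames]

    # Min-max scale so loud and quiet recordings share one value range.
    lo, hi = spec.min(), spec.max()
    if hi > lo:
        spec = (spec - lo) / (hi - lo)
    else:
        spec = np.zeros_like(spec)  # constant input carries no information

    # Right-pad clips that are shorter than the target with zeros.
    missing = target_frames - spec.shape[1]
    if missing > 0:
        spec = np.pad(spec, ((0, 0), (0, missing)))
    return spec

# Example with synthetic spectrograms of two different lengths.
rng = np.random.default_rng(0)
short_clip = normalize_spectrogram(rng.random((129, 40)))
long_clip = normalize_spectrogram(rng.random((129, 300)))
```

After this step every clip has the same shape, so the spectrograms can be stacked into a single batch and fed to an image-style model.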
Accomplishments that we're proud of
We are super proud that we got all these moving parts integrated and that the UI ended up looking nice. As a team, we place a high emphasis on design and aesthetics, so we enjoyed making the logo and getting the chance to build multiple pages for our site. On the topic of smaller details, we also really appreciated being able to include authentication and a secure connection to our webpage. We are also proud that this runs on any device that has Chrome. Testing on our phones was convenient, especially considering that we have different laptop makes and models.
What we learned
One of our team members had never used React before, so using React for the UI was a challenging but ultimately rewarding experience. We saw how powerful React is thanks to its component-based approach to coding, and we also found out how to integrate our front end with our back end using different tunneling mechanisms. Ngrok, in particular, was very useful for keeping our code compartmentalized while still letting the parts connect to and rely on each other. At a high level, we also found it very interesting to approach speech comparison as image comparison. We hadn't explored this idea before, and it turned out to be an interesting way to handle non-homogeneous data.
What's next for Oratr
We want to increase the functionality of Oratr. For one, giving speeches isn’t just about delivering them, but also about memorizing them. For that reason, we would like to create a feature that helps users memorize their speeches, perhaps through repeated practice in which Oratr points out where the spoken speech does not match the script. Of course, we also hope to publish this to the web and deploy the app somehow. We could potentially use Heroku for this deployment. Above all, we want everyone to be able to use Oratr and enjoy using it as much as we do.