Inspiration
When brainstorming ideas, we noticed that what worried all of us wasn't the idea itself; it was presenting it. Many people, ourselves included, struggle with public speaking, so we decided to build a program that helps people practice their speeches.
What it does
The user runs the program and recording starts. The program records the computer's audio for a set duration and saves it as a .mp4 file. This recording is the basis for our feedback on three things: the number of filler words used, the rate at which the user speaks, and the tone the speech conveys. Our algorithm then presents that feedback to the user.
How we built it
Most of our work was done in a Google Colaboratory notebook, or "Colab" for short. Our backend is written in Python. The biggest help was Cloudflare's speech-to-text API. Once we had the text, we imported a library of filler phrases. We iterated through the words with a sliding window of at most six words, the length of the longest phrase in the library. We then compared the number of filler phrases with the number of words spoken so we could return the percentage of the speech that is filler. For the rate, we got the duration of the audio using Workers AI. Dividing the number of words spoken by that duration gave us words per minute. For the tone, we used Cloudflare's sentiment analysis. It assigns every word a positive or negative value with a predetermined magnitude, depending on how positive or negative that word is. The more positive the overall result, the more positive the tone of the speech.
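The filler and rate calculations described above can be sketched roughly as follows. This is a minimal illustration, not our exact notebook code: the phrase set, the tokenizer, and the function names (`count_fillers`, `filler_percentage`, `words_per_minute`) are assumptions made up for the example, and the transcript and duration are passed in directly rather than fetched from Cloudflare.

```python
import re

# Hypothetical stand-in for the filler-phrase library; the longest entry has six words.
FILLER_PHRASES = {
    "um", "uh", "like", "you know", "i mean", "sort of",
    "at the end of the day",
}
MAX_WINDOW = max(len(phrase.split()) for phrase in FILLER_PHRASES)


def tokenize(text):
    """Lowercase the transcript and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def count_fillers(words):
    """Count filler phrases, trying the longest window first at each position."""
    count = 0
    i = 0
    while i < len(words):
        for size in range(min(MAX_WINDOW, len(words) - i), 0, -1):
            if " ".join(words[i:i + size]) in FILLER_PHRASES:
                count += 1
                i += size  # skip past the matched phrase so it is not counted twice
                break
        else:
            i += 1  # no phrase starts at this word
    return count


def filler_percentage(words):
    """Filler phrases as a percentage of the words spoken."""
    if not words:
        return 0.0
    return 100.0 * count_fillers(words) / len(words)


def words_per_minute(word_count, duration_seconds):
    """Speaking rate, given the word count and the audio duration in seconds."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return word_count * 60.0 / duration_seconds


transcript = "Um, I think, you know, this is like a good idea at the end of the day"
words = tokenize(transcript)
print(count_fillers(words), round(filler_percentage(words), 1))
print(words_per_minute(len(words), duration_seconds=6))
```

Checking the longest window first means a phrase such as "you know" is counted once, rather than also matching any shorter filler it might contain.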
Challenges we ran into
The biggest challenge we ran into was connecting our front-end prototype to our backend. Our front-end collected data with JavaScript, while our backend is currently written exclusively in Python. Our goal was to set up a local server, send the data collected by the front-end as a .json file, and then read that data in the backend. We tried HTTP and AJAX requests, but none of them worked. This led us to scrap the front-end implementation entirely and build the whole project in Python.
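For reference, one way the Python side of that hand-off could look is sketched below, using only the standard library. This is not something we got working during the hackathon; the handler name `TranscriptHandler`, the helper `make_server`, the port, and the permissive CORS headers are all assumptions chosen for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class TranscriptHandler(BaseHTTPRequestHandler):
    """Accepts JSON sent by a front-end and keeps it in memory (hypothetical)."""

    received = []  # payloads collected so far, shared across requests

    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        # Allow a page served from another origin to call this server.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_OPTIONS(self):
        # Browsers send a preflight request before a cross-origin JSON POST.
        self.send_response(204)
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Methods", "POST, OPTIONS")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")
        self.end_headers()

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            data = json.loads(self.rfile.read(length))
        except ValueError:
            self._send_json(400, {"error": "invalid JSON"})
            return
        self.received.append(data)
        self._send_json(200, {"status": "ok"})

    def log_message(self, format, *args):
        pass  # keep the notebook or terminal output quiet


def make_server(port=8000):
    """Create a server bound to localhost; call serve_forever() on it to run."""
    return HTTPServer(("127.0.0.1", port), TranscriptHandler)
```

Calling `make_server().serve_forever()` would start it locally, and a front-end could then POST JSON to `http://127.0.0.1:8000/`. Running this inside Colab would additionally require exposing the port, which this sketch does not cover.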
Accomplishments that we're proud of
We're proud that we were able to take the .mp4 file and analyze it. To count the words spoken per minute, we had to calculate periods where the word count had dramatically decreased. We could then
What we learned
Many of us hadn't worked with Cloudflare, or with APIs in general, before this project, so figuring out how to connect to one was a rewarding task. We also had to find a way to translate the data we got back into useful information, and to settle on good criteria for scaling what we return to the user. Many of us are also beginner Python users, so it was nice to learn how to express our ideas in the language and accomplish our task.
What's next for speech.ai
Our next step is to go down one of three paths: create a website for this application, create a website extension, or turn our idea into a pip-installable package. The main goal of all three is to make our project more accessible to anyone who would like to use it. We would also like to improve the accuracy of the sentiment analysis by training the model on either our own data or data that is already available. On the backend side, we would like to remove the need to fix the recording duration before recording starts, so that the duration depends solely on when "Record" and "Stop" are clicked.
Built With
- cloudflare
- googlecolab
- python
- workersai