Throughout our educational careers, we have taken part in many presentations and understand the nuances of pitching ideas effectively. One of our family members had a speech impediment, which inspired us to find a solution that could improve their presentations with personalized feedback. Our solution is InSpeech! The primary users of InSpeech are high school students, university students, and white-collar employees. A recent Harvard study found that students give an average of 15 presentations per year, while white-collar employees give an average of 87. With so many presentations being given by students and employees, InSpeech is a necessity. Through InSpeech, individuals of all ages will be able to enhance their presentations and effectively pitch their ideas and solutions.
What it does
Our app is built to bolster individuals' confidence in their presentations and speech. The Present page lets you record a speech; afterward, the app analyzes its duration, speaking tone, volume, pace, and frequency of filler words. Once you save your presentation, you can review a transcript of the speech and all of the analytics from the dashboard. Our app's second page is a Practice page, which essentially acts as a teleprompter: you choose a desired length for your speech and select an image of your script or notes, and the app converts the image to text via an Optical Character Recognition (OCR) model and displays the script on screen.
How we built it
We built the application with Flutter on a Firebase backend, giving it a smooth user interface and an authentication system. Additionally, we used the Web Speech API to measure a user's speaking tone, volume, pace, and frequency of filler words from the voice data of their recorded speech. We also built an Optical Character Recognition pipeline with Firebase ML Kit that converts an image of the user's script to text.
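The transcription itself comes from the Web Speech API inside the app; the loudness side of the analysis boils down to something like the following sketch (a hypothetical Python stand-in for illustration, not our Flutter code), which scores normalized audio samples in dBFS:

```python
import math

def rms_dbfs(samples):
    """Rough loudness of normalized audio samples (range -1..1) in dBFS.

    0.0 dBFS is full scale; quieter speech gives more negative values.
    This is an illustrative stand-in for the volume metric the app reports.
    """
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")

print(rms_dbfs([0.5] * 8))  # half-scale audio sits around -6 dBFS
```

A per-second series of these values is enough to flag sections of a speech that are too quiet to carry.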
Challenges we ran into
We weren’t sure how to approach the problem of enhancing presentations and speeches. For individuals with speech impediments or other conditions that make speaking difficult, we didn’t know which specific vocal traits to focus on. After research, we settled on three key characteristics of an effective speech: a consistent words-per-minute rate, strong volume, and a confident tone. We then needed an API that could extract all of these from voice data, and we eventually found the Web Speech API.
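To make the "consistent words per minute" criterion concrete, here is a minimal Python sketch (our own illustration, not the app's code) that scores pacing steadiness as the coefficient of variation of per-window WPM — a lower score means a steadier pace:

```python
import statistics

def wpm_consistency(window_word_counts, window_s=10):
    """Per-window WPM plus a coefficient of variation (lower = steadier pace).

    window_word_counts: words spoken in each fixed-length window
    (a hypothetical input shape for this sketch).
    """
    wpms = [n / (window_s / 60) for n in window_word_counts]
    mean = statistics.mean(wpms)
    cv = statistics.pstdev(wpms) / mean if mean else 0.0
    return wpms, round(cv, 3)

# Same average pace, very different steadiness:
steady, cv_steady = wpm_consistency([20, 21, 20, 19])
rushed, cv_rushed = wpm_consistency([10, 30, 12, 28])
print(cv_steady, cv_rushed)
```

Both speakers average 120 WPM here, but the second one surges and stalls, which the higher coefficient of variation captures.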
Accomplishments that we're proud of
We’re most proud of our speech analysis integration, as getting the Web Speech API to accurately analyze the user’s voice took the most effort. Accomplishing this, and generating recommendations from the speech analysis within the time limit, is a feat we are extremely proud of. We were also proud of using natural language processing to detect filler words in the user’s speech, which can be extremely beneficial in keeping a speech concise and direct.
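The core idea behind filler-word detection can be sketched in a few lines of Python (the word lists and function are illustrative, not the app's actual NLP pipeline): match single-word and two-word fillers against the transcript tokens and report which positions they occupy.

```python
# Illustrative filler lists; a real pipeline would use a larger,
# context-aware set.
FILLER_UNIGRAMS = {"um", "uh", "like", "basically", "actually"}
FILLER_BIGRAMS = {("you", "know"), ("sort", "of"), ("kind", "of")}

def find_fillers(tokens):
    """Return sorted indices of tokens that belong to a filler word or phrase."""
    hits = set()
    for i, tok in enumerate(tokens):
        if tok in FILLER_UNIGRAMS:
            hits.add(i)
        if i + 1 < len(tokens) and (tok, tokens[i + 1]) in FILLER_BIGRAMS:
            hits.update((i, i + 1))
    return sorted(hits)

print(find_fillers("um you know this is like really good".split()))
# → [0, 1, 2, 5]
```

Returning token indices rather than just a count is what makes the downstream transcript highlighting possible.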
What we learned
Building this app taught us how to integrate machine learning libraries into a Flutter user interface. For example, detecting filler words required natural language processing, and wiring it into the app was not as simple as we first thought. In displaying the transcript of a user’s recorded speech, we learned how to highlight every filler word in a distinct color: we had to take the output data from the machine learning model and map it back to the transcript so that the correct words were recolored.
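That recoloring step amounts to a simple join between the model's output and the transcript. A Python sketch using HTML spans for illustration (the app itself builds colored Flutter TextSpans, and the index list here is assumed to come from the filler detector):

```python
def highlight_fillers(tokens, filler_indices, color="red"):
    """Wrap filler tokens in a styled span so the UI renders them in color.

    tokens: transcript words; filler_indices: positions flagged by the
    filler-word model (hypothetical input for this sketch).
    """
    filler_set = set(filler_indices)
    out = []
    for i, tok in enumerate(tokens):
        if i in filler_set:
            out.append(f'<span style="color:{color}">{tok}</span>')
        else:
            out.append(tok)
    return " ".join(out)

print(highlight_fillers("um i think this works".split(), [0]))
# → <span style="color:red">um</span> i think this works
```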
What's next for InSpeech
We plan to monetize InSpeech through a subscription model. We will also add a real-time facial detection API to help users maintain eye contact and engaging facial expressions, improving audience engagement and optimizing the user’s whole presentation rather than just their speech.