Second demo video: https://youtu.be/OTF2gUl70E8
Inspiration
Improving your speech is pretty hard. Today, speech language pathologists (SLPs) have to sit down with their clients, record their speech, and after a therapy session, manually count and tag disfluencies found in their speech to find a diagnosis. Additionally, their clients (from young children, to adults) have no way of getting feedback on their progress between therapy sessions; clients are just given homework and told to come back next week!
For anyone trying to get better at speaking, instant feedback is important. That's why we've built Talker.
What it does
Speak better, live and in practice. Talker uses Google's live NLP Speech to Text API to analyze WPM (words per minute) from your speeches. If you start going too fast, Talker will vibrate and let you know; perfect for live speeches to an audience.
Get results. Once a speech is over, Talker will analyze it for filler words, interjections, blocks, and other disfluencies. Talker uses a machine learning model trained on Apple's newly released stuttering-events dataset, and will assign you a score based on how well you did, as well as list the number of speech disfluencies (interjections, repetitions, blocks) detected in your speech!
Connect with your SLP. Talker allows you to directly send your recordings and results to your speech language pathologist (SLP), allowing them to get feedback on how you’re progressing, and you feedback on how to continue improving—no appointment needed.
View your progress. Talker logs your completed voice recordings along with all the results analyzed from our machine learning model and Google NLP, allowing you to visualize your improvement with speeches as time goes on.
How we built it
We built Talker in Google's cross-platform mobile app framework, Flutter, after designing our UI on Figma. For our backend, we used Google Cloud Platform to connect our Flutter app to Flask, where we ran our machine-learning model.
Challenges we ran into
Machine Learning Model Prediction For our stuttering prediction, we went in with the ambitious goal of modifying existing neural net architectures and drafting up our own custom one to solve the apple stuttering ML dataset, which included several helpful properties about different speech disfluencies. However, when we tried to create our own model, we found we were only able to get accuracies of ~50%.
Deploying the ML Model We ran into several issues while deploying our model to the GCP App Engine. From trying to allow file write to having to increase the memory capabilities of the instance, it took quite a bit of debugging time in the final hours of the project development to create a working API endpoint.
WPM API It was difficult working with GCP's speech recognition tool to create a WPM monitor that was both fast and accurate.
Accomplishments that we're proud of
- Connecting a gated-recurrent DNN machine learning Tensorflow model running on Flask to the Google Cloud Platform
- Building a cleanly designed app in under 36 hours using Flutter
- Fleshing out a beautiful design in Flutter before building our app
- Working well as a team!
What we learned
- Building a neural net from scratch can be very hard
- Signal processing and MFCC feature extraction
- How to manage files and navigation in Flutter
- Google Speech to Text NLP API
- Not all datasets are made equal
- How to use Apple's SEP-28k stuttering dataset
What's next for Talker
- Add full direct-messaging functionality, letting clients directly talk with their speech therapists and send passages, recordings, results, and feedback back and forth.
Log in or sign up for Devpost to join the conversation.