Speaking to people outside my immediate friend group is not something I'm very comfortable with, so when I present, I often talk too fast and too quietly and lean on verbal crutches ("um", "ok", "so", "you know", "uh").

The app transcribes the user's speech in real time and tracks three metrics: speaking rate in words per ten seconds, average volume, and the percentage of words that are filler.
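The three metrics are simple to compute once a transcript and some loudness samples are available. Below is a minimal sketch under several assumptions: the class name SpeechMetrics and its methods are hypothetical, words are split on anything that is not a letter or apostrophe, "you know" counts as two filler words, and volume is a plain running mean of whatever loudness values the app receives (for example, the dB values Android passes to RecognitionListener.onRmsChanged).

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; the word splitting, filler list, and volume averaging are assumptions.
final class SpeechMetrics {
    // Single-word fillers from the list above ("ok" and "okay" both covered); "you know" is handled as a pair.
    private static final Set<String> SINGLE_WORD_FILLERS = Set.of("um", "uh", "ok", "okay", "so");

    private SpeechMetrics() {}

    /** Lower-cases the transcript and splits it into words, dropping punctuation. */
    static String[] words(String transcript) {
        String cleaned = transcript.toLowerCase(Locale.ROOT).replaceAll("[^a-z']+", " ").trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split(" ");
    }

    /** Speaking rate scaled to a ten-second window. */
    static double wordsPerTenSeconds(int wordCount, double elapsedSeconds) {
        return elapsedSeconds <= 0 ? 0.0 : wordCount * 10.0 / elapsedSeconds;
    }

    /** Percentage of words that are filler; each "you know" counts as two filler words. */
    static double fillerPercent(String[] words) {
        if (words.length == 0) return 0.0;
        int fillerWords = 0;
        for (int i = 0; i < words.length; i++) {
            if (SINGLE_WORD_FILLERS.contains(words[i])) {
                fillerWords++;
            } else if (words[i].equals("you") && i + 1 < words.length && words[i + 1].equals("know")) {
                fillerWords += 2;
                i++; // skip the "know" that was just counted
            }
        }
        return 100.0 * fillerWords / words.length;
    }

    /** Running mean of loudness samples, e.g. values delivered to RecognitionListener.onRmsChanged. */
    static final class RunningMean {
        private double sum;
        private long count;

        void add(double sample) {
            sum += sample;
            count++;
        }

        double mean() {
            return count == 0 ? 0.0 : sum / count;
        }
    }
}
```

Counting every "so" as filler over-counts legitimate uses, and averaging dB values directly is only a rough stand-in for average loudness; both are simplifications a real version would need to revisit.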

I used Android's SpeechRecognition API (the android.speech.SpeechRecognizer class) to transcribe the user's speech.

I originally planned to use Google's Cloud Speech-to-Text API, but the Java client library for it turned out to be incompatible with Android, so I had to fall back on the significantly less capable SpeechRecognition library. That library is designed neither for real-time transcription nor for long, continuous speech, which led to many bugs and a very hacked-together solution.
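One common way to coax longer transcripts out of SpeechRecognizer is to call startListening again each time a session ends and stitch the pieces together. The sketch below shows only the stitching logic, in plain Java so it runs without the Android SDK; the class TranscriptStitcher and its methods are hypothetical. It assumes the app takes the first entry of the RESULTS_RECOGNITION list from each results Bundle as the best hypothesis.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical accumulator for the short sessions produced by restarting the recognizer.
// On Android, onPartialResults/onResults would pass best(...) of the results list into it.
final class TranscriptStitcher {
    private final List<String> finishedSegments = new ArrayList<>();
    private String currentPartial = "";

    /** Latest partial hypothesis for the session in progress. */
    void onPartial(String text) {
        currentPartial = text == null ? "" : text.trim();
    }

    /** Final hypothesis for a session; the recognizer would be restarted after this. */
    void onFinal(String text) {
        String t = text == null ? "" : text.trim();
        if (t.isEmpty()) t = currentPartial; // fall back to the last partial if the final came back empty
        if (!t.isEmpty()) finishedSegments.add(t);
        currentPartial = "";
    }

    /** Top hypothesis from a results list, assumed to be ordered best-first. */
    static String best(List<String> hypotheses) {
        return (hypotheses == null || hypotheses.isEmpty()) ? "" : hypotheses.get(0);
    }

    /** Full transcript so far: finished segments plus the in-progress partial. */
    String transcript() {
        List<String> parts = new ArrayList<>(finishedSegments);
        if (!currentPartial.isEmpty()) parts.add(currentPartial);
        return String.join(" ", parts);
    }
}
```

The same restart would typically also be triggered from onError for errors such as SpeechRecognizer.ERROR_NO_MATCH or ERROR_SPEECH_TIMEOUT, which fire when the user pauses; words spoken during the gap between sessions can still be lost, which is part of why this approach stays fragile.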

It more or less works, which I'm pretty proud of, given that I built the entire thing in under 24 hours and it was a fairly ambitious project.

I learned how to use the SpeechRecognition library to transcribe users' speech and how to parse out the results.

It would work much better with the cloud transcription engine, so the next step is to move to that. Because the client library for the cloud API does not work on Android, this means streaming the raw HTTP requests by hand. The app also needs further fine-tuning to confirm that the ranges I chose for speed, volume, and percent filler are accurate.
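Keeping the acceptable ranges in one place makes that tuning easier to iterate on. The sketch below is hypothetical: the class FeedbackRanges and, in particular, every threshold value are placeholders for illustration, not the ranges used in the app. A volume range is left out because the scale of the loudness values depends on the device and microphone.

```java
// Hypothetical range check; all threshold values are placeholders, not the app's actual ranges.
final class FeedbackRanges {
    enum Verdict { TOO_LOW, OK, TOO_HIGH }

    /** An inclusive acceptable range [min, max] for one metric. */
    static final class Range {
        final double min;
        final double max;

        Range(double min, double max) {
            if (min > max) throw new IllegalArgumentException("min must not exceed max");
            this.min = min;
            this.max = max;
        }

        Verdict classify(double value) {
            if (value < min) return Verdict.TOO_LOW;
            if (value > max) return Verdict.TOO_HIGH;
            return Verdict.OK;
        }
    }

    // Placeholder values for illustration only; these are the numbers that would need tuning.
    static final Range WORDS_PER_TEN_SECONDS = new Range(20.0, 28.0);
    static final Range FILLER_PERCENT = new Range(0.0, 5.0);

    private FeedbackRanges() {}
}
```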
