We made MasterSpeech to help anyone who struggles with speaking or wants to practice their speaking skills. We feel this is something everyone needs to practice, whether you're a student giving oral presentations, a candidate seeking to improve a speech, or a salesperson working on a pitch.

How it works

MasterSpeech runs entirely client-side in JavaScript. It uses Google's online speech recognition software, then parses the results and displays an analysis of the speech. Think of it as an enhanced version of recording yourself speaking and playing it back. We do offer that feature so you can evaluate yourself, but we also compute less obvious quantitative data, such as spoken words per minute, to give further insight into where your weaknesses may lie.
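
As a rough illustration of the words-per-minute statistic, here is a minimal sketch. It assumes the recognizer returns a plain-text transcript and that the recording length in seconds is known. The function name `wordsPerMinute` and its parameters are hypothetical and are not taken from the MasterSpeech source.

```javascript
// Hypothetical helper, not the actual MasterSpeech code.
// Counts whitespace-separated words in a transcript and scales
// the count to a per-minute rate.
function wordsPerMinute(transcript, durationSeconds) {
  if (!(durationSeconds > 0)) {
    return 0; // avoid dividing by zero for empty or invalid recordings
  }
  // Split on runs of whitespace and drop empty strings, so an
  // empty transcript counts as zero words.
  const words = transcript.trim().split(/\s+/).filter(Boolean);
  return (words.length * 60) / durationSeconds;
}

// Example: 5 words spoken over 2 seconds.
const rate = wordsPerMinute("the quick brown fox jumps", 2);
console.log(rate + " words per minute");
```

Splitting on whitespace is a simplification: it treats hyphenated words and contractions as single words, which is probably close enough for a speaking-rate estimate.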

Challenges we ran into

We spent half of the first day trying to find a suitable speech processing library. Originally we planned to run recognition locally on the backend, but the library we tried for that (CMU Sphinx) gave unusably bad results. We quickly tried several other libraries and found that the one that worked best was a wrapper around Google's online speech recognition. We then realized that we could reimplement this library entirely on the client side in JavaScript, eliminating the need for any dynamic content.

Additionally, we had hoped that speech-to-text APIs would transcribe everything spoken, including filler words like "um" and "uh". That way, users could receive a transcript of what they had said and see where they tend to use these words. Unfortunately, it seems that the software often filters these words out automatically.
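
If a transcript that keeps filler words were available, counting them could look something like the sketch below. This is only an illustration under that assumption: the filler list and the name `countFillerWords` are hypothetical, and the approach would not work on transcripts where the recognizer has already removed these words.

```javascript
// Hypothetical filler list; a real one would likely be longer.
const FILLER_WORDS = ["um", "uh", "er"];

// Hypothetical helper, not part of MasterSpeech.
// Returns an object mapping each filler word to how often it
// appears in the transcript (case-insensitive, whole words only).
function countFillerWords(transcript, fillers = FILLER_WORDS) {
  const counts = {};
  for (const filler of fillers) {
    counts[filler] = 0;
  }
  // Extract lowercase word tokens, ignoring punctuation.
  const tokens = transcript.toLowerCase().match(/[a-z']+/g) || [];
  for (const token of tokens) {
    if (Object.prototype.hasOwnProperty.call(counts, token)) {
      counts[token] += 1;
    }
  }
  return counts;
}

// Example usage with a transcript that still contains fillers.
console.log(countFillerWords("Um, so uh I think, um, yes"));
```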

Accomplishments that we're proud of

  • Finding and using a suitable API for speech recognition
  • First hackathon for half of our team
  • Minified, entirely static and cacheable version (/min.html) that is fully functional at under 5 KB
  • A good-looking website, built by people who had never done any web design before

What's next for MasterSpeech

  • More statistics like pause duration, vocabulary variation, tracking along with a pre-written speech, pitch variation, etc.
  • Written transcript of what you said, filler words and all
  • Saving recorded audio to play back and analyze later - talk to your phone in the car or when you have free time, then read over your stats later
  • Twilio integration - simply call a number to have your speech recorded, transcribed, and saved to be accessed later online through an account