Public speaking is often called the number one fear in the world, so we wanted a way to make it a little easier.
What it does
SpeakEasy takes two audio files: a baseline recording of how you want your presentation to sound, and a recording of you giving the speech from memory. It compares them and tells you whether the from-memory version was more or less "intense" than the baseline. More intense means you spoke faster or louder; less intense means you spoke slower or softer.
How we built it
We have a minimalist website for uploading the two audio files. In the background, we upload the audio files to Parse and use the link generated there to send each sample to the Wolfram Cloud. In the Wolfram Cloud we create a spectrogram of each audio file and save it as an image. We binarize both images (make them black and white). Every single black "hump" on the image is a block of speech. For both images, these will look nearly identical. We compare the percentage of black in the image to the percentage of whitespace. Both of those results are sent back to Parse, and the raw data is used by the website. The website compares both values and comes up with a percent difference: if it's greater than 15%, we know you may have been a little nervous or unprepared, and we give you some study tips.
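The comparison step described above can be sketched in a few lines of Python. This is only an illustration, not the code we ran: the real binarization happens in Wolfram, and the function names, the 0.5 brightness threshold, and the exact percent-difference formula (relative to the baseline) are all assumptions made for the example. Only the 15% cutoff comes from the project itself.

```python
def black_fraction(gray, threshold=0.5):
    """Binarize a grayscale image and return the share of black pixels.

    `gray` is a list of rows, each a list of brightness values in [0, 1].
    Pixels darker than `threshold` (an assumed value) count as black.
    """
    total = sum(len(row) for row in gray)
    if total == 0:
        raise ValueError("image is empty")
    black = sum(1 for row in gray for value in row if value < threshold)
    return black / total


def percent_difference(baseline, attempt):
    """Change of `attempt` relative to `baseline`, in percent (assumed formula)."""
    if baseline == 0:
        raise ValueError("baseline image has no black pixels")
    return abs(attempt - baseline) / baseline * 100.0


def needs_study_tips(baseline_img, attempt_img, limit=15.0):
    """Return (flagged, diff): flagged is True when diff exceeds the 15% cutoff."""
    diff = percent_difference(
        black_fraction(baseline_img), black_fraction(attempt_img)
    )
    return diff > limit, diff


# Tiny made-up "spectrogram" images: 4 of 8 pixels are dark in the baseline,
# 6 of 8 are dark in the from-memory attempt.
baseline = [[0.1, 0.9, 0.2, 0.8], [0.9, 0.1, 0.8, 0.3]]
attempt = [[0.1, 0.2, 0.2, 0.8], [0.1, 0.1, 0.8, 0.3]]

flagged, diff = needs_study_tips(baseline, attempt)
print(flagged, diff)
```

With these toy images the black share goes from 0.5 to 0.75, which is well past the 15% cutoff, so the attempt would be flagged.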
Challenges I ran into
We ran into several challenges. We wanted a way to continuously stream audio data to the platform, but that was harder than it looked, in part because Wolfram cannot accept data that way. We were also going to make this a mobile app, but Android didn't provide audio data in a format Wolfram could interpret.
Accomplishments that I'm proud of
We're proud of this entire platform and the way we used Wolfram to convert audio into images and compare those. We're also proud of using Parse Cloud Code for making RESTful requests.
What I learned
We learned what makes audio analysis difficult and how to use the Wolfram platform in a creative way.
What's next for SpeakEasy
Hopefully we can do all the things we envisioned for this, like live audio and a workaround for mobile apps.