Note: this project won 5th place at MatadorHacks.
As our world moves toward an increasingly global society, people find themselves having to learn more and more languages just to be able to communicate. Learning a new language can be hard. Services like Duolingo or Rosetta Stone are excellent for learning words and hearing pronunciations, but one of the most difficult parts of learning a language is learning to pronounce the words correctly yourself.
What it does
The goal of VOCALaiZ is to make it easier for non-native speakers to learn the pronunciation of a word.
How we built it
The program takes in an audio file, sends it to the cloud, and converts the speech to text using ML. It then generates an audio file from that text, compares the audio fingerprints of the input file and the generated file, and determines a pronunciation score from that comparison.
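The flow above can be sketched roughly as follows. This is a minimal illustration, not our actual code: `transcribe`, `synthesize`, and `fingerprint` are hypothetical placeholders for the cloud speech-to-text call, the text-to-speech step, and the fingerprinting library, and the sketch assumes each fingerprint has already been decoded into a list of 32-bit integers (the form raw Chromaprint fingerprints take). The scoring rule, the share of matching bits, is one simple choice among many.

```python
from typing import Callable, Sequence


def fingerprint_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Return the fraction of matching bits over the overlapping part of two fingerprints."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0  # nothing to compare
    # XOR leaves a 1 wherever the two 32-bit values disagree; count those bits.
    # zip() stops at the shorter fingerprint, so only the overlap is compared.
    differing = sum(bin((x ^ y) & 0xFFFFFFFF).count("1") for x, y in zip(a, b))
    return 1.0 - differing / (32 * n)


def pronunciation_score(
    recording: bytes,
    transcribe: Callable[[bytes], str],           # hypothetical: speech-to-text
    synthesize: Callable[[str], bytes],           # hypothetical: text-to-speech
    fingerprint: Callable[[bytes], Sequence[int]],  # hypothetical: audio fingerprinting
) -> float:
    """Score a recording from 0 to 100 against a generated reference pronunciation."""
    text = transcribe(recording)       # what the model thinks was said
    reference = synthesize(text)       # a "correct" pronunciation of that text
    similarity = fingerprint_similarity(fingerprint(recording), fingerprint(reference))
    return 100.0 * similarity
```

Passing the three steps in as functions keeps the sketch independent of any particular cloud service or library, so each one can be swapped or stubbed out during testing.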
Challenges we ran into
When we first started out, we thought the project would be fairly straightforward: recognize the words, then get the ML model's confidence that it was correct. That parameter should scale with the quality of a person's pronunciation. But no. Sadly, it was nowhere near that simple. The model didn't output data accurate enough to use in our code. So instead, we decided to compare every word that was spoken to a correctly pronounced sample. In doing so, we ran into some errors and complications involving the audio fingerprint. For example, the function doesn't take floats as timestamps, so if an audio clip was shorter than one second, it was read as 0 seconds, as if there were no audio at all. The interactions with the Google Cloud API also took a while to figure out.
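The sub-second problem comes down to integer truncation. The sketch below only illustrates the pitfall and two generic ways around it: rounding the duration up to whole seconds, or working in integer milliseconds. The helper names are hypothetical and this is not necessarily the fix we shipped.

```python
import math


def whole_seconds(duration: float) -> int:
    """Round a duration up so a sub-second clip still counts as one second."""
    if duration < 0:
        raise ValueError("duration must be non-negative")
    return math.ceil(duration)


def to_milliseconds(duration: float) -> int:
    """Express a duration as integer milliseconds to keep sub-second precision."""
    return round(duration * 1000)


clip = 0.6  # a short spoken word, in seconds

truncated = int(clip)              # 0: the clip looks empty
rounded_up = whole_seconds(clip)   # 1: the clip survives
millis = to_milliseconds(clip)     # 600: integer, but nothing is lost
```

Rounding up is the smaller change when an API insists on whole seconds, while milliseconds are the better fit for tools that already slice audio in that unit, as pydub does.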
Accomplishments that we're proud of
Getting all of the different parts to work together was very challenging, but we were able to get it all working.
What we learned
We learned how to implement neural networks, fingerprint audio, and deal with binaries. Other useful skills we picked up include working with Expo audio streaming, interacting with cloud-based servers, and using libraries such as ffmpeg, pyacoustid, pydub, Beautiful Soup, and more.
What's next for VOCALaiZ
In the future, we plan to add streaming capability, as well as other languages.