VOCALaiZ

Note: this project won 5th place at MatadorHacks.

Inspiration

As our world becomes an increasingly global society, more and more people find they need to learn additional languages just to communicate. Learning a new language is hard. Services like Duolingo and Rosetta Stone are excellent resources for learning vocabulary and hearing pronunciations, but one of the most difficult parts of learning a language is learning to pronounce words correctly.

What it does

The goal of VOCALaiZ is to make it easier for non-native speakers to learn the pronunciation of a word.

How we built it

The program takes in an audio file, sends it to the cloud, and converts the speech to text using ML. It then generates an audio file from that text and compares the audio fingerprints of the input file and the generated file to determine a pronunciation score.
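The write-up doesn't say exactly how the two fingerprints are compared. Here is one possible approach, under stated assumptions. It treats each fingerprint as a sequence of 32-bit integers (the form a Chromaprint fingerprint takes once decoded; pyacoustid wraps Chromaprint) and measures the bit error rate between them. The function names, the `max_shift` alignment window, and the mapping from error rate to a 0-100 score are illustrative assumptions, not VOCALaiZ's actual code.

```python
from typing import Sequence


def bit_error_rate(a: Sequence[int], b: Sequence[int], offset: int = 0) -> float:
    """Fraction of differing bits between a and b, with b shifted by `offset` frames."""
    # Pair up frames that overlap once b is shifted; frames outside the overlap are ignored.
    pairs = [(a[i], b[i - offset]) for i in range(len(a)) if 0 <= i - offset < len(b)]
    if not pairs:
        return 1.0  # no overlap at all: treat as completely different
    # XOR each pair and count the set bits; mask to 32 bits in case values are signed.
    differing = sum(bin((x ^ y) & 0xFFFFFFFF).count("1") for x, y in pairs)
    return differing / (32 * len(pairs))


def pronunciation_score(user_fp: Sequence[int], reference_fp: Sequence[int], max_shift: int = 5) -> float:
    """Hypothetical 0-100 score: 100 = identical fingerprints, 0 = no better than chance."""
    # Try small alignments, since the speaker may start slightly earlier or later.
    best = min(bit_error_rate(user_fp, reference_fp, s) for s in range(-max_shift, max_shift + 1))
    # Unrelated audio disagrees on about half the bits, so rescale an error rate of 0.5 to 0.
    return round(100 * max(0.0, 1 - 2 * best), 1)


reference = [0x12345678, 0x0F0F0F0F, 0xDEADBEEF]
print(pronunciation_score(reference, reference))  # prints 100.0
print(pronunciation_score([0b1111], [0b0000]))    # 4 of 32 bits differ, prints 75.0
```

Two unrelated fingerprints disagree on roughly half their bits, so this sketch gives a bit error rate of 0.5 or worse a score of 0. Identical fingerprints score 100.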

Challenges we ran into

When we first started out, we thought the project would be fairly straightforward: recognize the words, then use the ML model's confidence that its recognition was correct, on the assumption that this confidence would scale with the quality of a person's pronunciation. Sadly, it was nowhere near that simple. The model's output wasn't accurate enough to use in our code, so instead we decided to compare every spoken word to a correctly pronounced sample. Doing so led to errors and complications with the audio fingerprinting. For example, the function we used doesn't accept floats as timestamps, so any audio clip shorter than one second was read as 0 seconds long, producing no audio file. Working out the interactions with the Google Cloud API also took a while.
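The write-up doesn't name the function that rejected fractional timestamps. Under that caveat, here is one general way to avoid the problem: convert float-second timestamps into integer milliseconds instead of truncating them to whole seconds. Examples of such timestamps are the per-word start and end times Google Speech-to-Text can return when word time offsets are enabled. The helper names `to_milliseconds` and `word_segments` and the `min_ms` threshold are hypothetical.

```python
from typing import List, Tuple


def to_milliseconds(seconds: float) -> int:
    """Round a float timestamp in seconds to whole milliseconds."""
    return round(seconds * 1000)


def word_segments(word_times: List[Tuple[str, float, float]], min_ms: int = 1) -> List[Tuple[str, int, int]]:
    """Turn (word, start_s, end_s) tuples into (word, start_ms, end_ms) slices."""
    segments = []
    for word, start, end in word_times:
        start_ms, end_ms = to_milliseconds(start), to_milliseconds(end)
        # Drop only segments that are genuinely empty, not ones shorter than a second.
        if end_ms - start_ms >= min_ms:
            segments.append((word, start_ms, end_ms))
    return segments


words = [("hola", 0.2, 0.65), ("amigo", 0.65, 1.3)]
print(int(0.45))             # truncating to whole seconds gives 0
print(word_segments(words))  # [('hola', 200, 650), ('amigo', 650, 1300)]
```

pydub slices `AudioSegment` objects in milliseconds, so each word could then be cut out with `audio[start_ms:end_ms]`. A half-second word keeps its full length instead of collapsing to an empty clip.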

Accomplishments that we're proud of

Getting all of the different parts to work together was very challenging, but we got it working.

What we learned

We learned how to implement neural networks, fingerprint audio, and work with binaries. Other useful skills we picked up include Expo audio streaming, interacting with cloud-based servers, and using libraries and tools such as FFmpeg, pyacoustid, pydub, Beautiful Soup, and more.

What's next for VOCALaiZ

In the future, we plan to add streaming capability and support for more languages.

Built With

audio
expo.io
fingerprint
google-speech-to-text
machine-learning
ml
node.js
python
react
react-native

Submitted to

MatadorHacks
- Winner 4th - 10th Place

Created by

I built the entire front end using React Native and Expo. The app takes microphone input, sends it to the server for processing, and handles the server's response.
I also helped my teammates quite a lot with the backend.

Miguel Tenant de La Tour
random French coder kid living in California
Built the Flask API and architected the backend algorithm flow. Also worked on data cleaning for the ML model and on server handling.

Rohan Pandey
AI Researcher & 10x Hackathon Winner
I used Python and several libraries to write web scraping scripts and programs that edit audio files.

Ansh Gupta
Sid Kannan

Updates

Sid Kannan started this project — Apr 07, 2019 11:58 AM EDT
