Inspiration

I have been on multiple phone calls with deaf individuals in the past, and I have always felt a strange disconnect from the deaf person because the interpreter was the one actually speaking to me. What I like about talking on the phone with someone is that it is a one-on-one conversation... it feels personal, and I thought that deaf people might like that too.

As I researched this idea, I was astonished at how little the software community has done for this demographic. Knowing that my work might contribute to their enjoyment of life is what led me to pursue this project.

What it does

Voice4All reads American Sign Language from a webcam, translates the signs into words in real time, and speaks them aloud over a phone call. The receiver of the call hears the text-to-speech and can respond accordingly.

How we built it

Voice4All utilizes convolutional neural networks and graph convolutional networks trained on video clips of American Sign Language. Once trained, the model takes live input from a web camera and interprets it in real time into words. The words are then sent to a Google Firebase server.
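To make the graph-convolution idea concrete, here is a minimal sketch of a single GCN layer operating on a toy graph of hand keypoints. This is not the project's actual model; the three-node adjacency matrix, feature sizes, and the `gcn_layer` helper are illustrative assumptions, showing only the standard normalized message-passing step that GCNs are built from.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 @ H @ W).

    A: (n, n) adjacency matrix of the keypoint graph
    H: (n, f_in) per-node features, W: (f_in, f_out) learned weights
    """
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)                     # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Toy example: 3 keypoints in a chain, 2 input features, 4 output features
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.random.rand(3, 2)
W = np.random.rand(2, 4)
out = gcn_layer(A, H, W)
print(out.shape)  # (3, 4)
```

In a real sign-language model, each node would be a detected hand or body keypoint per video frame, with edges following the skeleton, and several such layers would be stacked before a classifier.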

The words are then converted to text-to-speech over a VoIP phone call set up with SignalWire. The receiver of the phone call hears the text-to-speech via SignalWire's SDK.
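As a sketch of the text-to-speech step: SignalWire calls can be driven by LaML, an XML dialect where a `<Say>` verb reads text aloud to the callee. The `tts_laml` helper below is a hypothetical illustration, not the project's actual code; in practice the XML would be served from a webhook that the SignalWire call fetches.

```python
def tts_laml(text: str) -> str:
    """Build a minimal LaML document that speaks `text` on the call."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f"<Response><Say>{text}</Say></Response>"
    )

print(tts_laml("hello from Voice4All"))
```

A production version would also escape XML special characters in `text` before embedding it.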

Challenges I ran into

Early on I had trouble with a dataset that I had planned on using, because the company that owned it, 20bn.com, was acquired three weeks ago by a bigger company. After the acquisition, they took down their website and the dataset with it.

I ended up going with a dataset named msas. It was much smaller than the 20bn dataset, but it was sufficient for me to build a proof of concept.

The final machine learning model is still in need of fine-tuning, though this could also be due to my own ignorance of American Sign Language. I noticed that I had to perform the signs with fairly high precision for the model to recognize them, and rightfully so: I learned that a slight change in a sign can carry a very different meaning.

Accomplishments that I am proud of

I was able to create a model that performed at about 86% accuracy across its top 10 runs. That is very good considering the difficult data and testing environment the model had to work with.

What I learned

A lot of sign language, haha! I also learned some hard lessons on data preparation for visual data.

What's next for Voice4All

Cleaning up the model and acquiring more data. I also plan to look at different services for the phone calls because of the out-of-pocket expenses. I spent a whole $4 in one night!
