Deaf-mute people in the U.S. constitute a non-negligible 2.1% of the population aged 18 to 64. This may seem like a low percentage, but what if I told you that it amounts to 4 million people? That is the whole population of Panama!
Concerned about this, we decided to do our bit by developing a nice hack to ease their day-to-day life. After some brainstorming, we started working on a messaging application that would allow deaf-mute people to communicate at a distance using their own language: American Sign Language (ASL).
After some work on the design, we realized that we could split our system into several modules to cover a much wider variety of applications.
What it does and how we built it
So, what exactly did we do? We implemented a set of modules designed so that they can be stacked to provide many different functionalities. These modules are:
-Voice to text: Using the Bing Speech API, we implemented a Python class that records a talking person and retrieves their speech as text. It communicates via HTTP POST requests whose body must be binary audio.
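A minimal sketch of how such a request can be assembled (the endpoint and header names follow the Bing Speech REST API docs we worked from; the subscription key, language and audio format are placeholders):

```python
def build_recognition_request(audio_bytes, subscription_key, language="en-US"):
    """Assemble the pieces of a speech-to-text HTTP POST.

    The request body must be raw binary audio (16 kHz mono PCM WAV
    in our setup); the result can be handed to any HTTP client.
    """
    url = ("https://speech.platform.bing.com/speech/recognition/"
           "interactive/cognitiveservices/v1?language=" + language)
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "audio/wav; codec=audio/pcm; samplerate=16000",
    }
    return url, headers, audio_bytes
```

The returned tuple is then sent with, e.g., `requests.post(url, headers=headers, data=body)`, and the JSON response contains the recognized text.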
-Text to voice: As before, with the Bing Speech API and the Cognitive-Speech-TTS samples, we implemented a user-friendly Python application that transforms text into Cortana's voice.
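The TTS endpoint expects the text wrapped in an SSML envelope; a sketch of how that envelope can be built (the voice name shown is one of the English voices the service offered at the time, used here as an assumed placeholder):

```python
def build_ssml(text,
               voice="Microsoft Server Speech Text to Speech Voice (en-US, ZiraRUS)"):
    """Wrap plain text in the SSML document the TTS endpoint expects."""
    return ("<speak version='1.0' xml:lang='en-US'>"
            "<voice xml:lang='en-US' xml:gender='Female' name='{}'>"
            "{}</voice></speak>").format(voice, text)
```

The resulting string is POSTed to the synthesis endpoint with the same subscription-key header as the recognition call, and the response body is the audio.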
-ASL alphabet to text: Using a (fully trained) deep neural network, we transform images of signs made with our hands into text in real time. For this, we had to create a dataset from scratch by recording our own hands performing the signs. We also spent many hours training, but we got some good results!
Some technicalities: due to lack of time (deep learning training is slow!), we only implemented 11 letters of the ASL alphabet, and for better performance we restricted our sign pictures to a white background. Needless to say, with more time, computing power and data, this approach scales to many more ASL signs and unrestricted backgrounds.
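The white-background restriction makes preprocessing simple: dark hand pixels can be separated from the bright background with a plain threshold before the image is resized and fed to the network. A minimal sketch (the threshold value is a placeholder one would tune by hand; plain lists stand in for image arrays):

```python
def segment_hand(gray, threshold=200):
    """Binary mask of a hand photographed against a white background.

    gray: 2-D list of rows of grayscale values in 0-255.
    Pixels darker than `threshold` are kept as hand (1); bright
    background pixels become 0.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]
```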
-Text to ASL alphabet: To complement all the previous tools, we developed a Python GUI that displays text translated into the ASL alphabet.
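At its core this is a lookup from each letter to the picture of its sign; a sketch under the assumption of one image per letter (the directory layout and file names are hypothetical, mirroring how such assets could be organised):

```python
import string

def text_to_sign_images(text, image_dir="asl_letters"):
    """Map each letter of `text` to the path of its ASL sign picture.

    Characters without a sign in our set (digits, punctuation,
    spaces) are simply skipped.
    """
    return ["{}/{}.png".format(image_dir, ch)
            for ch in text.lower()
            if ch in string.ascii_lowercase]
```

The GUI then just displays the images in sequence.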
-Text to text: We send text messages through an AWS-hosted MySQL database, allowing all these modules to communicate at a distance.
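A sketch of the message table and the two queries this module needs. We used MySQL on AWS; sqlite3 stands in here so the snippet is self-contained, but the SQL has the same shape:

```python
import sqlite3

# In-memory stand-in for the remote MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE messages (
    id        INTEGER PRIMARY KEY,
    sender    TEXT NOT NULL,
    recipient TEXT NOT NULL,
    body      TEXT NOT NULL
)""")

def send_message(sender, recipient, body):
    """One client inserts a row..."""
    conn.execute(
        "INSERT INTO messages (sender, recipient, body) VALUES (?, ?, ?)",
        (sender, recipient, body))

def fetch_messages(recipient):
    """...and the other polls for rows addressed to it."""
    cur = conn.execute(
        "SELECT sender, body FROM messages WHERE recipient = ?",
        (recipient,))
    return cur.fetchall()
```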
-Messaging: The first application we thought of. By combining all the previous modules, deaf people can communicate in their natural language: from sign language to text, from text over the internet to the other computer, and from text to voice!
-Learning: The audio-to-text and text-to-sign-language modules let us learn to spell all the words we can say. Furthermore, we can practice our skills by going the other way around and having the machine tell us what we are signing.
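Because every module maps text in or text out, stacking them is plain function composition. A stub sketch of the messaging pipeline (module internals replaced by hypothetical placeholders to show only the wiring):

```python
def sign_to_text(frames):
    """ASL-alphabet recognizer (stubbed)."""
    return "hello"

def send_over_network(text):
    """Text-to-text module over the database (stubbed)."""
    return text

def text_to_voice(text):
    """Bing TTS module (stubbed): returns synthesized audio."""
    return "<audio:{}>".format(text)

def messaging_pipeline(frames):
    """Sign language in on one machine, spoken audio out on the other."""
    return text_to_voice(send_over_network(sign_to_text(frames)))
```

Swapping the last stage for the text-to-ASL GUI instead of TTS yields the learning application with no other changes.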
Accomplishments that we're proud of
Getting all the different parts to work together, and actually perform well, has been really satisfying.
What we learned
This is the first time we have implemented a full project based on machine learning from scratch, and we are proud of the results we got. We had also never worked with Microsoft APIs before, so we got to know a whole new environment.
What's next for Silent Voice
One idea we had in mind from the beginning was bringing everything to a more portable device such as a Raspberry Pi or a mobile phone. However, because of time limitations, we could not explore this path. Having our software on such devices would be very useful and would ease the day-to-day life of deaf people (as was our original intention!).
Of course, we are aware that ASL is properly signed with whole words rather than a concatenation of letters, but given its enormous vocabulary it was difficult for us to embark on such an adventure from scratch. A nice piece of future work would be implementing a form of full ASL.
A fun way to exploit our models would be implementing a Naruto-style ninja fight game, since those are based on hand signs too!