Our voice is one of our most powerful tools. Simply by speaking, we can share big ideas and connect with people across the world. However, some of us aren't able to use our voices for these essential tasks. I realized that people who use sign language as their primary means of communication could be given a way to talk with those who don't know the gestures behind this language.
What it does
The Sign Language Translator is a website where hand gestures can be converted into audio and text over a real-time video call.
How I built it
I used TensorFlow's k-nearest neighbours (kNN) algorithm to train on and predict words from live video. Put simply, kNN classifies a data point from a frame by checking the labels of the "k" points closest to it; whichever label appears most often among those neighbours is the one the algorithm returns.
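To make the idea concrete, here is a minimal, self-contained sketch of kNN majority voting (this is an illustration of the technique, not the TensorFlow implementation; the function and field names are my own):

```javascript
// Classify a query vector by majority vote among the k nearest labeled examples.
// examples: [{ vector: number[], label: string }, ...]
function knnPredict(examples, query, k) {
  // Euclidean distance between two feature vectors.
  const dist = (a, b) =>
    Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));

  // Sort all examples by distance to the query and keep the k closest.
  const neighbours = examples
    .map((ex) => ({ label: ex.label, d: dist(ex.vector, query) }))
    .sort((a, b) => a.d - b.d)
    .slice(0, k);

  // Count labels among the k nearest and return the most frequent one.
  const counts = {};
  for (const n of neighbours) counts[n.label] = (counts[n.label] || 0) + 1;
  return Object.keys(counts).reduce((best, l) =>
    counts[l] > counts[best] ? l : best
  );
}
```

In the real system, the "vectors" would be feature representations of video frames rather than raw pixels, and each trained word or letter is one label class.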
Users can train, retrain, or delete as many words or letters as they like. Once gestures are trained with this model, translation can begin: every gesture between a user's start and stop gestures is classified and used to form a sentence. These sentences are then converted to computer speech and played through the computer's speaker.
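The sentence-building step described above can be sketched as a small pure function; this is a hypothetical simplification (the `"start"`/`"stop"` labels and de-duplication rule are assumptions, since a gesture held across many frames produces repeated predictions):

```javascript
// Buffer predicted labels between "start" and "stop" gestures into a sentence,
// collapsing consecutive duplicates so a held gesture yields one word.
function buildSentence(predictions) {
  const words = [];
  let recording = false;
  for (const label of predictions) {
    if (label === "start") { recording = true; continue; }
    if (label === "stop") break;
    // Skip frames that repeat the previous word, and ignore anything
    // predicted before the start gesture.
    if (recording && label !== words[words.length - 1]) words.push(label);
  }
  return words.join(" ");
}
```

In the browser, the resulting string can then be spoken with the standard Web Speech API, e.g. `speechSynthesis.speak(new SpeechSynthesisUtterance(sentence))`.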
I also integrated video call functionality that relays the output speech and text from the user's hand gestures to the other caller's interface. Now a person can talk with someone who doesn't know any sign language, in a way that is natural for both sides.
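One simple way to relay captions to the remote caller is to serialize them as small JSON messages over the call's data channel. The sketch below is an assumption about the message shape (the `"caption"` type and field names are illustrative, not taken from the project):

```javascript
// Send a translated sentence to the remote peer as a JSON caption message.
// `channel` is anything with a send(string) method, e.g. a WebRTC data channel.
function sendCaption(channel, sentence) {
  channel.send(JSON.stringify({ type: "caption", text: sentence, ts: Date.now() }));
}

// On the receiving side, parse the message and hand caption text to the UI.
function handleMessage(raw, onCaption) {
  const msg = JSON.parse(raw);
  if (msg.type === "caption") onCaption(msg.text);
}
```

The receiver can both render `msg.text` as on-screen captions and feed it to speech synthesis, so the remote caller hears the translation as it arrives.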
Challenges I ran into
Working out when hand gesture prediction should start and stop, as well as how to store, retrain, and delete the training data, were exciting challenges to overcome.
Accomplishments that I'm proud of
I'm proud that I was able to use TensorFlow not just to optimize or simplify a task, but to address a problem that many people around the world struggle with. Replacing a human translator so that vocally impaired people can speak for themselves, with confidence, is something I am delighted to have achieved.
What I learned
I learned a lot about computer vision and web applications. From extracting frames from live video and adding them to training data, to finding the right classifiers for my specific application, I truly delved into machine learning with this project. Designing a web interface where users could easily understand the training, prediction, and video call process was also an excellent learning experience.
What's next for Sign Language Translator
In the future, with a diverse dataset and hand isolation, users will no longer have to provide custom training data to get started. Perhaps the most exciting aspect of this project is its potential for widespread impact: integrating the Sign Language Translator as a feature in video call services like Google Duo would allow the technology to help thousands of people around the world.