In today's world, technology lets people from different backgrounds and cultures interact and understand each other through cross-cultural and cross-linguistic platforms. Spoken language is a much smaller barrier than it was a few decades ago. But there remains a community among us with whom most of us cannot communicate face-to-face, because we do not know their mode of communication. What makes their language different from any other is that it isn't spoken, it is shown.
This barrier is particularly pronounced in education, where students and educators who rely on sign language can feel isolated in mixed learning environments. We hope this project helps them communicate and integrate better with the world around them.
What it does
Our contribution is Talk To The Hand, a web application that helps hearing-impaired people share their message with the world, even if they don't have physical or financial access to an interpreter. Sign language users open the application and sign in front of their computer’s camera. Talk To The Hand uses computer vision and machine learning to interpret their message, relays the content to the audience through a voice assistant, and creates a written transcript for visual confirmation of the translation. After the user is done speaking, Talk To The Hand offers the option to share the written transcript of their talk through email, text message, or link.
We imagine that this tool will be especially helpful and powerful in public speaking settings — not unlike presenting at a Hackathon! Talk To The Hand dramatically increases the ability of deaf and hard of hearing people to speak to a broad audience.
How we built it
The application has two components. The first is the machine learning model, which recognizes hand gestures and predicts their meaning. The second is the web application, which gives the user an intuitive interface for interpreting signs and speaking them aloud, with support for multiple speech languages.
We built the model by training deep neural networks on a Kaggle dataset, Sign Language MNIST, for hand gesture recognition (https://www.kaggle.com/datamunge/sign-language-mnist). Once we had set up inference to get the model's prediction for an image of a hand gesture, we converted the prediction to English speech using the Houndify text-to-speech API. We then set up the web application, through which the user submits images of hand gestures; the interpretation of each gesture is both displayed as text and spoken aloud in the user's language of choice.
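The inference step described above can be sketched roughly as follows. This is a minimal illustration, not our actual code: it assumes the Sign Language MNIST layout of 28x28 grayscale images with class labels 0-25 mapped to A-Z, and the names `row_to_image`, `label_to_letter`, `interpret`, and the `model` callable are hypothetical stand-ins for the trained network and its wrapper.

```python
import string
from typing import Callable, List, Sequence

IMAGE_SIDE = 28  # Sign Language MNIST images are 28x28 grayscale


def row_to_image(pixels: Sequence[int]) -> List[List[float]]:
    """Turn 784 flat pixel values (0-255) into a 28x28 grid scaled to [0, 1]."""
    if len(pixels) != IMAGE_SIDE * IMAGE_SIDE:
        raise ValueError("expected 784 pixel values")
    scaled = [p / 255.0 for p in pixels]
    return [scaled[r * IMAGE_SIDE:(r + 1) * IMAGE_SIDE] for r in range(IMAGE_SIDE)]


def label_to_letter(label: int) -> str:
    """Map a class index to its letter: 0 -> 'A', 1 -> 'B', and so on.

    The dataset has no examples for 9 (J) or 25 (Z), since those
    letters involve motion, but the index-to-letter mapping still holds.
    """
    if not 0 <= label <= 25:
        raise ValueError("label must be between 0 and 25")
    return string.ascii_uppercase[label]


def interpret(pixels: Sequence[int],
              model: Callable[[List[List[float]]], Sequence[float]]) -> str:
    """Run a (hypothetical) model on one image and return the predicted letter.

    `model` is assumed to return one score per class; the highest score wins.
    """
    scores = model(row_to_image(pixels))
    best = max(range(len(scores)), key=lambda i: scores[i])
    return label_to_letter(best)
```

In a setup like this, the letter returned by `interpret` is what would be shown as text and handed to the text-to-speech service; the call to that service is left out here because its exact interface is not covered in this writeup.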
Challenges we ran into
Accomplishments that we're proud of
First and foremost, we are proud of having thought of and built the first iteration of an application that will help people who depend on sign language cross any barriers to communication that come their way. We are hopeful about the impact it will have on this community and look forward to carrying it to the next phase. We are thrilled to have developed a model that predicts the letter corresponding to a sign language gesture, integrated it with a text-to-speech API, and deployed a functional web application, even though our team is inexperienced with web development. Overall, we relish the experience of having pushed ourselves beyond what we thought was possible and of working on something that we believe will change the world.
What we learned
One of the biggest takeaways for our team as a whole was going through the entire development life cycle of a product, from ideation to building the minimum viable product. This project also exposed us to more applications of computer vision.
What's next for Talk To The Hand
Today’s version of Talk To The Hand is a very minimal representation, created in 36 hours to show proof of concept. Next steps would include in-depth sign education and a refined experience based on user testing and feedback. We believe Talk To The Hand could make a powerful impact in public speaking and presentation settings for the deaf and hard of hearing, especially in countries and communities where physical and financial access to interpreters proves difficult. Imagine a neighborhood activist sharing an impassioned speech before a protest, a middle school class president giving his inaugural address, or a young hacker pitching to her panel of judges.