Inspiration

When I visited India last year, I met my mute cousin for the first time. He can only communicate via sign language; curious about his condition, I asked him why he was mute. He responded, 'I don't know' (in sign language of course). When I turned to his mother for answers instead, she broke down crying. The moment was deeply saddening and left me feeling helpless, as I couldn't do anything to change their situation.

Nine months later, when I signed up for this hackathon, I realized it was the perfect opportunity to make at least a small difference in my cousin's life, which is why I decided to create an ASL-to-Speech converter. This project is my way of providing a solution that could meaningfully help not only him, but sign language users around the world, communicate with others. In fact, I even asked my cousin to record himself using the model and send the video to me, which is linked above.

What it does

SignSpeak is an application designed to bridge communication gaps between individuals who use sign language and those who are not familiar with it. It uses real-time gesture recognition to translate sign language gestures into spoken or written language. By leveraging machine learning and computer vision, SignSpeak can accurately interpret a wide range of sign language gestures, including both standardized signs and individual variations.

Its core functionality is as follows:

  • Gesture Recognition: The application captures and analyzes sign language gestures using a device's camera in real time. It identifies key hand movements and positions to interpret the intended meaning of each gesture.

  • Translation: Once a sign language gesture is recognized, SignSpeak translates it into spoken language through text-to-speech synthesis or displays the corresponding written text on the screen. This feature enables seamless communication between sign language users and individuals who do not understand sign language.
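The recognize-then-translate loop described above can be sketched roughly as follows. This is a minimal illustration, not SignSpeak's exact code: the `stabilize` helper, the 30-frame smoothing window, the 400×400 crop size, and the A–Z class ordering are all our assumptions.

```python
# Sketch of the real-time loop: classify each frame's hand sign, smooth out
# per-frame jitter, then speak and accumulate the stabilized letter.
import collections


def stabilize(history, min_count=25):
    """Return a letter only once it dominates the recent frame window,
    so one-off misclassifications never reach the output text."""
    if not history:
        return None
    letter, count = collections.Counter(history).most_common(1)[0]
    return letter if count >= min_count else None


def run():  # wiring sketch; needs a webcam plus the libraries below
    import cv2
    import pyttsx3
    from keras.models import load_model
    from cvzone.HandTrackingModule import HandDetector

    model = load_model("cnn8grps_rad1_model.h5")
    detector = HandDetector(maxHands=1)
    engine = pyttsx3.init()
    history = collections.deque(maxlen=30)
    sentence = ""

    cap = cv2.VideoCapture(0)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        hands, frame = detector.findHands(frame)
        if hands:
            # Hypothetical preprocessing: crop the hand's bounding box and
            # resize to the CNN's assumed input size before classifying.
            x, y, w, h = hands[0]["bbox"]
            crop = cv2.resize(frame[y:y + h, x:x + w], (400, 400))
            probs = model.predict(crop[None, ...] / 255.0, verbose=0)[0]
            history.append(chr(ord("A") + int(probs.argmax())))
            stable = stabilize(history)
            if stable:
                sentence += stable          # written-text output
                engine.say(stable)          # spoken output via TTS
                engine.runAndWait()
                history.clear()
        cv2.imshow("SignSpeak", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
```

The smoothing step matters because a per-frame classifier will flicker between classes; requiring a letter to dominate the recent window keeps spurious predictions out of the sentence.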

How we built it

We built SignSpeak primarily with the following libraries and frameworks:

  1. OpenCV (cv2): Capturing and processing webcam frames.
  2. NumPy: Numerical operations and array manipulation.
  3. pyttsx3: Text-to-speech library for speech synthesis.
  4. Keras (TensorFlow backend): Loading and running a pre-trained convolutional neural network (cnn8grps_rad1_model.h5) that recognizes hand signs.
  5. cvzone: Computer vision helper library whose HandTrackingModule we use for hand detection and tracking.
  6. Enchant (PyEnchant): Spell checking and word suggestions.
  7. Tkinter: GUI library for the application window and widgets.
  8. PIL (Pillow): Image processing and displaying frames in the Tkinter GUI.
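As one example of how these pieces fit together: because fingerspelled buffers often contain slight misrecognitions, Enchant's suggestions need filtering before they can replace a word. The chooser heuristic below is our illustration under assumptions (same first letter, closest length), not the app's exact logic.

```python
# Sketch: correcting a raw fingerspelled buffer with spell-check suggestions.
def choose_suggestion(word, suggestions):
    """Pick the suggestion that keeps the signed first letter and is closest
    in length to what was fingerspelled; keep the raw word otherwise."""
    if not word:
        return word
    candidates = [s for s in suggestions
                  if s and s[0].lower() == word[0].lower()]
    if not candidates:
        return word
    return min(candidates, key=lambda s: abs(len(s) - len(word)))


def correct(word):  # requires pyenchant plus an installed en_US dictionary
    import enchant
    d = enchant.Dict("en_US")
    return word if d.check(word) else choose_suggestion(word, d.suggest(word))
```

`correct("helo")` would then consult the dictionary and swap in the chosen suggestion, while already-valid words pass through untouched.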

Challenges we ran into

One of the foremost challenges was dealing with the variability in sign language gestures among different users. Sign language includes small individual nuances, which required us to develop a robust model capable of recognizing diverse gestures accurately. For example, between two ASL users making the hand gesture for the same character 'A', one might position the thumb slightly closer to the index finger than the other. Training our CNN on a rigid off-the-shelf dataset was therefore not viable. Instead, we collected raw hand data ourselves using cvzone, processed the frames into binary grayscale images, and plotted the hand's landmark coordinates within its bounding box in real time to build our AtoZ_3.1 dataset.
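The preprocessing behind that data collection can be sketched roughly like this. The 400×400 canvas, the margin, and the function names are our assumptions about the general technique (anchoring landmarks to the bounding box so the plotted skeleton is position- and scale-invariant), not the exact pipeline.

```python
# Sketch: map raw hand landmarks (e.g. cvzone's 21-point lmList) into a
# fixed-size frame anchored to the hand's bounding box, so the plotted
# "skeleton" images the CNN trains on do not depend on where or how large
# the hand appears in the webcam frame.
def normalize_landmarks(lm_list, canvas=400, margin=20):
    """Scale and translate (x, y) landmarks into a canvas x canvas frame."""
    xs = [p[0] for p in lm_list]
    ys = [p[1] for p in lm_list]
    x0, y0 = min(xs), min(ys)
    span = max(max(xs) - x0, max(ys) - y0) or 1   # avoid divide-by-zero
    scale = (canvas - 2 * margin) / span
    return [(round((x - x0) * scale) + margin,
             round((y - y0) * scale) + margin)
            for x, y in zip(xs, ys)]


def capture_sample():  # wiring sketch; needs a webcam, cv2, and cvzone
    import cv2
    import numpy as np
    from cvzone.HandTrackingModule import HandDetector

    detector = HandDetector(maxHands=1)
    ok, frame = cv2.VideoCapture(0).read()
    hands, frame = detector.findHands(frame)
    if hands:
        pts = normalize_landmarks(hands[0]["lmList"])
        white = np.full((400, 400, 3), 255, np.uint8)    # blank canvas
        for x, y in pts:                                 # plot landmarks;
            cv2.circle(white, (x, y), 4, (0, 0, 0), -1)  # skeleton lines
        return white                                     # go on top of this
```

Normalizing this way is what lets two users' slightly different 'A' gestures land on nearly identical training images.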

Accomplishments that we're proud of

Even if our team doesn't secure a prize in this hackathon, we take pride in what we achieved on this journey. Developing a real-time sign-language-to-text conversion app was a testament to our creativity and technical prowess, and demonstrated our ability to tackle complex challenges with innovative solutions. The seamless integration of cvzone's HandTrackingModule for accurate hand detection, coupled with deploying a CNN model (cnn8grps_rad1_model.h5) for gesture recognition, further underscored our commitment to excellence in technology.

Throughout this process, our collaborative teamwork, problem-solving skills, and dedication to user feedback and iteration were pivotal well beyond the competition: the potential impact of our project on accessibility for ASL users reaffirms the significance of our efforts more than any award could. This experience has been a journey of growth and learning, and a testament to our passion for leveraging technology to make a positive change; change that would not have been possible without the combined efforts of every team member, who worked long nights to pull this off.

What we learned

Technically, we deepened our understanding of the machine learning and computer vision techniques essential for real-time sign language recognition. Overcoming challenges such as gesture variability and environmental factors like lighting and cluttered backgrounds taught us the importance of robust model training and careful data preprocessing.

Moreover, the project underscored the significance of user-centered design and usability testing in creating impactful solutions. Engaging with members of the deaf and hard-of-hearing communities provided firsthand perspectives that shaped our development roadmap. This interaction highlighted the critical need for intuitive interfaces and accurate translations to foster genuine inclusivity.

Beyond the technical realm, we learned the power of collaboration and interdisciplinary teamwork. Each team member brought unique skills and perspectives that enriched our problem-solving approach. Balancing technical innovation with empathy and social impact became a guiding principle in our development process.

But most importantly, I think we all became proficient in ASL ourselves!

What's next for SignSpeak: One Sign at a Time

Our immediate focus will be on refining the accuracy and robustness of the sign language to text conversion system. This involves fine-tuning the deep learning models to enhance recognition capabilities across various sign gestures and ensuring real-time performance remains optimal.

Moreover, we aim to broaden SignSpeak's accessibility by exploring direct phrasal translation. ASL doesn't just rely on character-by-character fingerspelling: it has thousands of distinct signs that convey entire words and even phrases, like 'I love you' with a single hand gesture. This capability is currently missing from our model due to computational and time constraints, and it is something we will work toward implementing in the future, making the app all the more convenient and accessible.

In parallel, we are committed to engaging with the community of users and experts in sign language to gather feedback. This feedback will drive iterative improvements and new feature implementations, ensuring SignSpeak evolves to meet the specific needs and preferences of its users effectively.

Additionally, we envision collaborating with educational institutions and organizations focused on deaf and hard-of-hearing communities. By forging partnerships, we can integrate SignSpeak into educational curricula and workshops, thereby promoting greater awareness and proficiency in sign language communication.

Built With

opencv, numpy, pyttsx3, tensorflow, keras, cvzone, enchant, tkinter, pillow, python
