Inspiration
ASLingua was inspired by a personal story within our team. One of our members has a family member who is mute, and because he never learned sign language, there has always been a communication barrier between them. That frequent frustration highlighted how inaccessible sign language still is for many people. We wanted to build a friendly and intuitive solution that helps reduce those barriers and makes communication more natural and inclusive.
What it does
ASLingua uses a camera to detect basic American Sign Language gestures and translates them into spoken words in real time. Users can also type a word and immediately see its ASL equivalent through a visual demonstration. This two-way interaction lets signers and non-signers communicate with and learn from each other.
How we built it
We built ASLingua as two main workflows. For sign-to-voice translation, we captured hand landmark vectors from a camera and recorded them in a labeled CSV dataset. We used this dataset to train a machine learning model based on the k-nearest neighbors algorithm, which predicts the word corresponding to a detected sign. The predicted word is then converted into speech using ElevenLabs.
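As a rough illustration of the classification step, here is a minimal k-nearest neighbors sketch in plain Python. It is not our actual training code: the CSV layout (label first, then coordinates), the function names `load_dataset` and `predict`, the value of `k`, and the toy three-value rows are all assumptions made for the example.

```python
import csv
import io
import math
from collections import Counter


def load_dataset(csv_text):
    """Parse rows shaped like 'label,v0,v1,...' into (label, vector) pairs."""
    samples = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        samples.append((row[0], [float(v) for v in row[1:]]))
    return samples


def predict(samples, query, k=3):
    """Return the majority label among the k samples closest to query."""
    nearest = sorted(samples, key=lambda s: math.dist(s[1], query))[:k]
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]


# Toy data: three values per row instead of a full landmark vector,
# purely to keep the example readable (hypothetical values).
TOY_CSV = """hello,0.10,0.20,0.00
hello,0.12,0.18,0.01
hello,0.09,0.22,0.02
thanks,0.80,0.75,0.10
thanks,0.82,0.70,0.12
thanks,0.78,0.77,0.09
"""

samples = load_dataset(TOY_CSV)
print(predict(samples, [0.11, 0.21, 0.01]))  # prints "hello"
```

The predicted label is the word that would then be handed to the text-to-speech step.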
For the word-to-sign workflow, we stored words in a MongoDB database and mapped each one to a prerecorded ASL video. When a user types a word, the application retrieves and displays the corresponding video so the user can learn the sign visually.
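The lookup can be sketched roughly as follows. The document fields `word` and `video_url`, the helper `find_video_url`, and the example URL are hypothetical, and `InMemoryCollection` is only a stand-in so the snippet runs without a database. With PyMongo, a real collection object could be passed in instead, since it exposes a `find_one` method that accepts a filter document.

```python
class InMemoryCollection:
    """Stand-in for a database collection; supports exact-match find_one only."""

    def __init__(self, documents):
        self._documents = list(documents)

    def find_one(self, query):
        # Return the first document whose fields all match the query, else None.
        for doc in self._documents:
            if all(doc.get(key) == value for key, value in query.items()):
                return doc
        return None


def find_video_url(collection, typed_word):
    """Normalize the typed word and return its video URL, or None if unknown."""
    normalized = typed_word.strip().lower()
    if not normalized:
        return None
    doc = collection.find_one({"word": normalized})
    return doc["video_url"] if doc else None


# Hypothetical sample data for illustration only.
signs = InMemoryCollection([
    {"word": "hello", "video_url": "https://example.com/videos/hello.mp4"},
    {"word": "thanks", "video_url": "https://example.com/videos/thanks.mp4"},
])

print(find_video_url(signs, "  Hello "))   # https://example.com/videos/hello.mp4
print(find_video_url(signs, "goodbye"))    # None
```

Normalizing the input before the query keeps the lookup an exact match, so a typed "Hello" and a stored "hello" resolve to the same video.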
Challenges we ran into
Our biggest challenge was connecting the backend and frontend in real time. Our sign recognition pipeline worked well on the backend, but integrating it with the frontend camera feed proved difficult. Getting the camera, the machine learning model, and the user interface to communicate smoothly took a lot of debugging and iteration under tight time constraints.
Accomplishments that we're proud of
We are especially proud of our resilience as a team. Despite technical roadblocks and very little sleep, we supported each other and stayed focused on our goal. In the end, we delivered a complete end-to-end product that goes beyond a simple demo and directly addresses a real-world accessibility problem.
What we learned
Through this project, we gained hands-on experience with hand landmark detection and learned that each hand is represented by 21 points in three-dimensional space. We also deepened our understanding of machine learning workflows, including dataset collection, labeling, and model selection. In addition, we learned about non-relational databases, MediaPipe Tasks, Dockerization, and how to integrate complex systems into a single application.
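To make the 21-point representation concrete: flattening 21 points with three coordinates each gives a 63-value feature vector per hand. The sketch below shows one way such a vector could be turned into a labeled CSV row. The `Landmark` type, the helper names, and the synthetic coordinates are assumptions for illustration, not our recording code.

```python
import csv
import io
from typing import NamedTuple

NUM_LANDMARKS = 21  # points per hand


class Landmark(NamedTuple):
    x: float
    y: float
    z: float


def to_feature_vector(landmarks):
    """Flatten 21 (x, y, z) points into one list of 63 floats."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    return [coord for lm in landmarks for coord in lm]


def to_csv_row(label, landmarks):
    """Format one labeled sample as a CSV line: label first, then 63 values."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow([label, *to_feature_vector(landmarks)])
    return buffer.getvalue().strip()


# Synthetic hand, used only to show the shapes involved.
hand = [Landmark(i * 0.01, i * 0.02, 0.0) for i in range(NUM_LANDMARKS)]
vector = to_feature_vector(hand)
print(len(vector))  # 63
```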
What's next for ASLingua
Next, we aim to expand ASLingua’s sign recognition dictionary to cover a much larger set of gestures. We also plan to implement sentence-level recognition so that full ASL expressions can be translated more naturally. In the reverse direction, we want users to be able to write complete sentences and receive dynamically generated sign language videos, with the long-term goal of supporting multiple sign languages worldwide.
Built With
- css
- csv
- docker
- elevenlabs
- googlegemini
- html5
- javascript
- mongodbatlas
- python