Inspiration
We wanted to use computer vision for a practical purpose. We settled on sign language after looking at similar tools such as Google Translate's conversation mode.
What it does
It detects the American Sign Language (ASL) alphabet being signed in front of your webcam, then displays the letters as text.
How we built it
We built the front end with Next.js, and built and connected the backend with Python and FastAPI. MediaPipe handles hand detection, and a custom algorithm handles sign matching.
Challenges we ran into
We were new to computer vision projects. ASL datasets are readily available, but it was hard to find data covering entire words. Even with a dataset, it was hard to train models to sufficient accuracy given the time constraints. We also tried many pretrained models, but they were either not accurate enough or did not cover enough signs. Building the web app also produced a lot of errors, and we spent a lot of time debugging.
Accomplishments that we're proud of
We are proud that we delivered a complete project even though some of our teammates were new to hackathons. We are also proud of developing a pattern recognition solution from scratch.
What we learned
This kind of project was new to all of us. We learned a lot about building API endpoints, ElevenLabs, MediaPipe, training ML models, labelling datasets, pattern recognition, computer vision, and more.
What's next for HandSpeak AI
We want to extend the project by using MediaPipe's face landmarks so that ElevenLabs can speak with emotion, and by using autocomplete algorithms to help the pattern recognition detect signs more accurately.
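As a rough idea of how autocomplete could fit in, the sketch below suggests whole words from the letters detected so far. This is not implemented yet; the `suggest` function, the `vocabulary` list, and the `limit` parameter are all hypothetical, and a real version would likely rank words by frequency rather than alphabetically.

```python
def suggest(prefix: str, vocabulary: list[str], limit: int = 3) -> list[str]:
    """Return up to `limit` vocabulary words that start with the detected letters."""
    prefix = prefix.lower()
    # Alphabetical order is a placeholder for a real ranking.
    matches = sorted(w for w in vocabulary if w.lower().startswith(prefix))
    return matches[:limit]


# Example: the user has fingerspelled "H", "E" so far.
print(suggest("HE", ["hello", "help", "hand", "heat"]))
```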
Built With
- elevenlabs
- fastapi
- mediapipe
- next
- python

