Inspiration

We really wanted to build something that uses computer vision to serve a purpose. We decided on sign language after looking at similar tools such as Google Translate's conversation mode.

What it does

It detects letters of the American Sign Language (ASL) alphabet signed in front of your webcam, then displays them as text.

How we built it

We created the front end with Next.js, and we built and connected the backend using Python and FastAPI. Hand detection is handled by MediaPipe, and sign matching is handled by a custom algorithm.
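To illustrate the kind of sign matching described above, here is a minimal sketch, not the project's actual algorithm. It assumes each hand arrives as 21 (x, y) landmark points with the wrist at index 0, which is how MediaPipe Hands orders its output. The function names, the `templates` dictionary, and the `max_distance` threshold are all hypothetical and chosen for illustration only.

```python
import math

# MediaPipe Hands reports 21 landmarks per hand; index 0 is the wrist.
NUM_LANDMARKS = 21


def normalize(landmarks):
    """Move the wrist to the origin and scale the farthest point to distance 1.

    This makes the comparison independent of where the hand is in the frame
    and how close it is to the camera.
    """
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    wrist_x, wrist_y = landmarks[0]
    shifted = [(x - wrist_x, y - wrist_y) for x, y in landmarks]
    # Fall back to 1.0 if every point coincides with the wrist.
    scale = max(math.hypot(x, y) for x, y in shifted) or 1.0
    return [(x / scale, y / scale) for x, y in shifted]


def distance(a, b):
    """Euclidean distance between two normalized landmark lists."""
    return math.sqrt(
        sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(a, b))
    )


def match_sign(landmarks, templates, max_distance=0.5):
    """Return the label of the closest template, or None if none is close enough.

    `templates` maps a letter to one reference landmark list (hypothetical data).
    """
    query = normalize(landmarks)
    best_label, best_dist = None, float("inf")
    for label, template in templates.items():
        d = distance(query, normalize(template))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= max_distance else None
```

In a setup like this, the backend would call `match_sign` once per detected hand and return the resulting letter (or nothing) to the front end. A nearest-template approach is easy to debug under time pressure, but it is only one of many possible designs.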

Challenges we ran into

We were new to computer vision projects. Although ASL datasets are readily available, it was hard to find data covering entire words. Even with a dataset, it was hard to train models to good accuracy within the time constraints. We also tried many pretrained models, but they were either not accurate enough or did not cover enough signs. Building the web app also produced a lot of errors, and we spent a lot of time debugging.

Accomplishments that we're proud of

We are proud that we shipped a complete project even though some of our teammates are new to hackathons, and that we developed a pattern recognition solution from scratch.

What we learned

This kind of project was new to all of us. We learned a lot about building API endpoints, ElevenLabs, MediaPipe, training ML models, labelling datasets, pattern recognition, computer vision, and more.

What's next for HandSpeak AI

We want to extend the project by using MediaPipe's face landmarks so that ElevenLabs can speak with emotion, and by using autocomplete algorithms to help the pattern recognition detect signs more accurately.
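As a rough idea of what the autocomplete step could look like, here is a small sketch that suggests words from the letters recognized so far. It is an assumption about one possible approach, not a planned design: the `autocomplete` function, the `vocabulary` list, and the shortest-first ranking are all hypothetical.

```python
def autocomplete(prefix, vocabulary, limit=3):
    """Suggest up to `limit` vocabulary words that start with the signed letters.

    Shorter words are ranked first, with alphabetical order breaking ties.
    """
    prefix = prefix.lower()
    if not prefix:
        return []
    matches = [word for word in vocabulary if word.lower().startswith(prefix)]
    return sorted(matches, key=lambda word: (len(word), word))[:limit]


# Example: the user has fingerspelled "h", then "e".
suggestions = autocomplete("he", ["hello", "help", "hand", "hey"])
```

A real version would probably rank suggestions by word frequency instead of length, and could also tolerate a misrecognized letter rather than requiring an exact prefix.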

Built With

  • elevenlabs
  • fastapi
  • mediapipe
  • next
  • python