Inspiration
Communication is something most of us don't even think about. We talk, our friends listen, and everyone understands each other. But for millions of deaf and hard-of-hearing people who use Sign Language, the digital world can feel completely locked off. Voice assistants like Siri or text-to-speech apps don't work for them at all.
We wanted to make something that could bridge this gap. Inspired by how expressive sign language is, we decided to build Gestura, a device that can translate American Sign Language into speech.
What it does
Gestura is a smart, real-time sign language translator. You can use it in three modes: as a standalone device, as an app, or as a website. In every mode it tracks your hands, figures out which word you are signing, and shows the text on the screen while saying it out loud. The best part is that it can work completely offline on your laptop, phone, or the device itself. You don't need any expensive tech like VR headsets, specialized gloves, or special motion cameras.
How we built it
We used YOLOv8 to train our model, OpenCV for computer vision, and eSpeak for text-to-speech (TTS). Here is how we put the pieces together: the trained YOLOv8 model gets frames from the camera through OpenCV and outputs its detections. The label with the highest confidence is then converted to speech by eSpeak and spoken out loud.
- Connecting It All: We created a clean web interface using React so the user sees a smooth dashboard with their video and text side-by-side.
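The detection-to-speech step described above can be sketched roughly as follows. This is a minimal sketch, not our exact code: the names `pick_top_label`, `espeak_command`, `speak`, and `run_camera`, the `min_conf` threshold of 0.5, and the weights file name `best.pt` are all assumptions made for illustration. The camera loop assumes the `ultralytics` and `opencv-python` packages are installed, and speech assumes the `espeak` command-line tool is on the PATH.

```python
import shutil
import subprocess
from typing import Iterable, List, Optional, Tuple

Detection = Tuple[str, float]  # (label, confidence); hypothetical helper type


def pick_top_label(detections: Iterable[Detection],
                   min_conf: float = 0.5) -> Optional[str]:
    """Return the highest-confidence label, or None if nothing clears min_conf.

    The 0.5 threshold is an assumed value, not one taken from the project.
    """
    best = max(detections, key=lambda d: d[1], default=None)
    if best is None or best[1] < min_conf:
        return None
    return best[0]


def espeak_command(text: str) -> List[str]:
    """Build the argument list for the espeak CLI (espeak "<text>")."""
    return ["espeak", text]


def speak(text: str) -> bool:
    """Speak the text if espeak is installed; return whether it was run."""
    if shutil.which("espeak") is None:
        return False
    subprocess.run(espeak_command(text), check=False)
    return True


def run_camera(weights_path: str = "best.pt") -> None:
    """Read webcam frames, detect signs, and speak each newly seen label.

    Imports are local so the helpers above work without a camera or GPU.
    Stop the loop with Ctrl+C.
    """
    import cv2
    from ultralytics import YOLO

    model = YOLO(weights_path)   # assumed path to the trained weights
    cap = cv2.VideoCapture(0)    # default camera
    last_spoken = None
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            result = model(frame)[0]
            detections = [
                (result.names[int(cls_id)], float(conf))
                for cls_id, conf in zip(result.boxes.cls, result.boxes.conf)
            ]
            label = pick_top_label(detections)
            if label is not None and label != last_spoken:
                print(label)
                speak(label)     # avoid repeating the same word every frame
                last_spoken = label
    finally:
        cap.release()
```

Calling `run_camera()` starts the loop. Speaking a label only when it changes is one simple way to keep the voice from repeating on every frame; other debouncing schemes would work too.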
Challenges we ran into
We definitely hit a few major walls while coding this:
- The Lag Problem: At first, the app was super laggy, running at about 12 frames per second (FPS). The camera would freeze because it was waiting for the AI model to finish processing a frame before capturing the next one. We fixed this by using multi-threading, which essentially splits the app into two brains: one thread just focuses on keeping the video smooth at 30 FPS, while the other processes the gestures in the background.
- The Overfitting Issue: Our AI got really good at recognizing signs when we did them, but it sometimes mislabeled the same signs when other people did them, so we had to tweak our dataset.
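The multi-threading fix for the lag problem can be illustrated with a small, hardware-free sketch. It is an assumption about the general pattern, not our actual code: `LatestValue`, `inference_worker`, and `demo` are hypothetical names, the integer "frames" stand in for camera images, and `slow_infer` stands in for the YOLOv8 call. The idea is that the capture loop never waits on the model; the background thread always picks up the newest frame and skips any it missed.

```python
import threading
import time
from typing import Any, Callable, Tuple


class LatestValue:
    """Thread-safe slot that keeps only the newest item plus a version count."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value: Any = None
        self._version = 0

    def put(self, value: Any) -> None:
        with self._lock:
            self._value = value
            self._version += 1

    def get(self) -> Tuple[int, Any]:
        with self._lock:
            return self._version, self._value


def inference_worker(frames: LatestValue, labels: LatestValue,
                     infer: Callable[[Any], Any],
                     stop: threading.Event) -> None:
    """Run infer() on the newest frame in the background until stop is set."""
    seen = 0
    while not stop.is_set():
        version, frame = frames.get()
        if version == seen:      # nothing new yet, so yield briefly
            time.sleep(0.001)
            continue
        seen = version
        labels.put(infer(frame))  # older unprocessed frames are skipped


def demo(num_frames: int = 30, frame_interval: float = 0.005,
         infer_delay: float = 0.02) -> Tuple[int, int]:
    """Simulate a fast capture loop and a slower model; return both counts."""
    frames, labels = LatestValue(), LatestValue()
    stop = threading.Event()

    def slow_infer(frame: int) -> str:
        time.sleep(infer_delay)  # stand-in for model inference time
        return f"label-for-frame-{frame}"

    worker = threading.Thread(
        target=inference_worker, args=(frames, labels, slow_infer, stop),
        daemon=True)
    worker.start()
    for i in range(num_frames):
        frames.put(i)            # stand-in for reading a camera frame
        time.sleep(frame_interval)
    time.sleep(infer_delay * 3)  # give the worker time to finish
    stop.set()
    worker.join()
    return frames.get()[0], labels.get()[0]


if __name__ == "__main__":
    captured, processed = demo()
    print(f"captured {captured} frames, ran inference on {processed}")
```

In this setup the capture loop runs at its own pace no matter how long inference takes, and the display can simply show the most recent label next to the live video.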
Accomplishments that we're proud of
- Getting the whole pipeline to run smoothly at 30 FPS on a basic laptop camera without any lag.
- Getting the app to work reliably whether you are sitting right up against the screen or standing a few feet back.
- Building a real, working project from scratch that actually solves a massive accessibility problem.
What we learned
This project taught us that coding isn't just about making logic scripts run; it's about building things that help real people. Technically, we learned a ton about how to process temporal data, how to structure neural networks, and how to manage background data pipelines in Python. It totally changed how we look at software design and UI/UX, showing us that accessibility should never be an afterthought.
What's next for Gestura
Right now, Gestura is really good at translating individual words. But people don't talk in single, isolated words—they speak in full sentences. Our next goal is to add Natural Language Processing (NLP) to the system so it can understand the unique grammar rules and full sentence structures of Sign Language, moving us closer to actual, fluent conversations.
Built With
- espeak
- opencv
- python
- raspberry-pi
- react
- streamlit
- vercel
- yolov8