Inspiration

Communication should be effortless, but for millions of Deaf and Hard-of-Hearing individuals, everyday conversations can still present barriers. We wanted to build something that makes communication more inclusive while still being fun, modern, and creative. With our retro-vs-modern hackathon theme, we imagined a world where classic nostalgia meets cutting-edge AI, and so The HandStand was born: a translator that turns sign language into real-time speech.

What it does

The HandStand captures live sign language through a webcam, detects hand landmarks, classifies gestures, turns them into text, sends the text to an LLM for translation, and finally converts the output into expressive audio. In short: you sign, it understands, it speaks.
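Because the classifier emits a prediction on every frame, the raw output is noisy; turning per-frame labels into readable text needs some smoothing. Here is a minimal sketch of one such debouncing scheme (the class name, window size, and majority-vote rule are our illustration, not necessarily the project's actual code):

```python
from collections import deque


class GestureDebouncer:
    """Turn noisy per-frame gesture labels into stable text.

    A label is committed only after it has been the sole prediction
    for `window` consecutive frames (a hypothetical smoothing rule;
    the real pipeline may use something different).
    """

    def __init__(self, window=15):
        self.window = window
        self.recent = deque(maxlen=window)
        self.text = []

    def feed(self, label):
        """Feed one per-frame prediction; return the text so far."""
        self.recent.append(label)
        if len(self.recent) == self.window and len(set(self.recent)) == 1:
            # Avoid committing the same sign twice in a row.
            if not self.text or self.text[-1] != label:
                self.text.append(label)
            self.recent.clear()
        return "".join(self.text)
```

With `window=15` at ~30 fps, holding a sign for about half a second commits it once, while single-frame misclassifications are discarded.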

How we built it

We built a custom sign-language recognition model using MediaPipe keypoints and trained a classifier to interpret hand gestures. The front end processes frames from a live webcam feed in real time. Once a gesture is recognized, we push the detected text through the Gemini API for language refinement and translation. Finally, we use ElevenLabs for high-quality, natural-sounding speech output. Everything is tied together with Python, OpenCV, and a simple UI for smooth translation.
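As a sketch of the keypoint-based approach: the 21 hand landmarks from MediaPipe Hands can be normalized into a translation- and scale-invariant feature vector before classification, which is what keeps the model lightweight. The normalization below is a common technique we assume, not necessarily the project's exact preprocessing:

```python
import math


def normalize_landmarks(landmarks):
    """Convert 21 (x, y) hand landmarks into a scale- and
    translation-invariant feature vector for a gesture classifier.

    `landmarks` is a list of 21 (x, y) tuples, e.g. read from
    `hand_landmarks.landmark` in MediaPipe Hands, where index 0
    is the wrist.
    """
    wx, wy = landmarks[0]  # use the wrist as the origin
    shifted = [(x - wx, y - wy) for x, y in landmarks]
    # Scale by the farthest landmark so hand size / camera distance cancel out.
    scale = max(math.hypot(x, y) for x, y in shifted) or 1.0
    return [c / scale for xy in shifted for c in xy]  # flat 42-dim vector
```

Feature vectors like this can be fed to almost any small classifier (k-NN, an SVM, or a tiny MLP), which is why keypoint models train quickly compared to raw-image CNNs.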

Challenges we ran into

- Managing large gesture datasets and long training times.
- Getting real-time hand tracking to stay accurate under different lighting conditions and camera angles.
- Integrating multiple APIs (Gemini + ElevenLabs) without latency or permission issues.
- Keeping the pipeline fast enough for translation to feel natural.
- Debugging branch/merge issues while collaborating through GitHub.

Accomplishments that we're proud of

- Building a fully functional end-to-end sign-language translator in under 24 hours.
- Successfully training a gesture classifier using only keypoints, keeping it lightweight and fast.
- Seamlessly connecting computer vision, LLM translation, and text-to-speech.
- Creating a fun, approachable retro-themed UI that actually works in real time.
- Making something that could genuinely improve accessibility and communication.

What we learned

- How to train ML models using MediaPipe and hand-landmark datasets.
- Efficient ways to blend computer vision, LLMs, and audio generation.
- How powerful multimodal AI can be for accessibility.
- How to collaborate, merge branches, fix conflicts, and debug API permissions under time pressure.
- That designing for accessibility requires real empathy and thoughtful UX.

What's next for The HandStand

- Adding support for full ASL sentences, not just individual gestures.
- Improving model accuracy with larger datasets like WLASL.
- Making a mobile version so it works anywhere.
- Adding audio-to-sign support for two-way communication.
- Expanding to more languages and customizable voices.
