Inspiration
Over 70 million people worldwide use sign language as their primary language, yet most hearing people know zero signs. Every day, deaf and hard-of-hearing individuals face communication barriers in hospitals, classrooms, workplaces, and everyday life. We wanted to build something that actually works in the real world: not a demo toy, but a tool someone could use in a medical emergency, at a coffee shop, or in a classroom. That goal became SignBridge.
What it does
SignBridge is a real-time ASL communication platform. A deaf person signs letters in front of a webcam, and our custom-trained ML model recognizes each letter instantly and builds words with AI-powered autocomplete. When a sentence is confirmed, ElevenLabs speaks it aloud in an emotionally expressive, human-sounding voice that matches the mood of what was signed: urgent sentences sound urgent, and grateful sentences sound warm. A push-to-talk feature lets the hearing person speak back, with their words transcribed on screen for the deaf user to read. The entire conversation runs in 5 languages: English, Spanish, French, Italian, and Greek. A live phone web view lets the hearing person follow the conversation on their own device, and a triple-tap Emergency mode triggers an immediate spoken alert for life-threatening situations.
How we built it
We built SignBridge entirely in Python.
MediaPipe extracts 21 3D hand landmarks per frame from the webcam,
which we normalize relative to the wrist to make predictions
position- and scale-invariant. Those 63 floats (21 landmarks x 3 coordinates) are fed into a
RandomForest classifier we trained on 61,237 samples, combining
the Kaggle ASL alphabet dataset with custom webcam data we collected
ourselves during the hackathon. The model achieves 96.16% accuracy
across 26 letters plus the ILY gesture. Google Gemini handles
real-time sentence translation across 5 languages with an offline
dictionary fallback. ElevenLabs delivers emotionally expressive
speech. We map 7 detected moods to different voices and stability
settings so the voice actually sounds human. Google Speech
Recognition handles push-to-talk transcription. A token-secured
Flask server with QR code access serves a live mobile web view.
The entire UI is rendered in OpenCV with a custom dark theme
designed in Figma.
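The landmark normalization described above can be sketched roughly as follows. This is a minimal illustration, not our exact code: the function name `normalize_landmarks` is hypothetical, and the scale step (dividing by the largest wrist-to-landmark distance) is an assumption, since centering on the wrist alone only gives position invariance. The only MediaPipe fact relied on is that landmark 0 is the wrist.

```python
import numpy as np

def normalize_landmarks(landmarks):
    """Turn 21 (x, y, z) hand landmarks into a 63-float feature vector.

    Hypothetical helper: `landmarks` is any array-like of shape (21, 3),
    e.g. the x/y/z values read from one detected hand.
    """
    pts = np.asarray(landmarks, dtype=np.float64).reshape(21, 3)

    # Position invariance: make the wrist (landmark 0) the origin.
    pts = pts - pts[0]

    # Scale invariance (assumed method): divide by the largest
    # wrist-to-landmark distance so hand size and camera distance cancel out.
    scale = np.linalg.norm(pts, axis=1).max()
    if scale > 0:
        pts = pts / scale

    # Flatten to the 63 floats the classifier consumes.
    return pts.flatten()
```

A vector produced this way can be passed to a fitted scikit-learn classifier, for example `clf.predict([normalize_landmarks(hand)])`, where `clf` stands in for the trained RandomForest.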
Challenges we ran into
The letters M, N, U, and R look nearly identical through landmarks alone. They differ mainly in finger positioning that's hard to capture in 63 floats. We solved this by collecting 100+ custom webcam samples per letter during the hackathon, improving their F1 scores by 3-4 points each. J and Z are motion-based signs that can't be captured in a single frame. We attempted trajectory tracking but hit too many false positives under time pressure, so we made a pragmatic decision to map them to keyboard shortcuts for the demo. Getting ElevenLabs audio playback working on macOS without ffmpeg required a custom PCM-to-WAV pipeline. Our Gemini API quota ran out during debugging, which pushed us to build a complete offline fallback system that made the app more resilient.
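The PCM-to-WAV step mentioned above can be done with Python's standard `wave` module alone, with no ffmpeg involved. The sketch below is illustrative rather than our exact pipeline: the function name `pcm_to_wav_bytes` is hypothetical, and the defaults (16-bit mono at 16 kHz) are assumptions that must match whatever raw PCM format the speech API actually returns.

```python
import io
import wave

def pcm_to_wav_bytes(pcm, sample_rate=16000, channels=1, sample_width=2):
    """Wrap raw PCM bytes in a WAV container, entirely in memory.

    Assumed defaults: 16-bit (2-byte) mono samples at 16 kHz.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)            # header sizes are filled in on close
    return buf.getvalue()
```

The resulting bytes can be written to a temporary `.wav` file and handed to any player that understands WAV, which is what removes the ffmpeg dependency.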
Accomplishments that we're proud of
- 96.16% classification accuracy on a custom-trained model built and improved entirely during the hackathon
- Genuine two-way conversation between deaf and hearing people, not just one-way translation
- ElevenLabs mood-aware voices that actually sound emotionally different, with 7 moods mapped to different voices and stability settings
- Full offline fallback for every API, so the app never crashes and works without WiFi
- Emergency mode that could genuinely save a life
- 5-language output, including Greek with proper script pronunciation
- ILY gesture auto-confirming as a full phrase, the demo moment that lands every time
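The offline-fallback idea from the list above follows a simple pattern: try the online service, and if it fails for any reason, fall back to a local lookup. The sketch below shows that pattern for translation only. Everything in it is hypothetical: `OFFLINE_PHRASES` is a tiny stand-in for a real offline dictionary, and `remote_translate` is a placeholder for whatever function calls the online API.

```python
# Hypothetical offline dictionary: (lowercased phrase, language code) -> translation.
OFFLINE_PHRASES = {
    ("hello", "es"): "hola",
    ("thank you", "es"): "gracias",
    ("help", "es"): "ayuda",
}

def translate(text, target_lang, remote_translate=None):
    """Translate `text`, preferring the online service but never raising."""
    if target_lang == "en":
        return text  # source language, nothing to do

    if remote_translate is not None:
        try:
            return remote_translate(text, target_lang)
        except Exception:
            # Quota exhausted, no network, timeout: fall through to offline.
            pass

    # Offline path: known phrase, or the original text as a last resort.
    key = (text.strip().lower(), target_lang)
    return OFFLINE_PHRASES.get(key, text)
```

Returning the untranslated text as the last resort keeps the conversation moving even when neither path has an answer.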
What we learned
Data quality beats model complexity every time: collecting 100 custom samples per weak letter improved accuracy more than any hyperparameter tuning. Pragmatic decisions matter at a hackathon, and knowing when to cut a feature and ship a working alternative is a real engineering skill. Building for accessibility means thinking about the actual person using it, not just the technical problem. Every feature we added asked one question: does this make that person's life easier?
What's next for SignBridge
A mobile app so the translator runs on a phone camera, making it truly portable for real-world situations like doctor visits or emergencies. Full ASL word recognition beyond just the alphabet, capturing the real language, not just fingerspelling. Emergency mode integrated with actual 911 dispatch systems. Multi-user conversation mode for group settings like classrooms.
Built With
- elevenlabs
- flask
- google-gemini
- google-speech-recognition
- mediapipe
- numpy
- opencv
- python
- scikit-learn