SignBridge

Inspiration

1 in 5 ASL users report miscommunication with healthcare providers. In a hospital, that's not just frustrating; it can be life-threatening. A patient who can't communicate "I'm in pain" or "I'm allergic to this" faces real risk. Professional interpreters are expensive, unavailable at short notice, and simply absent from most clinical interactions.

We built SignBridge to change that. A patient signs. The doctor hears them speak. No interpreter needed.

What it does

SignBridge uses a standard webcam to recognise American Sign Language in real time and convert it into spoken English, with a specific focus on healthcare communication.

The pipeline:

MediaPipe detects 21 hand landmarks per frame via webcam in real time
Static signs (ASL alphabet A–Z) → MLP Neural Network classifier
Dynamic signs (words and phrases) → LSTM sequence model trained on 30-frame gesture windows
Recognised signs buffered into a gloss sequence → Groq LLaMA → natural medical sentence
Sentence spoken aloud via gTTS text-to-speech

Healthcare-specific gestures supported: hurt, help, doctor, emergency, water, eat, sorry, please, thank you, stop, sit, sleep, restroom, and more - exactly the signs most needed in a clinical setting.

A spell correction mode handles fingerspelling for names and medical terms - a patient can spell out a medication name letter by letter and SignBridge reconstructs the full word.

How we built it

We recorded our own custom dataset from scratch during the hackathon:

1,450 static landmark samples across 29 classes (full ASL alphabet + YES, NO, I LOVE YOU)
33,600 motion frames across 28 gesture classes

Instead of storing raw images, we used MediaPipe to extract 21 hand landmark (x, y, z) coordinates per frame - making the system signer-agnostic and independent of lighting, skin tone, and background.

Two separate models:

Model	Input	Architecture	Test Accuracy
Static classifier	63 values (single frame)	MLP: Dense(256)→Dense(128)→Dense(64)	100%
Motion classifier	126 values × 30 frames	LSTM(128)→LSTM(64)→Dense	95.54%

A movement detection layer routes to the correct model in real time - static model fires when the hand is still, LSTM fires when motion is detected. This prevents the two models from interfering with each other.

The LLM layer uses Groq (LLaMA) for ultra-low latency inference - critical for a real-time clinical tool. A healthcare-specific system prompt converts raw sign glosses into natural, contextually appropriate medical sentences.

Stack: Python · MediaPipe · TensorFlow · Keras · LSTM · Scikit-learn · Groq · LLaMA · Streamlit · OpenCV · gTTS

Challenges we ran into

Running static and motion models simultaneously caused constant interference - each model would override the other mid-signing. We solved this with a movement detection layer that measures frame-to-frame landmark displacement and exclusively routes to one model at a time.

We also hit Python 3.14 incompatibility with TensorFlow (required downgrade to 3.11), and had to navigate the MediaPipe 0.10.x API migration which removed the entire solutions namespace.

Accomplishments that we're proud of

Built and labelled our own dataset entirely within the 24-hour hackathon
First system combining static + dynamic ASL recognition in a single real-time pipeline with a routing layer
95.54% test accuracy on the motion model - trained on CPU in under 20 minutes
Healthcare-focused LLM prompt engineering producing clinically relevant output
Full pipeline: webcam → ASL → spoken English, running on a standard laptop with no GPU

What we learned

Data quality beats model complexity every time. MediaPipe landmark coordinates are so clean that a simple MLP outperforms many CNN approaches on static signs. We also learned that combining two models in a real-time pipeline is an architectural problem as much as a machine learning one - the routing logic matters more than the models themselves.

What's next for SignBridge

Expand to Indian Sign Language (ISL) - millions of signers, almost no existing tools
Add facial expression recognition for ASL grammatical cues
Deploy as a tablet app for clinical waiting rooms
Partner with hospitals to fine-tune the medical vocabulary
Support BSL (British Sign Language) and other regional variants

Dataset Note: The full training dataset (1,450 static samples + 33,600 motion frames) is not included in the GitHub repository due to file size limitations. Collection scripts (collect_static.py and collect_motion.py) are provided so anyone can record their own dataset and retrain the models.