Inspiration
Sign language is a complete language, but most hearing people never learn it, which makes everyday interactions harder for deaf and hard‑of‑hearing people. We wanted to build something simple that lets common signs like “hello”, “please”, and “thanks” be understood instantly, without needing a human interpreter. Recent examples of real-time sign recognition with MediaPipe showed us this was possible on a normal laptop, so we challenged ourselves to build an end‑to‑end prototype in a weekend.
What it does
SignBridge is a real-time sign assistant that uses a webcam to recognize a small set of common signs and display them as text on screen. It tracks the user’s upper body and both hands, converts landmarks into a feature vector, runs a classifier, and smooths predictions over time so the UI shows a steady “Detected: hello / please / thanks” instead of flickering guesses.
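The temporal smoothing step can be sketched as a majority vote over the last few per-frame predictions. This is a minimal sketch, not SignBridge's actual code. The class name PredictionSmoother and its window and min_votes parameters are hypothetical. It also assumes the classifier emits one label string per frame.

```python
from collections import Counter, deque


class PredictionSmoother:
    """Hypothetical majority-vote smoother over the most recent per-frame labels."""

    def __init__(self, window=10, min_votes=6):
        if not 0 < min_votes <= window:
            raise ValueError("min_votes must be between 1 and window")
        self.buffer = deque(maxlen=window)  # the oldest label drops out automatically
        self.min_votes = min_votes
        self.current = None  # label currently shown in the UI

    def update(self, label):
        """Add one frame's prediction and return the label to display (or None)."""
        self.buffer.append(label)
        top_label, votes = Counter(self.buffer).most_common(1)[0]
        # Switch the displayed label only once one label dominates the window;
        # otherwise keep showing the previous stable label.
        if votes >= self.min_votes:
            self.current = top_label
        return self.current


if __name__ == "__main__":
    smoother = PredictionSmoother(window=5, min_votes=3)
    for frame_label in ["hello", "hello", "please", "hello", "thanks"]:
        print(frame_label, "->", smoother.update(frame_label))
```

In this example the display stays empty until "hello" has 3 votes in the 5-frame window. It then keeps showing "hello" through a single stray "thanks". Raising min_votes trades responsiveness for stability. At an assumed 30 fps, a 10-frame window spans about a third of a second.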
How we built it
We used MediaPipe Holistic to get pose and hand landmarks from the webcam. We then designed features that anchor both hands relative to the shoulders, so gestures are described relative to the body instead of as raw image coordinates. A custom data collection script records these features into .npy files per label, and we train a scikit-learn RandomForest on top for fast, CPU-only inference. Finally, we integrated the trained model back into an OpenCV loop with a small prediction buffer (a Python deque) and on-screen overlays showing the live camera frame and the currently detected sign.
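The shoulder-anchored feature idea can be sketched in NumPy as below. This is an illustrative sketch, not the project's feature code, and the function names are hypothetical. It assumes landmarks have already been pulled out of MediaPipe Holistic as (x, y, z) arrays: 21 per hand, plus the two shoulders, which are indices 11 (left) and 12 (right) in MediaPipe's 33-point pose model. The sketch uses the shoulder midpoint as the origin and the shoulder width as the scale. Missing hands are zero-filled so the vector length stays fixed at 2 × 21 × 3 = 126.

```python
import numpy as np

N_HAND_LANDMARKS = 21             # MediaPipe hand model: 21 landmarks per hand
HAND_DIMS = N_HAND_LANDMARKS * 3  # x, y, z per landmark


def hand_features(hand, origin, scale):
    """Flatten one hand relative to the shoulder frame; zeros if the hand is missing."""
    if hand is None:
        return np.zeros(HAND_DIMS, dtype=np.float32)
    hand = np.asarray(hand, dtype=np.float32).reshape(N_HAND_LANDMARKS, 3)
    return ((hand - origin) / scale).ravel()


def body_anchored_features(left_shoulder, right_shoulder, left_hand, right_hand):
    """Hypothetical 126-value feature vector: both hands in shoulder-centred units."""
    ls = np.asarray(left_shoulder, dtype=np.float32)
    rs = np.asarray(right_shoulder, dtype=np.float32)
    origin = (ls + rs) / 2.0  # midpoint between the shoulders
    # Shoulder width in the image plane (x, y); fall back to 1.0 to avoid dividing by zero.
    scale = float(np.linalg.norm(ls[:2] - rs[:2])) or 1.0
    return np.concatenate([
        hand_features(left_hand, origin, scale),
        hand_features(right_hand, origin, scale),
    ])
```

Normalizing by shoulder width makes the same sign produce similar features whether the signer sits close to the camera or far from it.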
Challenges we ran into
Keeping the pipeline consistent was surprisingly hard. Each change to the feature layout (adding pose, switching to Holistic) meant re-collecting data and making sure training and inference both used exactly the same vector length. We repeatedly hit NumPy and scikit-learn errors when one part produced 147 features and another produced 126, or even 0, which forced us to add strict shape checks and padding. We also fought with MediaPipe versions: newer builds dropped mp.solutions in favor of the Tasks API, so the same code behaved differently depending on the Python environment.
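A strict shape check with padding might look like the sketch below. The helper enforce_length is hypothetical and makes three assumptions. Shorter vectors are padded with trailing zeros. Empty vectors return None so the caller can skip the frame. Longer vectors raise an error instead of being silently truncated, because an overlong vector usually means the training and live feature layouts have drifted apart.

```python
import numpy as np


def enforce_length(features, expected_len):
    """Return a float32 vector of exactly expected_len values, or None if empty.

    Hypothetical helper: zero-pads short vectors at the end and rejects long ones.
    """
    vec = np.asarray(features, dtype=np.float32).ravel()
    if vec.size == 0:
        return None  # nothing detected this frame; skip it rather than predict
    if vec.size > expected_len:
        # More features than the model was trained on: a layout mismatch, not noise.
        raise ValueError(f"got {vec.size} features, model expects {expected_len}")
    if vec.size < expected_len:
        vec = np.pad(vec, (0, expected_len - vec.size))  # append zeros
    return vec
```

Trailing-zero padding is only safe when the missing values really are the last block of the layout. If the layout can drop features from the middle, each block should be zero-filled in place when the vector is built.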
Accomplishments that we're proud of
We’re proud that SignBridge runs in real time on a standard laptop and reliably recognizes multiple signs using only the CPU. Hitting 100% accuracy on held-out test splits for “hello”, “please”, and “thanks” gave us confidence that our feature design and data collection are solid rather than just overfitted. We’re also happy with the small polish pieces (fixed-length feature enforcement, temporal smoothing, and a clear overlay UI) that make the prototype feel like a tool rather than just a demo script.
What we learned
We learned how unforgiving ML pipelines are about consistency: if your training and live features differ by even one dimension, everything breaks. Working with MediaPipe Holistic showed us how much better body-aware features perform than hand-only ones, especially for gestures where arm position matters. We also got more disciplined about environment management and defensive coding: validating shapes, skipping empty datasets, and logging model expectations saved a lot of debugging time.
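The defensive checks mentioned above can be combined in a small loader: validating shapes, skipping empty datasets, and logging what the data looks like. This sketch assumes a hypothetical layout of one <label>.npy file per sign, each holding an (n_samples, n_features) array. Both load_dataset and that layout are illustrative, not taken from the project.

```python
from pathlib import Path

import numpy as np


def load_dataset(data_dir, expected_len):
    """Stack every <label>.npy under data_dir into (X, y), validating shapes.

    Hypothetical layout: one file per label, each an (n_samples, n_features) array.
    """
    X_parts, labels = [], []
    for path in sorted(Path(data_dir).glob("*.npy")):
        arr = np.load(path)
        if arr.size == 0:
            print(f"skipping {path.name}: no samples")  # e.g. an aborted recording
            continue
        if arr.ndim != 2 or arr.shape[1] != expected_len:
            raise ValueError(
                f"{path.name} has shape {arr.shape}, expected (n, {expected_len})"
            )
        X_parts.append(arr)
        labels.extend([path.stem] * len(arr))  # the file name doubles as the label
    if not X_parts:
        raise ValueError(f"no usable .npy files found in {data_dir}")
    X = np.vstack(X_parts)
    print(f"loaded {X.shape[0]} samples x {X.shape[1]} features, "
          f"labels: {sorted(set(labels))}")
    return X, np.array(labels)
```

Once a scikit-learn classifier is fitted on X, its n_features_in_ attribute records the vector length it was trained with. Comparing that value against the live feature length at startup catches a layout mismatch before the first prediction.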
What's next for SignBridge
Next, we want to expand the vocabulary and move from single-frame classification toward sequence-based recognition of short phrases. We’d also like to wrap the engine in a friendlier interface, such as a browser or video-call overlay with optional text-to-speech so hearing users can “hear” signed phrases. Longer term, we’re interested in supporting multiple sign languages and adding light personalization so the model adapts to individual signing styles.
Built With
- mediapipe
- numpy
- opencv
- pandas
- python
- scikit-learn
- tensorflow