Inspiration

Deaf and hard-of-hearing patients face a critical, often invisible barrier in healthcare: when they walk into a clinic without a human interpreter present, they have no fast way to communicate symptoms to a doctor or caseworker. Professional interpreters are expensive, frequently unavailable on short notice, and simply don't exist in many community clinics. We wanted to build something that didn't just translate words, but actually understood medical context — a deaf patient signing "headache" and "dizzy" should lead to something genuinely useful for the doctor, not just a literal word dump.

What It Does

SignBridge lets a patient sign their symptoms directly to a camera using ASL. Our system recognizes the signs, sends the recognized symptoms through a clinical diagnosis engine, and generates a structured, doctor-ready summary (possible conditions, an urgency level, and suggested questions) within a few seconds, with no human interpreter required.

How We Built It

Input → AI Process → Output pipeline:

  1. Input: A patient performs ASL signs in front of a webcam (or uploads a pre-recorded video).
  2. Computer Vision: Google MediaPipe extracts 21 landmarks per hand from each frame; a custom-trained LSTM neural network classifies the sign from a 30-frame motion sequence. We trained this model on real ASL signing data from the ASL Citizen dataset (Microsoft Research): ~30 videos per sign across 12 medical signs (headache, stomach pain, dizziness, cough, fever, pain, and more).
  3. Clinical AI: Recognized signs are sent to the Infermedica API, a clinical-grade diagnosis and triage engine used by real healthcare companies. It returns probability-ranked conditions and an evidence-based urgency level (e.g. "see a doctor within 24 hours").
  4. Natural Language Generation: Groq (Llama 3.3) converts the clinical output into a concise, plain-language summary for the doctor — always framed as "may indicate," never a definitive diagnosis.
  5. Output: The doctor/caseworker sees a structured staff screen: patient statement, possible conditions with probabilities, urgency level, and suggested follow-up questions.
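
As a rough illustration of step 2, the sketch below shows one way per-frame landmarks could be collected into the 30-frame window that a sequence classifier consumes. The class and function names are hypothetical, and keeping all three coordinates (x, y, z) per landmark for a single hand is an assumption; the write-up only fixes the 21 landmarks and the 30-frame sequence length.

```python
from collections import deque

FRAMES_PER_SEQUENCE = 30   # from the write-up: 30-frame motion sequence
LANDMARKS_PER_HAND = 21    # MediaPipe hand landmarks
COORDS_PER_LANDMARK = 3    # assumption: x, y and z are all kept


def flatten_landmarks(landmarks):
    """Turn 21 (x, y, z) tuples into one flat 63-value feature vector."""
    if len(landmarks) != LANDMARKS_PER_HAND:
        raise ValueError(
            f"expected {LANDMARKS_PER_HAND} landmarks, got {len(landmarks)}"
        )
    return [value for point in landmarks for value in point]


class SequenceBuffer:
    """Hypothetical helper: keeps the most recent frames as a sliding window."""

    def __init__(self, size=FRAMES_PER_SEQUENCE):
        self._frames = deque(maxlen=size)

    def push(self, landmarks):
        """Add one frame; return the full window once enough frames exist."""
        self._frames.append(flatten_landmarks(landmarks))
        if len(self._frames) == self._frames.maxlen:
            return [list(frame) for frame in self._frames]
        return None
```

Each call to push returns None until 30 frames have arrived, then returns a 30 x 63 window on every later frame, which is the shape a sequence model of this kind would take as input.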

Why AI, specifically: A static phrasebook or directory could show a single word's translation, but it can't combine multiple symptoms into a coherent clinical picture, can't weigh urgency, and can't adapt its questions based on context. We needed (1) sequence-based computer vision to recognize movement, not just static hand shape, and (2) a true clinical reasoning engine to turn raw symptoms into something a doctor can act on.

Human-in-the-loop: SignBridge never makes a diagnosis. The AI only bridges communication — every output is explicitly framed as "may indicate," and the doctor must click "Confirm Interpretation" before any action is taken. Any case with low confidence or a HIGH/CRITICAL urgency level is flagged for mandatory human review rather than silently proceeding.
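
The gating rule described above can be sketched in a few lines. This is a minimal illustration, not our production code: the Interpretation type, the function names, and the 0.6 confidence threshold are all hypothetical. Only the two rules themselves (flag on low confidence or HIGH/CRITICAL urgency, and require an explicit doctor confirmation before acting) come from the description.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.6                   # hypothetical cut-off for "low confidence"
REVIEW_URGENCY_LEVELS = {"HIGH", "CRITICAL"}  # levels named in the write-up


@dataclass
class Interpretation:
    """Hypothetical record of one interpreted signing session."""
    sign_confidence: float
    urgency: str
    confirmed_by_doctor: bool = False


def is_flagged(interp):
    """True when the case must go to mandatory human review."""
    low_confidence = interp.sign_confidence < CONFIDENCE_THRESHOLD
    urgent = interp.urgency.upper() in REVIEW_URGENCY_LEVELS
    return low_confidence or urgent


def can_act(interp):
    """No action is allowed until the doctor has confirmed the interpretation."""
    return interp.confirmed_by_doctor
```

Keeping the two checks separate means a confident, low-urgency result still waits for "Confirm Interpretation", while a flagged case is surfaced for review regardless of how the rest of the screen looks.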

Challenges We Ran Into

  • Data leakage: Our first model scored a misleadingly high 91% accuracy until we discovered that we were splitting into train and test sets after data augmentation, so augmented copies of the same video appeared in both sets. Fixing the split showed that our honest accuracy was closer to 40% on our original tiny dataset (88 videos).
  • Dataset quality: WLASL, the most commonly used ASL dataset, had roughly 52% dead video links. We migrated to ASL Citizen (Microsoft Research), a purpose-recorded dataset with far better availability, which let us go from ~7 videos/sign to ~30 videos/sign and meaningfully improved real validated accuracy.
  • WebSocket stability under ML inference load: Running TensorFlow and MediaPipe inference inside a live camera loop caused real instability with Flask-SocketIO's async modes. We learned firsthand why heavy ML inference and cooperative async frameworks (like eventlet) don't mix well, and resolved it using proper OS-level threading instead.
  • Symptom-to-clinical mapping: Mapping our 12 ASL signs to real clinical symptom codes required actually exploring Infermedica's 1,700+ symptom database rather than guessing — our first attempt used vague mappings that produced clinically nonsensical results (e.g. "emergency" triggering a "monitor and rest" recommendation).
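
To make the data leakage fix concrete, here is a small sketch of splitting by source video before augmenting, so that no augmented copy can land on the other side of the split. The function names, the 20% test fraction, and the placeholder augment step are assumptions for illustration; the real augmentation operated on landmark sequences.

```python
import random


def split_by_video(video_ids, test_fraction=0.2, seed=0):
    """Split the ORIGINAL videos first; returns (train_ids, test_ids)."""
    ids = sorted(set(video_ids))
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


def augment(video_id, copies=3):
    """Placeholder augmentation: returns (source video, variant index) pairs."""
    return [(video_id, k) for k in range(copies)]


def build_datasets(video_ids):
    """Augment only the training videos, after the split has been made."""
    train_ids, test_ids = split_by_video(video_ids)
    train = [sample for vid in train_ids for sample in augment(vid)]
    test = [(vid, 0) for vid in test_ids]  # test set stays un-augmented
    return train, test
```

With 88 videos this holds out 18 originals and augments only the remaining 70. Splitting by signer as well as by video would be a stricter variant of the same idea.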

Accomplishments That We're Proud Of

  • We built and validated a working 3-AI hybrid pipeline — computer vision, a real clinical diagnosis API, and an LLM — that hands off cleanly between each stage.
  • We caught and fixed a genuine data leakage bug that would have let us ship a model we believed was 91% accurate when it was actually around 40%. After retraining on ASL Citizen, we reported the honest, validated number instead (~83% on real unseen signers).
  • We tested our final model against real ASL Citizen test videos performed by people we never trained on, not just our own signing — this is a meaningfully more honest validation than most hackathon prototypes attempt.
  • We built a responsible-AI safety net that's based on triage logic, not a single fragile gesture — meaning even if one sign isn't recognized, a dangerous symptom combination can still trigger a high-urgency flag through the clinical engine.

What We Learned

We learned that honest validation matters more than impressive-looking numbers — our first "91% accuracy" model was actually broken, and only rigorous train/test separation revealed the truth. We also learned that combining multiple specialized AI systems produces something genuinely more useful than any single model could alone, but it requires careful design about where each system's responsibility starts and ends, and where a human must remain in the loop. On the engineering side, we learned that async frameworks designed for I/O-bound web apps (like eventlet) can silently conflict with CPU-bound ML inference libraries — a lesson that cost us real debugging time but taught us to match our concurrency model to our actual workload.
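
The concurrency lesson can be sketched with the standard library alone: CPU-bound inference runs on a real OS thread and talks to the rest of the app through queues. Everything named here (fake_classify, inference_worker, run_demo) is hypothetical, and the classifier is a stand-in for the actual model call. Flask-SocketIO also offers an async_mode="threading" setting, though how our server was configured is not shown here.

```python
import queue
import threading


def fake_classify(window):
    """Stand-in for the CPU-bound model call; not the real classifier."""
    return f"sign_for_{len(window)}_frames"


def inference_worker(jobs, results, classify=fake_classify):
    """Runs on its own OS thread; a None job tells it to stop."""
    while True:
        window = jobs.get()
        if window is None:
            break
        results.put(classify(window))


def run_demo():
    jobs = queue.Queue(maxsize=4)   # bounded, so the camera loop cannot flood it
    results = queue.Queue()
    worker = threading.Thread(
        target=inference_worker, args=(jobs, results), daemon=True
    )
    worker.start()
    jobs.put([0.0] * 30)            # one dummy 30-frame window
    jobs.put(None)                  # shutdown signal
    worker.join(timeout=5)
    return results.get(timeout=5)
```

The bounded job queue is a deliberate choice: if inference falls behind, the producer blocks or drops frames instead of letting memory grow without limit.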

What's Next for SignBridge

  • Expand sign vocabulary beyond our current 12 medical signs, using continuous sign language recognition rather than isolated-sign detection, so patients can sign full sentences rather than single words.
  • Add facial expression and body-position tracking (via MediaPipe Holistic) since real ASL grammar uses facial markers and body-relative pointing — particularly relevant for signs that indicate a specific body location.
  • Validate with the Deaf community directly — our current training data is purpose-recorded but dictionary-style; real-world deployment needs direct involvement from Deaf signers in both data collection and usability testing.
  • Replace our Kaggle-based symptom severity scoring with validated clinical triage protocols for every part of our urgency assessment, not just diagnosis (we've already started this transition with Infermedica).
