Inspiration
From the first time I watched my uncle sign a joke and receive only puzzled silence, I felt a knot grow in my chest. His hands painted stories so alive, yet the world stayed mute—no laugh, no reply. Everyday tasks became battles: ordering chai, asking for directions, or sharing a meme all dissolved into awkward miming or frantic typing on a cracked phone.
I grew up wanting to give him a voice he never had to borrow. SignSpeak is that childhood wish written in code—a bridge that instantly listens to his hands and lets him be heard, anywhere, by anyone.
What it does
SignSpeak captures hand gestures in real time, recognizes sign language, and instantly converts those signs into spoken or written text. It also renders incoming speech or text as sign-friendly animations—bridging both sides of the conversation so nobody is left out.
How we built it
- Vision & Detection
MediaPipe + custom CNN to extract 21 keypoints per hand at 30 FPS.
- Translation Model
Lightweight Seq2Seq network trained on 25k+ labeled gloss‑to‑English pairs (letters first, expanding to common words).
- Edge‑Friendly Runtime
TensorFlow Lite and WebGL pipelines keep latency under 80 ms on a standard laptop webcam.
- Full‑Stack Glue
React frontend (Electron option for kiosk mode) ↔ FastAPI backend ↔ SQLite for on‑device privacy.
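The detection stage above hinges on consistent keypoint geometry: MediaPipe emits 21 (x, y, z) landmarks per hand, but raw coordinates vary with hand size and camera distance. A minimal sketch of one plausible preprocessing step (a hypothetical helper, not our exact pipeline) that makes the features translation- and scale-invariant before they reach the classifier:

```python
import math

def normalize_keypoints(landmarks):
    """Normalize 21 (x, y, z) hand landmarks for gesture classification.

    Translates so the wrist (landmark 0) sits at the origin, then scales
    by the wrist-to-middle-fingertip distance so hand size and camera
    distance do not affect the features. Illustrative sketch only.
    """
    assert len(landmarks) == 21, "MediaPipe Hands emits 21 landmarks per hand"
    wx, wy, wz = landmarks[0]                     # landmark 0 is the wrist
    shifted = [(x - wx, y - wy, z - wz) for x, y, z in landmarks]
    # Landmark 12 is the middle fingertip in MediaPipe's hand topology.
    mx, my, mz = shifted[12]
    scale = math.sqrt(mx * mx + my * my + mz * mz) or 1.0
    return [(x / scale, y / scale, z / scale) for x, y, z in shifted]
```

Each frame then contributes 63 floats per hand (21 × 3) to the recognizer, regardless of how close the signer stands to the webcam.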
Challenges we ran into
- Data drought – Public ISL/ASL sentence datasets are scarce and fragmented. We synthesized samples and hand‑labeled 12k frames to bootstrap accuracy.
- Edge performance – Achieving sub‑100 ms inference without a GPU pushed us to prune layers and quantize weights.
- User testing – Signing styles vary wildly. Iterating with deaf testers showed us where the model failed—and why empathy-driven design matters.
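The weight quantization mentioned above is the main reason the model fits a CPU-only latency budget. A toy sketch of the underlying idea—symmetric per-tensor int8 quantization—assuming a plain list of float weights (in practice TensorFlow Lite's converter handles this, plus activations, for the real model):

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: w ≈ scale * q, with q in [-127, 127].

    Toy illustration of post-training weight quantization; each float32
    weight becomes a single int8 byte, roughly a 4x size reduction.
    """
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights; error is bounded by scale / 2."""
    return [scale * v for v in q]
```

Shrinking weights this way trades a bounded rounding error (at most half the quantization step) for smaller files and faster integer arithmetic—exactly the trade we needed to stay under 100 ms without a GPU.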
Accomplishments that we're proud of
- 96% letter‑level accuracy in the wild, even with occlusion and lighting changes.
- Seamless on‑device operation—no cloud, no data leak, fully private.
- A pilot demo let my uncle order coffee unaided for the first time—the barista understood him instantly.
What we learned
- Real inclusivity is about latency of empathy—every extra second of lag feels like another wall.
- Small models + smart preprocessing can beat massive LLMs when resources are tight.
- Community feedback from deaf testers is priceless; they’re co‑creators, not end‑users.
What's next for SignSpeak
- Word‑level & sentence‑level translation using transformer distillation for richer conversations.
- Bidirectional mode: generate sign videos so hearing users can "speak" back visually.
- Mobile AR lens to overlay captions in real time, turning any smartphone into a pocket interpreter.
- Open‑source dataset so others can build on our progress and accelerate access for the entire deaf community.
With SignSpeak, we’re giving my uncle—and millions like him—the mic they’ve always deserved.
