Inspiration

Over 70 million people worldwide use sign language as their primary means of communication, yet most hearing people can't understand a single letter of ASL. I wanted to bridge that gap instantly, with no app install, no account, and no server-side processing. Just open a browser, show your hand, and see what you're signing.

The idea was simple: what if the 21 landmarks MediaPipe already detects could be turned into a real-time ASL alphabet classifier — entirely in the browser?

What I Learned

The geometry of the human hand is surprisingly expressive. By comparing ratios of distances between landmarks rather than raw coordinates, the classifier becomes largely insensitive to rotation and scale: it works whether your hand is tilted, close to the camera, or at an angle.
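To make the invariance point concrete, here is a minimal sketch rather than the project's actual code. The helper names (`distance`, `reachRatio`, `transform`) and the sample coordinates are made up for illustration, and it assumes landmark coordinates share one uniform scale. It rotates, scales, and shifts three points, then checks that the distance ratio is unchanged.

```typescript
// Hypothetical helpers; only the idea (distance ratios) comes from the write-up.
type Point = { x: number; y: number; z: number };

function distance(a: Point, b: Point): number {
  return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
}

// Wrist-to-tip distance divided by wrist-to-knuckle distance.
function reachRatio(wrist: Point, knuckle: Point, tip: Point): number {
  return distance(wrist, tip) / distance(wrist, knuckle);
}

// Rotate about the camera (z) axis, scale uniformly, then shift in x and y.
function transform(p: Point, angleRad: number, scale: number, shift: number): Point {
  const c = Math.cos(angleRad);
  const s = Math.sin(angleRad);
  return {
    x: scale * (c * p.x - s * p.y) + shift,
    y: scale * (s * p.x + c * p.y) + shift,
    z: scale * p.z,
  };
}

// Made-up sample points for one extended finger.
const wrist: Point = { x: 0.5, y: 0.9, z: 0 };
const knuckle: Point = { x: 0.5, y: 0.6, z: -0.02 };
const tip: Point = { x: 0.52, y: 0.2, z: -0.05 };

const before = reachRatio(wrist, knuckle, tip);
const after = reachRatio(
  transform(wrist, 0.6, 0.4, 0.1),
  transform(knuckle, 0.6, 0.4, 0.1),
  transform(tip, 0.6, 0.4, 0.1),
);

// The two ratios agree up to floating-point rounding.
console.log(Math.abs(before - after) < 1e-9);
```

Raw coordinates would change under every one of those transforms, which is why thresholds on them would break as soon as the hand moved.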

I also learned that a WASM runtime can fail loudly in places JavaScript cannot catch. In my setup, MediaPipe's detectForVideo triggered window.onerror on the very first call (before the WASM heap was warm), bypassing every try/catch I wrapped around it. The fix: accept it as a cosmetic dev artefact and verify that it doesn't reach production.

The trickiest UX lesson: humans don't hold gestures perfectly still. A naive "18 consecutive identical frames" rule felt broken. Switching to time-based commitment (800 ms hold + 8-frame gap tolerance) made detection feel natural and robust.
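A time-based commit rule along these lines can be sketched as follows. This is an illustration under assumptions, not the project's code: the `LetterCommitter` class and its method are hypothetical names, the 800 ms hold and 8-frame gap come from the lesson above, and the 1100 ms cooldown is the value the project uses between commits. How a different letter mid-hold is treated (here it restarts the timer) is my assumption.

```typescript
// Hypothetical class; the three constants are the values quoted in this write-up.
const HOLD_MS = 800;
const MAX_GAP_FRAMES = 8;
const COOLDOWN_MS = 1100;

class LetterCommitter {
  private candidate: string | null = null;
  private heldSince = 0;
  private gapFrames = 0;
  private lastCommitAt = Number.NEGATIVE_INFINITY;

  // Feed one classification per frame; returns a letter only on the frame it commits.
  update(letter: string | null, nowMs: number): string | null {
    if (letter === null) {
      // Empty frame: tolerate a short gap, otherwise drop the candidate.
      this.gapFrames += 1;
      if (this.gapFrames > MAX_GAP_FRAMES) {
        this.candidate = null;
      }
      return null;
    }

    this.gapFrames = 0;
    if (letter !== this.candidate) {
      // Assumption: a different letter restarts the hold timer.
      this.candidate = letter;
      this.heldSince = nowMs;
      return null;
    }

    const heldLongEnough = nowMs - this.heldSince >= HOLD_MS;
    const cooledDown = nowMs - this.lastCommitAt >= COOLDOWN_MS;
    if (heldLongEnough && cooledDown) {
      this.lastCommitAt = nowMs;
      this.heldSince = nowMs; // a continued hold has to earn the next commit
      return letter;
    }
    return null;
  }
}
```

In a render loop this would be called once per frame, for example `committer.update(detectedLetter, performance.now())`. Because the rule depends on elapsed time rather than a frame count, a few dropped frames no longer reset progress.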

How I Built It

Detection pipeline:

- @mediapipe/tasks-vision HandLandmarker runs in VIDEO mode, producing 21 3D landmarks per frame at ~30 fps.
- A custom rule-based classifier (landmarkClassifier.ts) maps those landmarks to ASL letters using distance ratios:

$$\text{extended}(f) = \frac{d(\text{wrist}, \text{tip}_f)}{d(\text{wrist}, \text{mcp}_f)} > 1.5$$

- A time-based commit layer filters noise: a letter must be held stable for 800 ms, tolerating up to 8 empty frames (WASM gaps), with a 1100 ms cooldown between commits.

Stack:

- Next.js 14 (App Router, TypeScript, Tailwind CSS): zero-backend, 100% client-side
- MediaPipe HandLandmarker: CPU delegate for maximum browser compatibility
- Canvas overlay with mirrored 21-point skeleton visualization
- PM2 + Nginx on Oracle Cloud for production deploy

Letters supported: A B C D E F I K L O S U V W Y (J and Z require motion and are not supported)
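The distance-ratio test for an extended finger can be sketched like this. It is an illustration, not the contents of landmarkClassifier.ts. The function names are hypothetical, the 1.5 threshold is the one quoted above, and the indices follow MediaPipe's hand landmark numbering (0 for the wrist, then MCP and tip per finger). The thumb is left out because I'm assuming it needs its own rule.

```typescript
type Landmark = { x: number; y: number; z: number };

// MediaPipe hand landmark indices: 0 is the wrist; each entry is [MCP, tip].
const FINGERS = {
  index: [5, 8],
  middle: [9, 12],
  ring: [13, 16],
  pinky: [17, 20],
} as const;

type FingerName = keyof typeof FINGERS;

const EXTENDED_RATIO = 1.5; // threshold quoted in the write-up

function dist(a: Landmark, b: Landmark): number {
  return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
}

// A finger counts as extended when its tip sits well beyond its knuckle, measured from the wrist.
function isExtended(hand: Landmark[], mcp: number, tip: number): boolean {
  const wrist = hand[0];
  return dist(wrist, hand[tip]) / dist(wrist, hand[mcp]) > EXTENDED_RATIO;
}

// Hypothetical helper: one extended/curled flag per non-thumb finger.
function fingerStates(hand: Landmark[]): Record<FingerName, boolean> {
  if (hand.length !== 21) {
    throw new Error("expected 21 hand landmarks");
  }
  return {
    index: isExtended(hand, FINGERS.index[0], FINGERS.index[1]),
    middle: isExtended(hand, FINGERS.middle[0], FINGERS.middle[1]),
    ring: isExtended(hand, FINGERS.ring[0], FINGERS.ring[1]),
    pinky: isExtended(hand, FINGERS.pinky[0], FINGERS.pinky[1]),
  };
}
```

Flags like these only narrow the candidates. U and V, for instance, both show index and middle extended with ring and pinky curled, so telling them apart needs a further measurement such as the spread between the two fingertips.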

Challenges

- WASM warm-up errors that bypass JavaScript's error handling entirely: documented, accepted, moved on.
- Turbopack's incompatibility with UMD-only MediaPipe packages forced a full revert from an attempted TensorFlow.js migration.
- Circular type inference in a self-referencing React useCallback forced me to restructure the requestAnimationFrame (RAF) loop around a useRef.
- False positives between similar letters (U/V, B/W) required careful threshold tuning, plus a confidence bar in the UI to give users visual feedback on detection certainty.
