About the project
Inspiration
Sport is one of the most shared experiences we have, but for Deaf and hard-of-hearing fans, a live match is mostly silence. Closed captions lag, often aren't there at all for live commentary, and even when present they flatten the play-by-play into text that can't keep up with the pace or carry the feeling of the moment. We wanted to build the thing that's actually missing: a live sign-language layer that sits on top of the broadcast everyone is already watching, so nobody has to wait for a special accessible feed that may never come.
What it does
SignCast embeds any YouTube live stream and overlays a real-time American Sign Language interpreter on top of it. It captures the broadcast's audio, transcribes the commentary, rewrites it into ASL grammar, and plays real human-signer video clips in a small, draggable, resizable overlay you can park in any corner. Underneath, a separate validation agent measures translation quality across five sign languages, so the system can be held to a standard instead of trusted on faith.
How we built it
The live path is a streaming pipeline with two WebSocket hops:
- Capture: the browser grabs the tab's audio via getDisplayMedia, downsamples it to 16 kHz mono PCM, and streams it to the backend.
- Transcribe: Deepgram (Nova-3) turns that into text; we buffer the fragments into complete utterances, flushing on a natural pause or after a few seconds, because continuous commentary over crowd noise rarely produces a clean pause on its own.
- Translate: Claude rewrites each utterance into ASL-ordered gloss steps: topic-first, articles and copulas dropped, common base-form words for signs, and fingerspelling for proper nouns and anything uncertain.
- Render: each sign word is matched live against the WLASL dataset of human-signer clips; unmatched words fall back to fingerspelling. The backend streams one clip event per word over WebSocket, and the overlay plays them with captions, speeding each clip up slightly so the signing tracks the commentator's actual pace.
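The Render step's match-or-fingerspell rule can be sketched roughly as below. This is a minimal illustration, not our production code: `CLIP_INDEX`, the clip paths, the `ClipEvent` shape, and the function name `to_clip_events` are all hypothetical stand-ins for a real lookup over WLASL clips.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipEvent:
    kind: str     # "sign" or "fingerspell"
    gloss: str    # the gloss word this event represents
    clips: tuple  # clip identifiers to play, in order


# Hypothetical index from gloss word to clip path; a real one would be
# built from the WLASL metadata rather than hard-coded.
CLIP_INDEX = {
    "GOAL": "wlasl/goal.mp4",
    "SCORE": "wlasl/score.mp4",
    "TEAM": "wlasl/team.mp4",
}


def to_clip_events(gloss_steps, clip_index):
    """Map each gloss word to one sign clip, or to per-letter fingerspelling clips."""
    events = []
    for word in gloss_steps:
        key = word.strip().upper()
        if not key:
            continue
        clip = clip_index.get(key)
        if clip is not None:
            events.append(ClipEvent("sign", key, (clip,)))
        else:
            # No matching clip: spell the word letter by letter instead of
            # dropping it or substituting a different sign.
            letters = tuple(f"fingerspell/{ch}.mp4" for ch in key if ch.isalpha())
            events.append(ClipEvent("fingerspell", key, letters))
    return events


if __name__ == "__main__":
    for event in to_clip_events(["rivera", "score", "goal"], CLIP_INDEX):
        print(event.kind, event.gloss, len(event.clips))
```

Each returned event corresponds to one per-word message on the WebSocket; a proper noun like the made-up "rivera" becomes a single fingerspelling event carrying six letter clips.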
Alongside this, we built a validation agent: Claude generates realistic match commentary across eight scenarios, a translator converts it into each target sign language using per-language grammar rules, and Claude-as-judge scores every output on five metrics, producing a markdown report that flags its own weakest cases.
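To give a feel for the reporting half of the validation agent, here is a small sketch of how judged scores could be rolled up into a markdown report that surfaces the weakest cases. Everything specific in it is an assumption made for illustration: the metric names, the 1 to 5 scale, the scenario names, and the language codes are placeholders, and the scores are made-up sample data rather than real results.

```python
from statistics import mean

# Made-up sample rows. Metric names, the 1-5 scale, scenario names, and
# language codes are illustrative assumptions, not the project's real values.
SAMPLE_CASES = [
    {"scenario": "goal", "language": "ASL",
     "scores": {"grammar": 5, "fidelity": 4, "fingerspelling": 5, "fluency": 4, "timing": 4}},
    {"scenario": "penalty", "language": "BSL",
     "scores": {"grammar": 3, "fidelity": 2, "fingerspelling": 4, "fluency": 3, "timing": 3}},
    {"scenario": "substitution", "language": "ASL",
     "scores": {"grammar": 4, "fidelity": 4, "fingerspelling": 3, "fluency": 4, "timing": 5}},
]


def mean_score(case):
    """Average of the judge's per-metric scores for one case."""
    return mean(case["scores"].values())


def weakest_cases(cases, n=3):
    """Return the n cases with the lowest mean score, worst first."""
    return sorted(cases, key=mean_score)[:n]


def render_report(cases, n=3):
    """Build a short markdown report that calls out the weakest cases."""
    if not cases:
        return "# Translation quality report\n\nNo cases evaluated."
    lines = ["# Translation quality report", ""]
    lines.append(f"Cases evaluated: {len(cases)}")
    lines.append(f"Overall mean score: {mean(mean_score(c) for c in cases):.2f}")
    lines += ["", "## Weakest cases", ""]
    for case in weakest_cases(cases, n):
        worst_metric = min(case["scores"], key=case["scores"].get)
        lines.append(
            f"- {case['scenario']} ({case['language']}): "
            f"mean {mean_score(case):.2f}, lowest metric: {worst_metric}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_report(SAMPLE_CASES, n=2))
```

Sorting by mean score is only one way to pick "weakest"; ranking by the single lowest metric would flag a different set of cases.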
What we learned
- Sign language is not English with hands. The hard part isn't playing clips, it's reordering into a different grammar and knowing when not to translate (fingerspell the name instead of inventing a sign).
- Generated signing was the wrong instinct. Synthetic avatars driven straight from text produce signing that's unreadable, even offensive, to fluent signers. Pre-recorded human clips with an honest fingerspelling fallback are slower to scale but actually usable and more respectful.
- Streaming STT fights you on live audio. The endpointing that works for clean speech stalls on nonstop commentary, which is why the utterance buffering has a time-based escape hatch.
- You can't claim accuracy; you have to measure it. Building the evaluator changed how we talked about the product.
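The time-based escape hatch for utterance buffering can be sketched like this. It is a simplified illustration under stated assumptions: the class name `UtteranceBuffer`, the 0.8 s pause threshold, and the 4 s maximum age are hypothetical values chosen for the example, and timestamps are passed in explicitly so the logic is easy to test.

```python
class UtteranceBuffer:
    """Collects transcript fragments and flushes on a pause or a max age."""

    def __init__(self, pause_s=0.8, max_age_s=4.0):
        self.pause_s = pause_s      # assumed silence gap that ends an utterance
        self.max_age_s = max_age_s  # assumed cap on how long text may be held
        self._parts = []
        self._started_at = None
        self._last_at = None

    def add(self, fragment, now):
        """Record a transcript fragment that arrived at time `now` (seconds)."""
        text = fragment.strip()
        if not text:
            return
        if self._started_at is None:
            self._started_at = now
        self._parts.append(text)
        self._last_at = now

    def poll(self, now):
        """Return the buffered utterance if it is due, otherwise None."""
        if not self._parts:
            return None
        paused = now - self._last_at >= self.pause_s
        too_old = now - self._started_at >= self.max_age_s
        if paused or too_old:
            return self._flush()
        return None

    def _flush(self):
        text = " ".join(self._parts)
        self._parts = []
        self._started_at = None
        self._last_at = None
        return text


if __name__ == "__main__":
    buf = UtteranceBuffer()
    buf.add("what a", 0.0)
    buf.add("strike", 0.3)
    print(buf.poll(0.5))  # still within the pause window
    print(buf.poll(1.2))  # pause threshold exceeded, utterance flushed
```

With nonstop commentary the `paused` condition may never become true, so `too_old` is what guarantees the translator still receives text every few seconds.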
Challenges we faced
- The no-audio race. Deepgram closes a connection that gets no audio within ~10 seconds, less time than it takes a human to click "start capture" and clear the browser's tab-share permission dialog. We had to wait for the first real audio chunk before ever opening the Deepgram connection.
- Keeping signing in sync with live speech. WLASL clips are recorded at a slow, deliberate teaching pace; played at 1× they fall further and further behind. We pass each utterance's spoken duration through to the frontend and speed up playback (clamped) so it keeps up without looking unnatural.
- Vocabulary gaps. No clip set covers everything, so unmatched signs are demoted to fingerspelling rather than dropped or faked.
- Honest scope under a deadline. Five languages translate and grade well in evaluation, but only ASL has backing clips in the live overlay; we kept the demo honest about that line.
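The clamped speed-up used to keep signing in sync can be expressed as a small calculation. This is a sketch under assumptions: the function name `playback_rate` and the clamp bounds of 1.0 and 2.5 are hypothetical, chosen only to show the shape of the logic.

```python
def playback_rate(clip_durations_s, spoken_duration_s, min_rate=1.0, max_rate=2.5):
    """Pick one playback rate so an utterance's clips fit its spoken duration.

    The rate is the ratio of total clip time to spoken time, clamped to
    [min_rate, max_rate] so clips are never slowed down or sped up so far
    that the signing becomes hard to read. Both bounds are assumed values.
    """
    total_clip_s = sum(clip_durations_s)
    if total_clip_s <= 0 or spoken_duration_s <= 0:
        return min_rate
    ideal = total_clip_s / spoken_duration_s
    return max(min_rate, min(max_rate, ideal))


if __name__ == "__main__":
    # Three 2-second clips for an utterance spoken in 4 seconds.
    print(playback_rate([2.0, 2.0, 2.0], 4.0))
```

When the ideal rate exceeds the upper bound, the signing still falls behind for that utterance; the clamp trades perfect sync for readability.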