Inspiration

500,000 ASL users. 10,000 interpreters. $300 per visit minimum. Deaf patients' appointments depend on interpreter availability, and even when an interpreter arrives, patients still can't speak for themselves. I wanted to remove the interpreter as a dependency and give deaf patients agency in their own care.

What it does

A patient signs in ASL in front of any camera. Grok vision reads the handshapes, generates a natural sentence in context, and xAI TTS speaks it aloud to the provider in under 3 seconds. Staff respond verbally, xAI STT transcribes their speech, and the full encounter exports as a FHIR R4 record the patient owns. Every persona in the encounter (patient, receptionist, doctor) is empowered by xAI.

How we built it

MediaPipe detects hands client-side. On sign completion, Grok vision receives frames plus natural language sign descriptions and returns both the recognized word and a contextual patient sentence in a single API call — no custom model, no training data. Node/Express backend, React + Vite frontend.
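A minimal sketch of how a single call can carry both the word and the sentence. It assumes the model is asked to reply in JSON. The `SignResult` shape, the `buildPrompt` and `parseSignResult` helpers, and the prompt wording are all illustrative assumptions, not the project's actual code, and the network call to Grok is deliberately left out.

```typescript
// Hypothetical shape of the combined reply; field names are assumptions.
interface SignResult {
  word: string;     // the recognized sign
  sentence: string; // the contextual sentence to speak aloud
}

// Builds one prompt that asks for recognition and phrasing together.
// `descriptions` maps a sign name to its natural language reference description.
function buildPrompt(descriptions: Record<string, string>, context: string): string {
  const refs = Object.entries(descriptions)
    .map(([sign, description]) => `- ${sign}: ${description}`)
    .join("\n");
  return [
    "You are reading ASL signs from the attached camera frames.",
    `Encounter context: ${context}`,
    "Reference signs:",
    refs,
    'Reply with JSON only: {"word": "<sign>", "sentence": "<what the patient would say>"}',
  ].join("\n");
}

// Parses the model's text reply, tolerating extra text around the JSON object.
// Returns null when the reply does not contain the two expected string fields.
function parseSignResult(raw: string): SignResult | null {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    const data: unknown = JSON.parse(raw.slice(start, end + 1));
    if (typeof data !== "object" || data === null) return null;
    const { word, sentence } = data as Record<string, unknown>;
    if (typeof word !== "string" || typeof sentence !== "string") return null;
    return { word: word.trim(), sentence: sentence.trim() };
  } catch {
    return null; // malformed JSON
  }
}
```

Only `sentence` would be passed on to TTS in this sketch; `word` stays available for logging or for showing the recognized sign on screen.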

Challenges we ran into

Latency was the core challenge: running ASL recognition, natural language generation, and TTS output in sequence was too slow. We solved it by collapsing sign recognition and phrase generation into one Grok API call. Grok also appended its sign analysis to patient messages, which we fixed with stricter prompting. Finally, multiple TTS voices overlapped because React StrictMode double-mounted the webcam component while state was being reset prematurely; a sequence guard fixed that.
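One common way to build a sequence guard, sketched under assumptions: every speech request takes an incrementing token, and audio only plays if its token is still the newest when the audio is ready. The names `SequenceGuard`, `speakLatest`, `fetchAudio`, and `play` are hypothetical stand-ins, not the project's actual identifiers.

```typescript
// Hands out increasing tokens and remembers which one is newest.
class SequenceGuard {
  private current = 0;

  // Registers a new request and returns its token.
  next(): number {
    this.current += 1;
    return this.current;
  }

  // True only for the most recently issued token.
  isCurrent(token: number): boolean {
    return token === this.current;
  }
}

// Fetches audio for `text` and plays it only if no newer request arrived
// while the fetch was in flight. Resolves to true if audio was played.
// `fetchAudio` and `play` are assumed wrappers around the TTS call and playback.
async function speakLatest(
  guard: SequenceGuard,
  fetchAudio: (text: string) => Promise<string>,
  play: (audio: string) => void,
  text: string,
): Promise<boolean> {
  const token = guard.next();
  const audio = await fetchAudio(text);
  if (!guard.isCurrent(token)) return false; // superseded, stay silent
  play(audio);
  return true;
}
```

For the guard to help with double-mounting, it would need to live outside the component (for example at module level), so that both mounts share one counter and the stale request is the one that gets dropped.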

Accomplishments that we're proud of

Sub-3-second end-to-end latency with no custom model. A one-click FHIR export that turns a conversation into a portable health record the patient owns.
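As an illustration of what such an export could look like, the sketch below turns a list of conversation turns into a FHIR R4 `Bundle` of `Communication` resources. The choice of resources, the `Turn` type, and the `toFhirBundle` helper are assumptions made for this example; the write-up does not say which FHIR resources the project actually emits.

```typescript
// Hypothetical in-memory record of one utterance in the encounter.
interface Turn {
  speaker: "patient" | "receptionist" | "doctor";
  text: string;
  sentAt: string; // ISO 8601 timestamp
}

// Maps each turn to a FHIR R4 Communication and wraps them in a collection Bundle.
function toFhirBundle(patientId: string, turns: Turn[]) {
  const patientRef = { reference: `Patient/${patientId}` };
  return {
    resourceType: "Bundle",
    type: "collection",
    entry: turns.map((turn) => ({
      resource: {
        resourceType: "Communication",
        status: "completed",
        subject: patientRef,
        sent: turn.sentAt,
        // Staff are identified by display text only in this sketch.
        sender: turn.speaker === "patient" ? patientRef : { display: turn.speaker },
        payload: [{ contentString: turn.text }],
      },
    })),
  };
}

// Example: serialize a two-turn encounter for download.
const exampleBundle = toFhirBundle("example-1", [
  { speaker: "patient", text: "My chest hurts.", sentAt: "2025-01-01T10:00:00Z" },
  { speaker: "doctor", text: "When did it start?", sentAt: "2025-01-01T10:00:05Z" },
]);
const exampleJson = JSON.stringify(exampleBundle, null, 2);
```

A real integration would likely replace the display-only senders with references to `Practitioner` resources and validate the output against an R4 validator before handing it to an EHR.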

What we learned

Grok's multimodal capabilities — vision, language, and voice — can replace an entire professional ASL interpreter pipeline. Prompt engineering with reference sign descriptions was surprisingly effective. FHIR is underused as a patient empowerment primitive — one export button changes the value proposition entirely.

What's next for Hands Up — Voice of the Voiceless

  • Direct EHR integration via FHIR
  • Support for BSL and other national sign languages
  • Tablet form factor for waiting rooms — no setup, no login, just hands
  • Expand vocabulary to cover full clinical intake
  • Expand to other scenarios: banks, libraries, airports

Be the voice of the voiceless.
