MediScribe

Our Story

As first-generation and international college students — Hung, from Vietnam, and Sabal — we've both struggled to navigate the American healthcare system. But we speak English. We can Google our symptoms, read a bill, push back on a charge.

Our parents can't.

For Hung's family, a hospital visit means confusion, unexpected bills, and no way to ask why. For Sabal's, it means nodding along to a doctor they don't fully understand — and hoping nothing gets lost.

That fear is what drives us. We struggled with this system even though we speak English. Without English, it is nearly impossible to navigate.

The problem isn't just language. It's understanding.

When patients don't understand their doctor, they stop asking questions. When they don't understand their bill, they stop responding. And when communication breaks down, trust disappears.

That's why we built MediScribe — to turn confusing, stressful hospital visits into clear, human conversations where patients feel informed, confident, and in control.


Our Inspiration

Over 25 million Americans can't communicate with their doctor in English.

Professional medical interpreters cost up to $150/hour — and are not always available when it matters most.

MediScribe addresses this with live translation and plain-language simplification, at roughly $0.01 per conversation turn.


What It Does

MediScribe is a real-time, bidirectional medical interpreter with zero buttons.

Patient → Doctor

  • Patient speaks their native language
  • → Translated into clinically accurate English
  • → Includes symptom extraction + urgency flags

Doctor → Patient

  • Doctor speaks normally (with jargon)
  • → Simplified to 5th-grade reading level
  • → Translated and spoken back to the patient

No UI friction. No confusion. Just a smooth conversation.


Performance

End-to-end latency: about 4.5 seconds per conversation turn

$$ T_{response} = T_{vad} + T_{stt} + T_{ai} + T_{tts} $$

$$ T_{response} = 0.8s + 1.2s + 1.5s + 1.0s = 4.5s $$

Feels near real-time in clinical settings.
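
As a rough sketch, the same budget can be expressed in a few lines of Python. The stage names mirror the formula; the helper name `latency_budget` is illustrative and not part of the MediScribe codebase.

```python
# Per-stage latency in milliseconds, taken from the figures above.
STAGE_MS = {"vad": 800, "stt": 1200, "ai": 1500, "tts": 1000}


def latency_budget(stage_ms):
    """Return total latency (ms) and each stage's share of that total."""
    total = sum(stage_ms.values())
    shares = {name: ms / total for name, ms in stage_ms.items()}
    return total, shares


total_ms, shares = latency_budget(STAGE_MS)
slowest = max(STAGE_MS, key=STAGE_MS.get)
print(f"total: {total_ms / 1000:.1f}s, slowest stage: {slowest} ({shares[slowest]:.0%})")
```

Broken down this way, the AI step is the largest single share at one third of each turn, and the VAD stage accounts for roughly 18%.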


Cost Advantage

  • 60 minutes with a professional interpreter: $45–$150
  • 60 minutes with MediScribe: $0.48

Against the $150 upper bound:

$$ \text{Savings} = 1 - \frac{\$0.48}{\$150} \approx 99.7\% $$
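
The same calculation can be run at both ends of the $45–$150 range. A minimal Python sketch, where the function name `savings` is illustrative:

```python
def savings(mediscribe_cost, interpreter_cost):
    """Fractional savings of MediScribe relative to an interpreter for the same hour."""
    return 1 - mediscribe_cost / interpreter_cost


MEDISCRIBE_PER_HOUR = 0.48  # dollars, from the comparison above

for interpreter_per_hour in (45, 150):
    pct = savings(MEDISCRIBE_PER_HOUR, interpreter_per_hour) * 100
    print(f"vs ${interpreter_per_hour}/hour: {pct:.1f}% savings")
```

Even at the $45 lower bound, the savings stay close to 98.9%.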


Key Innovation: Voice Activity Detection (VAD)

Without VAD

  • ~4,500 API calls per session

With VAD

  • ~20 API calls per session

$$ \text{Reduction} = 1 - \frac{20}{4500} \approx 99.6\% $$

The real breakthrough wasn't the AI — it was tuning this:

positiveSpeechThreshold: 0.60
negativeSpeechThreshold: 0.35
silenceTimeoutMs:        800   # most important
minSpeechMs:             300
preSpeechPadMs:          230

That 800ms silence timeout is what makes conversations feel natural instead of robotic.
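
To illustrate how these parameters interact, here is a simplified end-of-utterance detector in Python. It is a sketch, not the Silero VAD or MediScribe implementation: it assumes the model emits one speech probability per frame, and the names `VadConfig` and `segment_utterances`, as well as the 100ms frame length used in the example, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VadConfig:
    positive_speech_threshold: float = 0.60
    negative_speech_threshold: float = 0.35
    silence_timeout_ms: int = 800
    min_speech_ms: int = 300
    pre_speech_pad_ms: int = 230


def segment_utterances(probs, frame_ms=32, cfg=VadConfig()):
    """Group per-frame speech probabilities into (start, end) frame ranges.

    `end` is exclusive. Frames between the two thresholds neither reset
    nor advance the silence timer (a simplifying assumption).
    """
    pad_frames = cfg.pre_speech_pad_ms // frame_ms
    segments = []
    start = None          # first frame of the current utterance, if any
    speech_frames = 0     # frames above the positive threshold
    silence_ms = 0        # silence accumulated since the last speech frame
    last_speech = 0       # index of the most recent speech frame

    def close():
        # Keep the utterance only if it contains enough speech.
        if speech_frames * frame_ms >= cfg.min_speech_ms:
            segments.append((start, last_speech + 1))

    for i, p in enumerate(probs):
        if p >= cfg.positive_speech_threshold:
            if start is None:
                start = max(0, i - pad_frames)  # include a little lead-in audio
                speech_frames = 0
            speech_frames += 1
            silence_ms = 0
            last_speech = i
        elif start is not None and p < cfg.negative_speech_threshold:
            silence_ms += frame_ms
            if silence_ms >= cfg.silence_timeout_ms:
                close()
                start, silence_ms = None, 0

    if start is not None:
        close()  # stream ended while the speaker was still mid-utterance
    return segments


# Speech, a 400ms mid-sentence pause, more speech, then a long silence.
probs = [0.1] * 3 + [0.9] * 5 + [0.1] * 4 + [0.9] * 5 + [0.1] * 10
print(segment_utterances(probs, frame_ms=100))
print(segment_utterances(probs, frame_ms=100, cfg=VadConfig(silence_timeout_ms=250)))
```

With the 800ms timeout, the 400ms pause stays inside a single utterance. With a 250ms timeout, the same audio is split into two utterances, which is the sentence-chopping behavior a short timeout produces.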


How We Built It

Two-mode real-time pipeline over WebSockets

Doctor → Patient

Audio → STT → Simplify → Translate → TTS → Playback

Patient → Doctor

Audio → STT → Translate + Grammar Recovery → Text Output
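
The two flows can be pictured as ordered lists of stages. The Python sketch below uses placeholder stage functions that only wrap their input in a label, so the call order is visible; in the real system each stage would call the corresponding service. All names here (`make_stub`, `run_pipeline`, and the two stage lists) are hypothetical.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[str], str]]


def make_stub(name: str) -> Stage:
    """Placeholder stage that wraps its input so the call order is visible."""
    return name, lambda payload: f"{name}({payload})"


# Stage order follows the two flows described above.
DOCTOR_TO_PATIENT: List[Stage] = [
    make_stub(n) for n in ("stt", "simplify", "translate", "tts")
]
PATIENT_TO_DOCTOR: List[Stage] = [
    make_stub(n) for n in ("stt", "translate_and_recover_grammar")
]


def run_pipeline(stages: List[Stage], payload: str) -> str:
    """Feed the payload through each stage in order."""
    for _name, stage in stages:
        payload = stage(payload)
    return payload


print(run_pipeline(DOCTOR_TO_PATIENT, "audio"))
print(run_pipeline(PATIENT_TO_DOCTOR, "audio"))
```

Keeping each direction as a plain list makes it easy to see that only the doctor-to-patient flow ends in speech, while the patient-to-doctor flow ends in text.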

Tech Stack

  • Gemini 2.5 Flash (translation + simplification)
  • ElevenLabs (STT + TTS)
  • Silero VAD (ONNX, client-side)
  • Django Channels (WebSockets)
  • React + Tailwind (frontend)

Challenges

1. Broken Translations

Hindi → English outputs like:

"my head do big hurt"

Fix: Grammar recovery pass → "The patient reports a severe headache."

2. Sentence Chopping (VAD Issue)

The default silence timeout (250ms) cut patients off mid-thought.

Fix: Increased the timeout to 800ms, which leaves room for natural pauses.


Example Use

Doctor says:

"You have acute pleuritic chest pain consistent with pericarditis…"

MediScribe outputs:

"You have a sharp pain in your chest when you breathe. The lining around your heart is swollen."


Readability Improvement

  • Before: Grade 14
  • After: Grade 5

→ 9-grade-level reduction
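
The write-up does not say which readability formula produced these grades. One common choice is the Flesch-Kincaid grade level. The Python sketch below implements it with a rough vowel-group syllable counter, so its scores are approximate and will not necessarily reproduce the Grade 14 and Grade 5 figures. The function names are illustrative.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / max(len(sentences), 1)
        + 11.8 * syllables / len(words)
        - 15.59
    )


# Only the quoted portion of the doctor's sentence is used here.
doctor = "You have acute pleuritic chest pain consistent with pericarditis"
simplified = (
    "You have a sharp pain in your chest when you breathe. "
    "The lining around your heart is swollen."
)
print(f"doctor:     {flesch_kincaid_grade(doctor):.1f}")
print(f"simplified: {flesch_kincaid_grade(simplified):.1f}")
```

A check like this could run on every simplified output to confirm it scores below a target grade before it is translated.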


What We Learned

  • System tuning > model choice
  • Constraints > prompting

Example Guardrails:

  • "Max 3 sentences"
  • "No hallucinated advice"
  • "Output [CLARIFICATION NEEDED] if unsure"

These guardrails made outputs clinically safer and more consistent.
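
Some of these constraints can also be checked in code after the model responds. The Python sketch below is an assumption about how such a check might look, not MediScribe's actual validator: the names `count_sentences` and `validate_output` and the three verdict strings are illustrative. A rule like "No hallucinated advice" cannot be verified this way and still depends on the prompt.

```python
import re

CLARIFICATION_TOKEN = "[CLARIFICATION NEEDED]"
MAX_SENTENCES = 3


def count_sentences(text: str) -> int:
    """Count sentences by splitting on whitespace that follows ., ! or ?."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])


def validate_output(text: str) -> str:
    """Return 'accept', 'clarify', or 'reject' for a model response."""
    stripped = text.strip()
    if not stripped:
        return "reject"    # empty output is never shown to the patient
    if CLARIFICATION_TOKEN in stripped:
        return "clarify"   # the model signalled that it was unsure
    if count_sentences(stripped) > MAX_SENTENCES:
        return "reject"    # violates the "Max 3 sentences" guardrail
    return "accept"


print(validate_output("You have a sharp pain in your chest when you breathe."))
print(validate_output("[CLARIFICATION NEEDED]"))
print(validate_output("One. Two. Three. Four."))
```

A "clarify" verdict could prompt the speaker to repeat themselves, while a "reject" verdict could trigger a retry instead of playing the response back.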


What's Next for MediScribe

  • ICD-10 validation (RAG pipeline)
  • Auto-generated clinical notes
  • Mobile deployment (VAD already mobile-compatible)
  • HIPAA compliance (encryption, audit logs, on-prem models)
