Inspiration

Language barriers are one of the most persistent and high-impact challenges in clinical care, especially in underserved and volunteer-based healthcare settings.

As a medical assistant at a clinic, I frequently work with patients who are not fluent in English. In many cases, the current solution is to use human interpreters over the phone or video call to translate during patient visits. While effective, this approach introduces significant cost and logistical overhead, particularly in volunteer or low-resource environments where interpreter services must be scheduled and paid per session.

Over time, these costs accumulate and can become difficult to sustain, limiting access to consistent interpretation services when they are needed most.

This project aims to reduce that burden by providing a real-time, low-cost alternative for medical interpretation, enabling clinicians and patients to communicate directly without sacrificing clarity or safety.

What it does

Clinical Interpretation is a real-time bilingual voice system that enables seamless communication between English- and Spanish-speaking patients and providers.

- Captures live speech from either party
- Transcribes audio using Deepgram streaming speech-to-text
- Detects language and translates speech using an LLM
- Converts translated text into natural speech using Deepgram TTS
- Delivers real-time bidirectional voice translation during the conversation

The result is a fluid, interpreter-like experience that supports natural clinical dialogue without a human intermediary.
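The flow described above (transcript in, translated speech out) can be sketched as a single turn handler. This is a minimal illustration rather than the project's actual code: `interpretUtterance`, the `stages` object, and its `translate` and `synthesize` callbacks are hypothetical names standing in for the LLM and TTS calls, and the transcript is assumed to arrive with a detected language code.

```javascript
// Hypothetical turn handler. `stages.translate` and `stages.synthesize` are
// injected stand-ins for the LLM and TTS calls, not real SDK functions.
const TARGET_FOR = new Map([
  ["en", "es"],
  ["es", "en"],
]);

async function interpretUtterance(transcript, stages) {
  const text = transcript.text.trim();
  const source = transcript.language;
  const target = TARGET_FOR.get(source);

  // Skip empty transcripts and languages outside the supported pair.
  if (!text || !target) return null;

  const translated = await stages.translate(text, source, target);
  const audio = await stages.synthesize(translated, target);
  return { source, target, translated, audio };
}
```

Injecting the stages keeps the routing logic testable without network access, and makes it possible to swap the translation model without touching the turn handler.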

How we built it

We built a streaming voice pipeline using:

- Deepgram Nova / Flux (STT) for real-time multilingual transcription
- Claude / GPT-4o-mini for strict, deterministic medical translation
- Deepgram Aura TTS for natural speech output in English and Spanish
- Node.js for orchestrating real-time audio streams
- Speaker + microphone streaming libraries to handle live audio input/output
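To illustrate what "strict, deterministic translation" might look like in practice, here is a sketch of a request builder. The function name `buildTranslationRequest`, the prompt wording, and the payload shape are all assumptions for illustration. The generic `messages` layout shown here is not any specific provider's API, and some SDKs take the system prompt as a separate field. Setting `temperature` to 0 reduces run-to-run variation but does not guarantee identical output.

```javascript
// Hypothetical request builder; the payload shape is a generic chat layout,
// not a specific provider's API.
const LANGUAGE_NAMES = new Map([
  ["en", "English"],
  ["es", "Spanish"],
]);

function buildTranslationRequest(text, source, target) {
  const from = LANGUAGE_NAMES.get(source);
  const to = LANGUAGE_NAMES.get(target);
  if (!from || !to || source === target) {
    throw new Error(`Unsupported language pair: ${source} -> ${target}`);
  }

  const systemPrompt = [
    `You are a medical interpreter. Translate the user's message from ${from} to ${to}.`,
    "Output only the translation.",
    "Do not answer questions, give advice, add commentary, or omit details.",
  ].join(" ");

  return {
    temperature: 0, // reduce variation between runs
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: text },
    ],
  };
}
```

Keeping the patient's or provider's words in the user message and all instructions in the system prompt is one way to discourage the model from treating the utterance as a question to answer.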

Challenges we ran into

- Handling real-time audio streaming without buffer underruns or audio cutouts
- Managing backpressure between streaming STT and speaker output
- Ensuring the LLM behaves strictly as a translator (not a conversational agent)
- Designing reliable bidirectional switching between speaker roles
- Preventing latency and audio overlap in a continuous conversation loop
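One common way to address underruns and backpressure together is a small playback buffer that holds audio until a minimum amount has arrived and tells the producer to pause when it gets too full. The sketch below shows that general technique and is not the project's actual buffering logic. The class name `PlaybackBuffer` and the `prebufferBytes` and `highWaterBytes` options are hypothetical.

```javascript
// Hypothetical playback buffer: waits for `prebufferBytes` before releasing
// audio, and signals backpressure once `highWaterBytes` are queued.
class PlaybackBuffer {
  constructor({ prebufferBytes, highWaterBytes }) {
    this.prebufferBytes = prebufferBytes;
    this.highWaterBytes = highWaterBytes;
    this.chunks = [];
    this.size = 0;
    this.primed = false;
  }

  // Queue a chunk. Returns false when the producer should pause.
  push(chunk) {
    this.chunks.push(chunk);
    this.size += chunk.length;
    if (this.size >= this.prebufferBytes) this.primed = true;
    return this.size < this.highWaterBytes;
  }

  // Mark the end of an utterance so a short tail can still be played.
  end() {
    this.primed = this.size > 0;
  }

  // Return up to `n` bytes, or null while prebuffering or empty.
  read(n) {
    if (!this.primed || this.size === 0) return null;
    const all = Buffer.concat(this.chunks);
    const out = all.subarray(0, n);
    const rest = all.subarray(n);
    this.chunks = rest.length > 0 ? [rest] : [];
    this.size = rest.length;
    if (this.size === 0) this.primed = false; // drained: prebuffer again
    return out;
  }
}
```

The boolean returned by `push` mirrors the convention of Node.js writable streams, where `write()` returning `false` means the caller should wait before sending more data.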

Accomplishments that we're proud of

- Built a fully functional real-time medical voice interpreter
- Achieved stable bidirectional English ↔ Spanish communication
- Successfully integrated STT, LLM translation, and TTS into a single low-latency system
- Implemented buffering logic to eliminate audio cutoffs and improve stream reliability
- Created a system that closely mirrors real-world clinical interpreter workflows

What we learned

- Real-time voice systems fail more often due to streaming and buffering issues than model quality
- In clinical communication, clarity and reliability matter more than conversational intelligence
- Role separation (doctor vs. patient) is more important than language detection alone
- LLMs must be tightly constrained to behave as deterministic translators in healthcare settings
- Building voice AI requires thinking in terms of event-driven audio pipelines, not request-response APIs
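The point about role separation can be made concrete with a small sketch in which the speaker's role, not the detected language, decides the translation direction. Everything here is an assumption for illustration: the `resolveDirection` function, the `session` object mapping each role to a language, and the `mismatch` flag are hypothetical and not taken from the project's code.

```javascript
// Hypothetical role-first routing: the session fixes each role's language,
// and the detected language is only used to flag a disagreement.
function resolveDirection(session, role, detectedLanguage) {
  const source = session[role];
  if (role !== "provider" && role !== "patient") {
    throw new Error(`Unknown role: ${role}`);
  }
  const target = role === "provider" ? session.patient : session.provider;

  return {
    source,
    target,
    // True when the detector disagrees with the role's configured language,
    // for example when a provider says a Spanish name mid-sentence.
    mismatch: detectedLanguage !== undefined && detectedLanguage !== source,
  };
}
```

With a session of `{ provider: "en", patient: "es" }`, a provider utterance is always routed English to Spanish, even if a short phrase is misdetected as Spanish. The `mismatch` flag leaves room to log or review those cases instead of silently flipping direction.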

What's next for Clinical Interpretation

- Add structured clinical documentation generation (SOAP notes and visit summaries)
- Expand language support beyond English and Spanish
- Improve role detection (automatic doctor/patient inference)
- Add support for interruption handling ("barge-in" during speech)
- Integrate with electronic health record (EHR) systems for clinical documentation
- Optimize latency for near-instant conversational response in real clinical environments
