CareBridge: Translating Discharge into Recovery 🏥

💡 Inspiration

The inspiration for CareBridge came from a staggering statistic: nearly 20% of Medicare patients are readmitted to the hospital within 30 days of discharge. A major driver of those readmissions is the "Health Literacy Gap."

Discharge summaries are the most critical documents a patient receives, yet they are often written in dense medical shorthand (e.g., "Take 1 tab PO QD"). For the millions of patients who speak English as a second language, or those with low health literacy, these papers aren't just confusing—they are dangerous.

We wanted to answer a simple question: Can we use Multimodal AI to turn a static, confusing piece of paper into an active, multilingual safety guardian?

🔧 How We Built It

CareBridge is a "Client-Side AI" application built with React 19 and the Google GenAI SDK. We designed it to be a bridge between raw clinical data and patient understanding.

1. The Brain: Gemini 3 Pro & Thinking Config

We chose gemini-3-pro-preview as our reasoning engine. Standard LLMs often struggle with the chaotic layout of OCR'd medical documents.

  • We utilized the Thinking Config (thinkingBudget: 24000) to allow the model to "reason" through the document layout before extracting data.
  • We enforced Structured Outputs (JSON Schema) to guarantee that every medication, dosage, and timeline event was extracted in a machine-readable format for our UI.
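A condensed sketch of that call, using the @google/genai SDK (the schema fields shown here are illustrative placeholders, not our full production schema):

```typescript
import { GoogleGenAI, Type } from "@google/genai";

// Illustrative schema: our real one also covers timeline events and red flags.
export const dischargeSchema = {
  type: Type.OBJECT,
  properties: {
    medications: {
      type: Type.ARRAY,
      items: {
        type: Type.OBJECT,
        properties: {
          name: { type: Type.STRING },
          dosage: { type: Type.STRING },
          frequency: { type: Type.STRING },
        },
        required: ["name", "dosage"],
      },
    },
  },
  required: ["medications"],
};

// Not invoked here: requires a live API key at runtime.
export async function extractDischarge(apiKey: string, documentText: string) {
  const ai = new GoogleGenAI({ apiKey });
  const response = await ai.models.generateContent({
    model: "gemini-3-pro-preview",
    contents: documentText,
    config: {
      thinkingConfig: { thinkingBudget: 24000 }, // let the model reason about layout
      responseMimeType: "application/json",
      responseSchema: dischargeSchema, // guarantees machine-readable output
    },
  });
  return JSON.parse(response.text ?? "{}");
}
```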

2. The Safety Layer: Google Search Grounding

In healthcare, hallucinations are unacceptable. We implemented a "Med Guardian" feature using the Google Search Tool. When the AI detects a medication, it doesn't just guess side effects: it performs a live search to cross-reference specific drug interactions (e.g., Lisinopril vs. potassium supplements) and returns the source URIs to the user. Our guiding heuristic:

$$ P(\text{Safety}) \propto \frac{\text{Grounding}}{\text{Hallucination}} $$
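On the client, we then walk the response's grounding metadata to collect those source URIs. A sketch of that helper (the function name is ours; it is typed loosely against the response JSON, and the `tools: [{ googleSearch: {} }]` request config is what triggers grounding in the first place):

```typescript
// Loose shape of the parts of a grounded generateContent response we care about.
type GroundedResponse = {
  candidates?: Array<{
    groundingMetadata?: {
      groundingChunks?: Array<{ web?: { uri?: string; title?: string } }>;
    };
  }>;
};

// Pull the source URIs out of the grounding metadata so the UI can show the
// patient exactly where an interaction warning came from.
export function extractSourceUris(response: GroundedResponse): string[] {
  const chunks =
    response.candidates?.[0]?.groundingMetadata?.groundingChunks ?? [];
  return chunks
    .map((c) => c.web?.uri)
    .filter((u): u is string => Boolean(u));
}
```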

3. The Voice: Gemini Live API

We didn't want a chatbot; we wanted a companion. We implemented the Gemini Live API (gemini-2.5-flash-native-audio-preview) over WebSockets.

  • This enabled <500ms latency, allowing for a natural "Teach-Back" session where the patient can interrupt the AI to ask questions like "Wait, does this pill make me dizzy?"
  • We process raw PCM audio buffers directly in the browser, creating a seamless audio-in/audio-out experience without the lag of traditional STT/TTS pipelines.
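The in-browser PCM handling boils down to two small conversions (a sketch; the function names are ours, and we assume the standard [-1, 1] Float32 samples an AudioContext produces):

```typescript
// Convert Float32 mic samples ([-1, 1]) into 16-bit PCM for the Live API.
export function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to avoid overflow
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Base64-encode the PCM bytes for the WebSocket payload.
export function pcmToBase64(pcm: Int16Array): string {
  const bytes = new Uint8Array(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}
```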

4. Accessibility: High-Fidelity TTS

For the "Plain Talk" summary, we used gemini-2.5-flash-preview-tts with the 'Fenrir' voice. We found this specific voice carried the necessary authority of a doctor while maintaining the warmth of a caregiver.

🏔️ Challenges We Ran Into

1. The "OCR" Noise: Medical documents are messy—handwritten notes, stamps, and poor scanning. Initially, the model missed dosage instructions hidden in the margins.

  • Solution: We moved from text-only prompts to Multimodal prompts, sending the raw image/PDF bytes directly to Gemini 3 Pro. The vision capabilities vastly outperformed standard OCR libraries.
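A minimal version of that multimodal call might look like this (a sketch; the function name and prompt wording are ours, and `base64Image` stands in for bytes read client-side, e.g. via a FileReader):

```typescript
import { GoogleGenAI } from "@google/genai";

// Send the scanned page as raw image bytes instead of OCR'd text.
// Not invoked here: requires a live API key at runtime.
export async function readDischargeImage(apiKey: string, base64Image: string) {
  const ai = new GoogleGenAI({ apiKey });
  const response = await ai.models.generateContent({
    model: "gemini-3-pro-preview",
    contents: [
      { inlineData: { mimeType: "image/png", data: base64Image } },
      {
        text: "Extract every medication, dosage, and instruction on this page, including notes in the margins and stamps.",
      },
    ],
  });
  return response.text;
}
```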

2. Audio Latency & State Management: Handling the Gemini Live API raw audio streams in React was tricky. We had to manage AudioContext buffers carefully to prevent "clicking" sounds and drift.

  • Solution: We implemented a circular buffer system and synchronized the visual "waveform" animation with the incoming server messages to create a responsive UI.
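The circular-buffer idea can be sketched as follows (a simplified version; the class and method names are ours, and the real implementation also feeds the waveform animation). Reads that underrun return zero-filled silence rather than stale samples, which is what prevents the clicks:

```typescript
// Minimal ring buffer for incoming PCM chunks.
export class AudioRingBuffer {
  private buf: Float32Array;
  private readPos = 0;
  private writePos = 0;
  private available = 0;

  constructor(capacity: number) {
    this.buf = new Float32Array(capacity);
  }

  write(samples: Float32Array): void {
    for (const s of samples) {
      this.buf[this.writePos] = s;
      this.writePos = (this.writePos + 1) % this.buf.length;
      if (this.available < this.buf.length) this.available++;
      else this.readPos = (this.readPos + 1) % this.buf.length; // drop oldest
    }
  }

  read(count: number): Float32Array {
    const out = new Float32Array(count); // zero-filled => silence on underrun
    const n = Math.min(count, this.available);
    for (let i = 0; i < n; i++) {
      out[i] = this.buf[this.readPos];
      this.readPos = (this.readPos + 1) % this.buf.length;
    }
    this.available -= n;
    return out;
  }
}
```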

3. Safety vs. Empathy: We struggled to balance the model's tone. If it was too clinical, patients tuned out. If it was too casual, it felt unsafe.

  • Solution: We used extensive System Instructions to define a persona: "An empathetic medical assistant speaking at an 8th-grade reading level."

🏅 Accomplishments That We're Proud Of

  • The "Pill Bottle Check": We built a feature where a patient can hold their physical pill bottle to the camera. Gemini compares the visual label against the digital discharge instructions to catch dispensing errors.
  • Real-time Multilingual Support: Seeing the UI and the AI's persona instantly switch from English to Hindi or Chinese feels like magic and truly democratizes access to care.

🧠 What We Learned

We learned that Gemini 3 Pro is not just a text generator; it is a reasoning engine. By offloading the complexity of medical extraction to the model's "thinking" process, we could build a frontend that is deceptively simple but incredibly powerful.

We also learned that latency matters. The difference between a 2-second delay and the Live API's sub-second response is the difference between "talking to a robot" and "talking to a helper."

🚀 What's Next for CareBridge

  • EHR Integration: Direct connection to Epic/Cerner for seamless data import.
  • Wearable Sync: Correlating "Red Flags" (like high heart rate) with Apple Watch data.
  • Family Mode: Allowing caregivers to receive SMS summaries of the patient's plan.
