CareBridge: Translating Discharge into Recovery 🏥

💡 Inspiration

The inspiration for CareBridge came from a staggering statistic: nearly 20% of Medicare patients are readmitted to the hospital within 30 days of discharge. A major driver of those readmissions is the "Health Literacy Gap."

Discharge summaries are the most critical documents a patient receives, yet they are often written in dense medical shorthand (e.g., "Take 1 tab PO QD"). For the millions of patients who speak English as a second language, or those with low health literacy, these papers aren't just confusing—they are dangerous.

We wanted to answer a simple question: Can we use Multimodal AI to turn a static, confusing piece of paper into an active, multilingual safety guardian?

🔧 How We Built It

CareBridge is a "Client-Side AI" application built with React 19 and the Google GenAI SDK. We designed it to be a bridge between raw clinical data and patient understanding.

1. The Brain: Gemini 3 Pro & Thinking Config

We chose gemini-3-pro-preview as our reasoning engine. Standard LLMs often struggle with the chaotic layout of OCR'd medical documents.

  • We utilized the Thinking Config (thinkingBudget: 24000) to allow the model to "reason" through the document layout before extracting data.
  • We enforced Structured Outputs (JSON Schema) to guarantee that every medication, dosage, and timeline event was extracted in a machine-readable format for our UI.
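A condensed sketch of that call, using the @google/genai SDK (the schema fields shown here are illustrative placeholders, not our full production schema):

```typescript
import { GoogleGenAI, Type } from "@google/genai";

// Illustrative schema: our real one also covers timeline events and red flags.
export const dischargeSchema = {
  type: Type.OBJECT,
  properties: {
    medications: {
      type: Type.ARRAY,
      items: {
        type: Type.OBJECT,
        properties: {
          name: { type: Type.STRING },
          dosage: { type: Type.STRING },
          frequency: { type: Type.STRING },
        },
        required: ["name", "dosage"],
      },
    },
  },
  required: ["medications"],
};

// Not invoked here: requires a live API key at runtime.
export async function extractDischarge(apiKey: string, documentText: string) {
  const ai = new GoogleGenAI({ apiKey });
  const response = await ai.models.generateContent({
    model: "gemini-3-pro-preview",
    contents: documentText,
    config: {
      thinkingConfig: { thinkingBudget: 24000 }, // let the model reason about layout
      responseMimeType: "application/json",
      responseSchema: dischargeSchema, // guarantees machine-readable output
    },
  });
  return JSON.parse(response.text ?? "{}");
}
```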

2. The Safety Layer: Google Search Grounding

In healthcare, hallucinations are unacceptable. We implemented a "Med Guardian" feature using the Google Search Tool. When the AI detects a medication, it doesn't just guess side effects: it performs a live search to cross-reference specific drug interactions (e.g., Lisinopril vs. potassium supplements) and returns the source URIs to the user. Our guiding heuristic:

$$ P(\text{Safety}) \propto \frac{\text{Grounding}}{\text{Hallucination}} $$
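On the client, we then walk the response's grounding metadata to collect those source URIs. A sketch of that helper (the function name is ours; it is typed loosely against the response JSON, and the `tools: [{ googleSearch: {} }]` request config is what triggers grounding in the first place):

```typescript
// Loose shape of the parts of a grounded generateContent response we care about.
type GroundedResponse = {
  candidates?: Array<{
    groundingMetadata?: {
      groundingChunks?: Array<{ web?: { uri?: string; title?: string } }>;
    };
  }>;
};

// Pull the source URIs out of the grounding metadata so the UI can show the
// patient exactly where an interaction warning came from.
export function extractSourceUris(response: GroundedResponse): string[] {
  const chunks =
    response.candidates?.[0]?.groundingMetadata?.groundingChunks ?? [];
  return chunks
    .map((c) => c.web?.uri)
    .filter((u): u is string => Boolean(u));
}
```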

3. The Voice: Gemini Live API

We didn't want a chatbot; we wanted a companion. We implemented the Gemini Live API (gemini-2.5-flash-native-audio-preview) over WebSockets.

  • This enabled <500ms latency, allowing for a natural "Teach-Back" session where the patient can interrupt the AI to ask questions like "Wait, does this pill make me dizzy?"
  • We process raw PCM audio buffers directly in the browser, creating a seamless audio-in/audio-out experience without the lag of traditional STT/TTS pipelines.
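The in-browser PCM handling boils down to two small conversions (a sketch; the function names are ours, and we assume the standard [-1, 1] Float32 samples an AudioContext produces):

```typescript
// Convert Float32 mic samples ([-1, 1]) into 16-bit PCM for the Live API.
export function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to avoid overflow
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Base64-encode the PCM bytes for the WebSocket payload.
export function pcmToBase64(pcm: Int16Array): string {
  const bytes = new Uint8Array(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}
```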

4. Accessibility: High-Fidelity TTS

For the "Plain Talk" summary, we used gemini-2.5-flash-preview-tts with the 'Fenrir' voice. We found this specific voice carried the necessary authority of a doctor while maintaining the warmth of a caregiver.

🏔️ Challenges We Ran Into

1. The "OCR" Noise: Medical documents are messy—handwritten notes, stamps, and poor scanning. Initially, the model missed dosage instructions hidden in the margins.

  • Solution: We moved from text-only prompts to Multimodal prompts, sending the raw image/PDF bytes directly to Gemini 3 Pro. The vision capabilities vastly outperformed standard OCR libraries.
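A minimal version of that multimodal call might look like this (a sketch; the function name and prompt wording are ours, and `base64Image` stands in for bytes read client-side, e.g. via a FileReader):

```typescript
import { GoogleGenAI } from "@google/genai";

// Send the scanned page as raw image bytes instead of OCR'd text.
// Not invoked here: requires a live API key at runtime.
export async function readDischargeImage(apiKey: string, base64Image: string) {
  const ai = new GoogleGenAI({ apiKey });
  const response = await ai.models.generateContent({
    model: "gemini-3-pro-preview",
    contents: [
      { inlineData: { mimeType: "image/png", data: base64Image } },
      {
        text: "Extract every medication, dosage, and instruction on this page, including notes in the margins and stamps.",
      },
    ],
  });
  return response.text;
}
```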

2. Audio Latency & State Management: Handling the Gemini Live API raw audio streams in React was tricky. We had to manage AudioContext buffers carefully to prevent "clicking" sounds and drift.

  • Solution: We implemented a circular buffer system and synchronized the visual "waveform" animation with the incoming server messages to create a responsive UI.
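The circular-buffer idea can be sketched as follows (a simplified version; the class and method names are ours, and the real implementation also feeds the waveform animation). Reads that underrun return zero-filled silence rather than stale samples, which is what prevents the clicks:

```typescript
// Minimal ring buffer for incoming PCM chunks.
export class AudioRingBuffer {
  private buf: Float32Array;
  private readPos = 0;
  private writePos = 0;
  private available = 0;

  constructor(capacity: number) {
    this.buf = new Float32Array(capacity);
  }

  write(samples: Float32Array): void {
    for (const s of samples) {
      this.buf[this.writePos] = s;
      this.writePos = (this.writePos + 1) % this.buf.length;
      if (this.available < this.buf.length) this.available++;
      else this.readPos = (this.readPos + 1) % this.buf.length; // drop oldest
    }
  }

  read(count: number): Float32Array {
    const out = new Float32Array(count); // zero-filled => silence on underrun
    const n = Math.min(count, this.available);
    for (let i = 0; i < n; i++) {
      out[i] = this.buf[this.readPos];
      this.readPos = (this.readPos + 1) % this.buf.length;
    }
    this.available -= n;
    return out;
  }
}
```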

3. Safety vs. Empathy: We struggled to balance the model's tone. If it was too clinical, patients tuned out. If it was too casual, it felt unsafe.

  • Solution: We used extensive System Instructions to define a persona: "An empathetic medical assistant speaking at an 8th-grade reading level."

🏅 Accomplishments That We're Proud Of

  • The "Pill Bottle Check": We built a feature where a patient can hold their physical pill bottle to the camera. Gemini compares the visual label against the digital discharge instructions to catch dispensing errors.
  • Real-time Multilingual Support: Seeing the UI and the AI's persona instantly switch from English to Hindi or Chinese feels like magic and truly democratizes access to care.

🧠 What We Learned

We learned that Gemini 3 Pro is not just a text generator; it is a reasoning engine. By offloading the complexity of medical extraction to the model's "thinking" process, we could build a frontend that is deceptively simple but incredibly powerful.

We also learned that latency matters. The difference between a 2-second delay and the Live API's sub-second response is the difference between "talking to a robot" and "talking to a helper."

🚀 What's Next for CareBridge

  • EHR Integration: Direct connection to Epic/Cerner for seamless data import.
  • Wearable Sync: Correlating "Red Flags" (like high heart rate) with Apple Watch data.
  • Family Mode: Allowing caregivers to receive SMS summaries of the patient's plan.
