How we used Gemini 3 (Gemini 3 Integration)
The advanced features of the Gemini 3 family were the key enabler that let us build this project efficiently.
- Native Multimodal Audio Analysis: We utilize Gemini 3 Pro's native multimodal capabilities to ingest raw audio recordings of patient calls. Unlike legacy models that analyze text transcripts, Gemini 3 Pro "listens" to the sound files to detect audible signs of respiratory distress (dyspnea), vocal tremors, or cognitive fog—critical biomarkers that are invisible in text-only analysis.
- Long Context Window: Healthcare relies on history. We leverage Gemini 3's massive context window to process the patient's complete medical timeline in a single pass. Instead of relying on fragmented data retrieval (RAG), the agent ingests weeks of historical interaction logs, full discharge summaries, and complex clinical protocols simultaneously. This allows it to detect subtle deterioration trends over time that isolated automated check-ins would miss.
- Advanced Reasoning & MCP: We use Gemini 3's reasoning capabilities as a clinical decision engine. The agent autonomously uses the Model Context Protocol (MCP) to query our Firestore database, cross-reference symptoms with medical guidelines, and decide whether a risk alert is truly warranted before disturbing a human nurse.
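The point about history can be illustrated with a small sketch. It does not show how Gemini 3 reasons internally; it only shows why a week of check-ins can reveal a problem that no single check-in does. The function names, the 0-10 symptom score, and both thresholds are assumptions made for this illustration.

```python
from statistics import mean

def trend_slope(scores):
    """Least-squares slope of daily scores against the day index (0, 1, 2, ...)."""
    n = len(scores)
    if n < 2:
        return 0.0
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(scores)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, scores))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

def flags_isolated(score, threshold=7):
    """What a stateless check-in sees: one score against a fixed cutoff (assumed)."""
    return score >= threshold

def flags_trend(scores, slope_threshold=0.5):
    """What a history-aware check sees: a steady rise across days (assumed cutoff)."""
    return trend_slope(scores) >= slope_threshold

# Hypothetical week of self-reported breathlessness scores (0-10).
week = [2, 3, 3, 4, 5, 5, 6]
print(any(flags_isolated(s) for s in week))  # no single day crosses the cutoff
print(flags_trend(week))                     # the upward trend still stands out
```

In this hypothetical week, every individual score stays below the per-day cutoff, yet the slope across the week exceeds the trend cutoff.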
Inspiration
Hospital readmissions cost the U.S. healthcare system $26 billion annually (2025), and HRRP penalties alone reached $528M in 2017. The problem isn't shrinking, it's exploding, and the human cost is far greater than the financial one. Nurses are overwhelmed and cannot follow up effectively with every discharged patient. We realized that existing solutions were either generic chatbots (which patients ignore) or live human calls (which don't scale). We asked: Can we build an AI that doesn't just "talk", but actually "cares" and "listens" like a nurse?
What it does
CareFlow Pulse is an automated nurse coordinator system.
- It Calls Patients: A voice agent checks on patients daily via phone.
- It Listens Deeply: While the patient speaks, a clinical agent analyzes the audio stream to detect non-verbal biomarkers like breathlessness (dyspnea).
- It Coordinates Care: If a risk is detected, it instantly alerts the human nurse via a real-time Next.js dashboard.
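The last step, turning a detected risk into a dashboard alert, might look roughly like the sketch below. The `CallFinding` and `NurseAlert` shapes, the confidence cutoffs, and the severity labels are hypothetical and are not taken from the CareFlow Pulse codebase.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class CallFinding:
    patient_id: str
    biomarker: str     # e.g. "dyspnea"
    confidence: float  # 0.0-1.0, as reported by the analysis step (assumed)

@dataclass
class NurseAlert:
    patient_id: str
    severity: str
    message: str

def to_alert(finding: CallFinding, min_confidence: float = 0.6) -> Optional[NurseAlert]:
    """Return an alert for the dashboard, or None if the finding is too weak."""
    if finding.confidence < min_confidence:
        return None
    severity = "urgent" if finding.confidence >= 0.85 else "review"
    return NurseAlert(
        patient_id=finding.patient_id,
        severity=severity,
        message=f"Possible {finding.biomarker} detected during daily call",
    )

alert = to_alert(CallFinding("p-001", "dyspnea", 0.9))
if alert is not None:
    # Serialized form that a dashboard backend could store or push.
    print(json.dumps(asdict(alert)))
```

Returning `None` for weak findings keeps low-confidence detections away from the nurse.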
How we built it
- Frontend: Next.js 16, React 19, shadcn/ui.
- Architecture: A Dual-Agent Architecture that separates the latency-sensitive voice agent (Gemini 2.0 Flash) from the clinical reasoning agent (Gemini 3 Pro), connected via the A2A (Agent-to-Agent) protocol.
- Tools: Google ADK (Agent Development Kit), LangGraph, Twilio ConversationRelay, ElevenLabs.
- Security: Integrated Model Armor to sanitize PII/PHI in compliance with HIPAA standards.
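To make the idea of sanitizing PII/PHI concrete, here is a deliberately simple stand-in written with plain regular expressions. It does not use or imitate the Model Armor API, and the patterns (US-style phone numbers, SSNs, email addresses) are assumptions chosen for illustration; real redaction needs far broader coverage.

```python
import re

# Order matters: more specific patterns run first.
PATTERNS = [
    ("[SSN]", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("[PHONE]", re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")),
    ("[EMAIL]", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
]

def redact(text: str) -> str:
    """Replace each matched identifier with a placeholder token."""
    for placeholder, pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

print(redact("Call me at 555-123-4567 or jane@example.com"))
```

Running redaction before text reaches a model or a log keeps raw identifiers out of downstream storage.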
Challenges we ran into
Implementing the A2A protocol integration was complex. Synchronizing state between a fast-talking voice agent and a deep-thinking reasoning agent required a robust event-driven architecture. We also had to solve latency issues so that the handoff between agents didn't interrupt the natural flow of the conversation.
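The shape of that handoff can be sketched with a queue between two coroutines. This is a minimal illustration using only `asyncio`, not the actual A2A integration: the agent functions are hypothetical, the keyword match stands in for real clinical analysis, and the short sleep stands in for model latency.

```python
import asyncio

async def voice_agent(utterances, outbox):
    """Fast loop: reply to each utterance right away, never wait on analysis."""
    replies = []
    for text in utterances:
        await outbox.put(text)      # unbounded queue, so this does not block
        replies.append(f"ack: {text}")
        await asyncio.sleep(0)      # yield control, as a real I/O wait would
    await outbox.put(None)          # sentinel: the call has ended
    return replies

async def reasoning_agent(inbox):
    """Slow loop: consume utterances at its own pace and collect findings."""
    findings = []
    while (text := await inbox.get()) is not None:
        await asyncio.sleep(0.01)   # stand-in for slower model latency
        if "breath" in text.lower():  # stand-in for real analysis
            findings.append(text)
    return findings

async def run_call(utterances):
    queue = asyncio.Queue()
    return await asyncio.gather(
        voice_agent(utterances, queue), reasoning_agent(queue)
    )

replies, findings = asyncio.run(
    run_call(["I feel fine", "I get short of breath on stairs"])
)
print(replies, findings)
```

Because the voice loop only enqueues events, its response time is independent of how long the reasoning loop takes on each item.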
Accomplishments that we're proud of
We successfully implemented multimodal audio analysis in a production flow. Seeing the logs where Gemini 3 correctly flagged "labored breathing" from a raw audio file was a breakthrough moment. We're also proud of the production-grade security implementation, which shows that AI can be made safe for healthcare.
CareFlow Pulse's dual-agent architecture independently mirrors the design published by Google Research in their AMIE system for longitudinal disease management (March 2025), but extends it from text-based research to production-grade, voice-first multimodal patient monitoring.
What's next for CareFlow Pulse
- Patient-Centric Mobile App: A dedicated application for patients to access their medical history, ask context-aware questions about their recovery, and receive personalized guidance based on their specific clinical profile.
- Wearable Device Integration (IoT): Real-time monitoring through wearable devices to stream vitals directly to our agents. This enables instant signal detection and immediate nurse alerts the moment an anomaly occurs, without waiting for the next scheduled call.
- EHR Integration: Connecting directly to hospital systems via HL7/FHIR.
- Image Analysis: Using Gemini 3's vision capabilities so that patients can send photos of surgical wounds for analysis.
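For the wearable and HL7/FHIR items, a reading from a device could be wrapped as a FHIR `Observation` resource and checked against a simple range. The sketch below is an assumption about how this might look, not a planned implementation: the helper names are hypothetical, and the 50-120 bpm range is purely illustrative, not clinical guidance. The LOINC code 8867-4 (heart rate) and the UCUM unit `/min` are standard.

```python
from datetime import datetime, timezone

def heart_rate_observation(patient_id: str, bpm: float, when: datetime) -> dict:
    """Build a minimal FHIR Observation resource for a heart-rate reading."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [
                {"system": "http://loinc.org", "code": "8867-4", "display": "Heart rate"}
            ]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": when.isoformat(),
        "valueQuantity": {
            "value": bpm,
            "unit": "beats/minute",
            "system": "http://unitsofmeasure.org",
            "code": "/min",
        },
    }

def is_anomalous(observation: dict, low: float = 50, high: float = 120) -> bool:
    """Flag readings outside an illustrative range (not a clinical threshold)."""
    value = observation["valueQuantity"]["value"]
    return value < low or value > high

reading = heart_rate_observation("p-001", 134, datetime.now(timezone.utc))
print(is_anomalous(reading))
```

Keeping readings in FHIR form from the start would let the same payload feed both the alerting logic and a later EHR integration.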
Built With
- a2a
- adk
- antigravity
- cloud-run
- conversationrelay
- firebase
- gcp
- google-scheduler
- google-tasks
- langgraph
- mcp
- next.js
- python
- react.js
- scheduler
- twilio