Inspiration
Somewhere far from home, an elderly father stares at a pile of pill bottles he can't tell apart. A prescription he can't read. A lab report that makes no sense. He's alone and not because his family doesn't care, but because they're an ocean away.
This is personal. As an Indian student in the US, I live with a constant, quiet worry did Dad take his medication? Does Mom understand the new prescription? What happens if something goes wrong at 2 AM and I'm 8,000 miles away?
The numbers made it urgent: medication non-adherence causes over 125,000 deaths annually in the US alone. Elderly patients managing 5+ medications have a 50% non-adherence rate. And for non-English speakers, the problem compounds, they can't read prescriptions, can't communicate symptoms clearly, and can't navigate a healthcare system that wasn't built for them.
Existing solutions are text-based chatbots and reminder apps. They assume literacy, assume English, assume the user can type. My parents can barely navigate a settings menu. They needed something fundamentally different not an app they use, but a companion that uses them as the interface. Voice in, voice out. No forms. No buttons. Just a conversation in their own language.
That's why I built Heali.
What it does
Heali is a real-time, voice-first AI health companion that acts as a proactive guardian for elderly patients and a peace-of-mind dashboard for their families across the world.
Voice Guardian : The core experience. Users speak naturally to Heali through Gemini Live API's bidirectional audio streaming. They log medications ("I just took my Metformin"), report vitals ("My blood pressure is 128 over 82"), and describe symptoms ("I'm feeling dizzy") all through voice. Heali doesn't just log data; it cross-references symptoms against active prescriptions and flags potential side effects or dangerous interactions in real time.
Prescription & Lab Report Reader : Elderly patients hold a handwritten prescription or lab report to their camera. Heali uses Gemini 2.5 Flash's native vision to extract medication names, dosages, and lab values. Extracted data is cross-referenced for drug interaction checks before being added to the medication schedule.
Multilingual Support : Heali speaks Kannada, Hindi, Spanish, and English natively. The active language is passed through the WebSocket connection and injected into every agent's system prompt. Gemini Live handles transcription and synthesis in the selected language with no translation layer, no latency penalty.
Emergency Detection (Red Line Protocol) : A deterministic safety system that monitors the audio stream for high-risk keywords (chest pain, can't breathe, severe bleeding) in all four supported languages. When triggered, it bypasses all LLM reasoning and immediately fires emergency alerts to family members via Firebase Cloud Messaging and initiates a Twilio PSTN call to the registered emergency contact. This is not probabilistic AI judgment, it's a hardcoded safety net.
Exercise Coach : Guided 10-minute wellness sessions with real-time posture coaching via the camera. Breathing exercises, seated stretches, and gentle movement routines designed specifically for elderly users.
Medical Translator : Translates medical terminology, prescription instructions, and doctor's notes between languages. Elderly patients can understand what their doctor prescribed without relying on a family member to translate.
Family Dashboard : Caregivers see medication adherence scores (7-day rolling compliance), latest vitals with trend charts, a daily health digest, and instant emergency alerts. A son in New York can check on his father in Karnataka without calling at midnight.
Proactive Intelligence : Heali doesn't wait to be asked. It reminds users about medications, flags missed doses to family members, and alerts caregivers when adherence drops or vitals trend outside safe ranges.
How we built it
Heali runs on a cloud-native stack with six specialized AI agents orchestrated by Google ADK:
Frontend: React 18 with TypeScript, Vite, and Tailwind CSS. The UI is designed for elderly users, large touch targets, high contrast, minimal navigation. Audio is streamed bidirectionally over WebSockets.
Backend: FastAPI (Python 3.11) deployed on Google Cloud Run. The WebSocket endpoint /ws/{user_id} receives audio chunks and video frames, routes them through the ADK agent system, and returns audio responses plus UI event JSON. REST endpoints handle data operations for medications, food logs, family alerts, and scan uploads.
AI/LLM Layer: Google Gemini Live API (gemini-live-2.5-flash-native-audio) powers the real-time audio streaming, native audio in, native audio out, with tool-call execution mid-stream. Gemini 2.0 Flash handles prescription and lab report vision analysis.
Agent Architecture (Google ADK): A Root Coordinator agent delegates to six specialized sub-agents:
- Onboarding Agent: Voice-driven profile creation (name, medications, language, dietary needs)
- Guardian Agent: Medication tracking, vitals logging, emergency detection, pill verification, meal logging
- Exercise Agent: Guided wellness sessions with posture coaching
- Interpreter Agent : Prescription translation, lab report reading, drug interaction checks
- Insights Agent : Adherence analytics, vital trends, daily digest generation, family alerts
- Booking Agent : Symptom triage, nearby clinic search, appointment scheduling
Data Layer: Google Cloud Firestore stores user profiles, medication schedules, adherence logs, vitals, food logs, and family linkages. Firebase Authentication secures every WebSocket connection and REST call with ID token verification.
Emergency Pipeline: Emergency keyword detection triggers a Firebase Cloud Messaging push to all linked family members and a Twilio Voice API call to the registered emergency contact. The negation detection system ensures phrases like "I do NOT have chest pain" don't trigger false alerts.
Scheduling: Google Cloud Tasks handles medication reminder scheduling, firing FCM push notifications at the user's local dose times.
Deployment: Dockerized multi-stage build, deployed on Google Cloud Run with Cloud Build CI/CD pipeline (cloudbuild.yaml).
Challenges we ran into
WebSocket audio streaming reliability: Gemini Live API's bidirectional audio streaming over WebSockets was the hardest integration. Audio chunks arriving out of order, session timeouts during long conversations, and handling interruptions (user cutting off Heali mid-sentence) required careful buffer management and reconnection logic. Getting the latency under 500ms for a natural conversational feel took multiple iterations.
Multilingual emergency detection: Hardcoding emergency keywords in four languages sounds simple until you account for dialectal variations, partial sentences, and negation. "Chest pain" in Kannada has multiple colloquial forms. We built a negation detection layer to prevent false positives ("I do NOT have chest pain") while maintaining zero false negatives for genuine emergencies.
Vision extraction accuracy: Handwritten prescriptions from Indian doctors are notoriously difficult to read -- even for pharmacists. Getting Gemini's vision model to reliably extract medication names required a multi-pass approach: raw OCR extraction followed by cross-referencing against known drug databases to correct misreadings.
Designing for elderly users: Every UX assumption I had was wrong. Large buttons aren't enough. Voice-first means the entire interaction model changes -- there's no "back" button in a conversation. Error recovery had to be conversational ("I didn't catch that, could you say it again?") rather than visual. Testing with actual elderly users (my parents, via video call) revealed dozens of interaction patterns I hadn't considered.
Agent coordination complexity: Six agents sharing context through a single conversation required careful state management. The Guardian Agent needs to know what the Onboarding Agent captured. The Insights Agent needs access to what the Guardian logged. Google ADK's orchestration handled routing well, but designing the tool schemas so agents could read and write shared Firestore state without conflicts was a significant design challenge.
Accomplishments that we're proud of
It actually works, live, in production: Heali isn't a prototype or a mockup. It's deployed on Google Cloud Run, accessible at a public URL, processing real voice conversations through Gemini Live API in real time.
Sub-500ms voice response latency: A conversation with Heali feels natural. You speak, and Heali responds before the pause feels awkward. This was the single hardest technical achievement.
Zero false negatives on emergency detection: In our testing across all four languages, the Red Line Protocol has never missed a genuine emergency trigger. It has correctly handled negations, partial sentences, and dialectal variations without a single false negative.
True multimodal interaction: Voice, vision, and proactive alerts working together in a single session. A user can speak to Heali, show it a prescription through the camera, and receive a spoken response with drug interaction warnings, all without leaving the voice interface. Beyond the voice experience, dedicated pages for Food Log, Exercise, Prescriptions, and Reports give users and caregivers direct access to their health data whenever they need it with no voice session required. And because Heali is a responsive PWA deployed on Google Cloud Run, it works seamlessly on both mobile and desktop. A father uses it on his phone in Karnataka. His son checks the Family Dashboard on his laptop in New York. Same app, same data, any device, that's the power of building on Google Cloud.
Cross-Platform Access (Mobile + Desktop): Heali is a responsive Progressive Web App deployed on Google Cloud Run, working seamlessly across phones, tablets, and desktops. An elderly parent interacts through voice on their mobile phone. Their caregiver monitors the Family Dashboard on a laptop. Dedicated pages for Food Log, Exercise, Prescriptions, and Reports provide direct, non-voice access to health data for users who prefer to browse rather than speak. One app, any device, anywhere in the world.
My parents can actually use it: This is the accomplishment that matters most. I tested Heali with my father over video call. He spoke in Kannada. He held up a prescription to the camera. Heali understood him, read the prescription, and added the medication to his schedule. He didn't need help. He didn't need me to translate. That moment made the entire project worth it.
What we learned
Voice-first changes everything : Building for voice isn't "adding a microphone to a chat app." It's a fundamentally different interaction paradigm. There's no scroll, no back button, no visual hierarchy. The conversation is the interface. This forced us to rethink error handling, navigation, and feedback in ways that made the entire app better.
Gemini Live API is transformative: Native audio streaming with tool-call execution mid-conversation is a genuine paradigm shift. Previous voice AI required speech-to-text, LLM processing, then text-to-speech with three separate latency penalties. Gemini Live collapses this into a single bidirectional stream. The difference isn't incremental; it's the difference between "talking to a computer" and "talking to a companion."
Google ADK simplifies multi-agent orchestration: Coordinating six specialized agents could have been a nightmare of custom routing logic. ADK's declarative agent definitions and automatic tool routing meant we could focus on what each agent does rather than how they communicate.
Deterministic safety nets matter: We initially tried to let the LLM handle emergency detection. It worked 95% of the time. That's not good enough when someone says "I can't breathe." The Red Line Protocol, a simple, deterministic keyword matcher that bypasses all AI reasoning was the right call. Some decisions are too important for probabilistic systems.
Accessibility is a feature, not an afterthought: Multilingual support, voice-first interaction, large UI elements, proactive alerts, these aren't bonus features. For our target users, they are the product. Building for the most constrained user made Heali better for everyone.
What's next for Heali
Wearable integration: Connecting with smartwatches and blood pressure monitors for automatic vital logging, removing the need for manual voice reporting.
Expanded language support: Adding Chinese, Japanese, Tamil, Telugu, Bengali, and Arabic to serve more elderly populations worldwide.
Doctor-facing portal: Giving physicians read access to their patient's Heali data (with consent) so they can see medication adherence and vital trends between appointments.
Offline-first PWA: Enabling core medication reminders and emergency detection to work without an internet connection, critical for elderly users in rural areas with unreliable connectivity.
Clinical validation: Partnering with geriatric care facilities to run a pilot study measuring the impact of Heali on medication adherence rates and emergency response times.
Built With
- css
- docker
- fastapi
- firebase-authentication
- firebase-cloud-messaging
- gemini-2.0-flash
- gemini-2.5-flash
- gemini-live-api
- google-adk
- google-cloud-build
- google-cloud-firestore
- google-cloud-run
- google-cloud-tasks
- html
- imagen
- javascript
- python
- react
- shadcn/ui
- tailwind-css
- twilio-voice-api
- typescript
- veo
- vertexai
- vite
- websockets
Log in or sign up for Devpost to join the conversation.