💡 Inspiration
The healthcare industry is often divided between cold, precise IoT data and warm, empathetic human consultation. We asked ourselves: Can an AI bridge this gap? We were inspired to create MyHealthAI—a personal "Mini-Hospital" that doesn't just read your numbers but understands your context. We wanted to break the "text box" paradigm and create a Multimodal Life Loop where an agent can See your symptoms (Vision), Hear your concerns (Voice), and Speak with clinical authority (TTS).
🩺 What it does
MyHealthAI is a comprehensive medical suite that provides:
- Live Vital Streaming: Continuous monitoring of heart rate, SpO2, and blood pressure via WebSockets.
- Dr. Aura (Multimodal Agent): A context-aware AI doctor you can speak to naturally.
- Clinical Grounding: Every response is cross-referenced against a custom dataset of American Heart Association (AHA), American Diabetes Association (ADA), and Global Initiative for Asthma (GINA) guidelines to prevent hallucinations.
- Visual Consult: An agentic vision system that analyzes skin conditions or anatomy diagrams in real time.
- Deterministic Risk Scoring: A mathematical engine that assigns "Stable", "Monitor", or "Critical" status without relying on LLM guesswork (see the sketch after this list).
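To make that determinism concrete, here is a minimal sketch of how such a threshold engine could look. The numeric cutoffs below are illustrative placeholders, not the guideline values the real engine uses.

```typescript
// Sketch of a deterministic, threshold-based risk scorer.
// All numeric cutoffs are illustrative, not clinical guidance.
type RiskStatus = "Stable" | "Monitor" | "Critical";

interface Vitals {
  heartRate: number; // beats per minute
  spo2: number;      // % oxygen saturation
  systolic: number;  // mmHg
  diastolic: number; // mmHg
}

function scoreVitals(v: Vitals): RiskStatus {
  // Hard stops: any single critical reading escalates immediately.
  if (v.spo2 < 90 || v.heartRate > 140 || v.heartRate < 40 || v.systolic > 180) {
    return "Critical";
  }
  // Borderline readings accumulate into a "Monitor" state.
  let flags = 0;
  if (v.spo2 < 94) flags++;
  if (v.heartRate > 100 || v.heartRate < 50) flags++;
  if (v.systolic > 140 || v.diastolic > 90) flags++;
  return flags > 0 ? "Monitor" : "Stable";
}

console.log(scoreVitals({ heartRate: 72, spo2: 98, systolic: 118, diastolic: 76 })); // "Stable"
```

Because the classifier is pure arithmetic, the same vitals always yield the same status, which is what lets the LLM stay conversational without owning the safety decision.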
🛠️ How we built it
- Intelligence: Powered by Gemini 2.0 Flash for its low-latency multimodal reasoning.
- Frontend: Next.js 14 with a "Swiss-Future" glassmorphic UI, Framer Motion for premium animations, and Tailwind CSS v4.
- Backend: Node.js/Express orchestrating a complex WebSocket gateway.
- Infrastructure: Docker, Prisma, and a custom Gemini Key Rotation Manager that keeps the agent responsive even when individual API keys hit rate limits.
- Safety: A layered architecture including Zod validation of every inbound payload and a secondary clinical rules engine (schema sketch below).
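As a sketch of that validation layer, a Zod schema along these lines rejects malformed telemetry before it can reach the rules engine or the LLM; the field names and bounds are illustrative assumptions, not our actual wire contract.

```typescript
import { z } from "zod";

// Hypothetical schema for an inbound vitals frame.
const VitalsPayload = z.object({
  patientId: z.string().uuid(),
  heartRate: z.number().int().min(20).max(250),
  spo2: z.number().min(50).max(100),
  systolic: z.number().int().min(50).max(260),
  diastolic: z.number().int().min(30).max(160),
  recordedAt: z.coerce.date(),
});

// safeParse never throws, so a garbage frame is logged and dropped
// instead of crashing the WebSocket gateway.
const raw = '{"patientId":"not-a-uuid","heartRate":999}';
const result = VitalsPayload.safeParse(JSON.parse(raw));
if (!result.success) {
  console.warn("Dropped malformed vitals frame:", result.error.flatten());
}
```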
🚧 Challenges we ran into
Synchronizing real-time vital telemetry with asynchronous AI vision was a major engineering hurdle. We had to build a Stateful Agent Session that could "buffer" medical context while the Gemini model processed image frames. Additionally, ensuring the AI remained empathetic but strictly professional required rigorous prompt engineering and the implementation of a dedicated Clinical Grounding Layer.
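In stripped-down form, the buffering pattern looks roughly like this; AgentSession and its method names are illustrative, not our actual API.

```typescript
// Telemetry keeps flowing while a slow multimodal call is in flight;
// the session drains its buffer when the vision result lands.
interface VitalsFrame { heartRate: number; spo2: number; at: number }

class AgentSession {
  private pendingVitals: VitalsFrame[] = [];

  onVitals(frame: VitalsFrame) {
    // Never block telemetry on the model: just accumulate.
    this.pendingVitals.push(frame);
  }

  async onImage(analyze: () => Promise<string>): Promise<string> {
    const finding = await analyze(); // slow Gemini vision call
    // Drain everything that arrived mid-call so the agent's next
    // utterance reflects the *current* physiological context.
    const context = this.pendingVitals.splice(0);
    return `${finding} (with ${context.length} vitals frames of fresh context)`;
  }
}
```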
🏆 Accomplishments that we're proud of
We are incredibly proud of our sub-2-second "See, Hear, Speak" loop. Seeing the agent recognize a visual symptom and then vocally provide a grounded medical insight within seconds feels like science fiction. We also implemented a Legal & Privacy Agreement Flow, bringing the app in line with real-world data-privacy expectations.
🧠 What we learned
- Multimodal is the Future: Gemini's ability to handle vision and voice simultaneously is transformative for healthcare. It makes AI feel like a companion rather than a search engine.
- Grounding > Generativity: In medical contexts, a model's ability to stick to provided guidelines (grounding) is far more important than its creative reasoning (see the retrieval sketch after this list).
- Architecture Matters: A modular, agentic backend is the only way to scale real-time AI interactions without losing performance.
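To illustrate the grounding lesson, here is a toy retrieval step that pins the prompt to guideline snippets. A real deployment would use embeddings rather than keyword overlap, and the snippet texts here are paraphrases for illustration only.

```typescript
// Toy retrieval-augmented prompt: answer only from retrieved guidelines.
const guidelines = [
  { source: "AHA", text: "A normal adult resting heart rate is roughly 60 to 100 beats per minute." },
  { source: "GINA", text: "Increasing reliever inhaler use signals worsening asthma control." },
];

function retrieve(question: string, k = 1): string[] {
  const words = new Set(question.toLowerCase().split(/\W+/));
  return guidelines
    .map((g) => ({
      g,
      score: g.text.toLowerCase().split(/\W+/).filter((w) => words.has(w)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(({ g }) => `[${g.source}] ${g.text}`);
}

const question = "Is a resting heart rate of 110 normal?";
const prompt =
  `Answer ONLY from these guidelines:\n${retrieve(question).join("\n")}\n\nQ: ${question}`;
console.log(prompt);
```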
🔮 What's next for MyHealthAI: The Multimodal Life Loop
Our vision is to integrate MyHealthAI directly with wearable hardware (smart rings/watches) to move from reactive consultation to proactive life-saving alerts. We also plan to expand our clinical grounding to include specialized pediatric and geriatric datasets, making Dr. Aura a truly universal family medical companion.
Description
1. The Problem 🚩
Current healthcare technology is siloed. Fitness trackers provide raw numbers (heart rate, SpO2) without context, while AI chatbots offer text-based advice that often lacks clinical grounding or feels robotic. Patients facing symptoms like skin rashes or respiratory distress often have to wait hours for a consultation or risk "WebMD-ing" their way into a panic. There is a massive gap between precise biometric data and empathetic, real-time medical guidance.
2. Our Approach 🧪
We built MyHealthAI—a "Multimodal Life Loop" designed to bridge the gap between IoT data and clinical expertise. Our approach centers on three core pillars:
- Multimodal Intelligence: Using Gemini 2.0 Flash, our agent (Dr. Aura) doesn't just read your vitals; she can see symptoms via camera analysis, hear your cough or vocal concerns, and speak with human-like empathy (see the call sketch after this list).
- Deterministic Safety: We decoupled medical risk assessment from the LLM. While Gemini handles the conversation, a custom Clinical Rules Engine mathematically evaluates telemetry (HR, SpO2, BP) against standardized medical guidelines (AHA, ADA, GINA), so risk categorization is deterministic and auditable rather than probabilistic.
- Real-Time Synergy: By leveraging WebSockets for live telemetry streaming and the native multimodal capabilities of Gemini, we achieved a response latency of under 2 seconds, making the AI feel like a live, present doctor rather than a slow search engine.
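As a hedged sketch of the kind of multimodal call behind the Visual Consult flow, the public @google/generative-ai SDK can be used along these lines; the model id, prompt wording, and file handling are illustrative assumptions, not our production code.

```typescript
import { readFileSync } from "node:fs";
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash" });

// Send an image alongside the live vitals summary so the model
// grounds its visual findings in the current physiological context.
async function triageImage(imagePath: string, vitalsSummary: string): Promise<string> {
  const result = await model.generateContent([
    {
      inlineData: {
        data: readFileSync(imagePath).toString("base64"),
        mimeType: "image/jpeg",
      },
    },
    `You are a cautious clinical triage assistant. Current vitals: ${vitalsSummary}. ` +
      "Describe visible findings and suggest next steps; do not diagnose.",
  ]);
  return result.response.text();
}

triageImage("./rash.jpg", "HR 88 bpm, SpO2 97%").then(console.log);
```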
3. How it Works ⚙️
- Telemetry Streaming: The user's vitals are streamed via WebSockets to our Node.js backend (gateway sketch after this list).
- Clinical Rules Engine: The system continuously monitors vitals, shifting the user state between Stable, Monitor, and Critical based on mathematical thresholds.
- Multimodal Consultation: Users can upload images (Vision) for triage or talk directly (Voice) to the agent.
- Clinical Grounding: The agent's responses are filtered through a RAG (Retrieval-Augmented Generation) layer grounded in verified medical datasets, ensuring all advice is evidence-based.
- Dossier Generation: At the end of a session, a professional PDF Medical Dossier is generated for the user to share with their in-person physician.
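For a feel of the telemetry path, here is a minimal gateway sketch using the ws library. The message shape and the inline rules stub are assumptions; the production gateway also handles auth, session state, and routing to Gemini.

```typescript
import { WebSocketServer } from "ws";

// Each connected client streams vitals frames; the server answers
// every frame with the current risk status.
const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket) => {
  socket.on("message", (raw) => {
    const frame = JSON.parse(raw.toString()); // validated with Zod upstream
    // Stand-in for the Clinical Rules Engine described above.
    const status = frame.spo2 < 94 ? "Monitor" : "Stable";
    socket.send(JSON.stringify({ type: "risk-update", status }));
  });
});
```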
Built With
- cloud-build
- express.js
- framer-motion
- google-cloud-platform-(cloud-run)
- google-gemini-2.0-flash
- javascript
- jspdf
- next.js-14
- node.js
- postgresql
- prisma-orm
- react
- tailwind-css-v4
- three.js
- typescript
- websockets
- zod
