Inspiration

Many people find it difficult to explain their symptoms by typing everything out manually. In stressful situations, speaking or sharing a medical image is much easier. Most healthcare tools also lack memory, so they cannot track a patient's health over time.

We wanted to build an AI assistant that feels more like a real healthcare helper — one that understands voice, images, and text, remembers past interactions, and helps users monitor their health.

What it does

Our project is a multimodal healthcare AI agent that lets users interact through speech, chat, or medical images. The system simulates a structured medical consultation and provides intelligent, doctor-like responses.

Users can:

• Speak symptoms or type questions to the AI doctor
• Upload medical images for visual analysis
• Receive doctor-like responses with realistic voice output
• Generate structured medical reports
• Save and access previous consultation reports

We also introduced a Health Dashboard where users can upload medical reports such as blood test results. The system automatically extracts key medical parameters like:

• Blood Pressure
• Blood Sugar
• Cholesterol

It then classifies the severity (Normal, Warning, High Risk) and stores these metrics over time. The dashboard visualizes health trends through graphs, helping users monitor their condition and detect potential health risks early.
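To make this concrete, here is a simplified sketch of the extraction and severity logic; the regex patterns and numeric thresholds are illustrative stand-ins rather than our exact production rules:

```python
import re

# Illustrative patterns only; real lab reports vary widely in wording and layout.
PATTERNS = {
    "blood_pressure": re.compile(r"(?:BP|Blood Pressure)\D*(\d{2,3})\s*/\s*(\d{2,3})", re.I),
    "blood_sugar":    re.compile(r"(?:Blood Sugar|Glucose)\D*(\d{2,3})", re.I),
    "cholesterol":    re.compile(r"Cholesterol\D*(\d{2,3})", re.I),
}

def classify(metric: str, value: int, second: int = 0) -> str:
    # Simplified thresholds, loosely based on common clinical cut-offs.
    high = {"blood_pressure": value >= 140 or second >= 90,
            "blood_sugar": value >= 200,
            "cholesterol": value >= 240}
    warn = {"blood_pressure": value >= 120 or second >= 80,
            "blood_sugar": value >= 140,
            "cholesterol": value >= 200}
    if high[metric]:
        return "High Risk"
    if warn[metric]:
        return "Warning"
    return "Normal"

def extract_metrics(report_text: str) -> dict:
    # Pull each metric out of the report text and attach a severity label.
    metrics = {}
    for name, pattern in PATTERNS.items():
        m = pattern.search(report_text)
        if m:
            values = [int(g) for g in m.groups()]
            metrics[name] = {
                "value": "/".join(str(v) for v in values),
                "severity": classify(name, *values),
            }
    return metrics
```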

The system also includes an emergency assistance agent that detects urgent situations and recommends nearby hospitals with directions and contact information.

Additionally, the platform now supports multiple languages, allowing users to receive consultation responses and voice output in their preferred language, making the system more accessible for regional users.

How we built it

The system uses an agent-based architecture. Speech is transcribed using Groq, and medical images and reports are analyzed using a multimodal vision-language model.
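As a minimal sketch, the transcription step looks roughly like this (the specific model name is an assumption):

```python
from groq import Groq  # pip install groq

client = Groq()  # reads GROQ_API_KEY from the environment

def transcribe(audio_path: str) -> str:
    # Groq serves Whisper-family models behind an OpenAI-compatible
    # transcription endpoint; the model name here is an assumption.
    with open(audio_path, "rb") as f:
        result = client.audio.transcriptions.create(
            file=f,
            model="whisper-large-v3",
        )
    return result.text
```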

To improve reliability, we integrated WHO guideline-based Retrieval-Augmented Generation (RAG) so responses are grounded in medical knowledge.
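The retrieval step, in simplified form, assuming the WHO documents have already been split into chunks (the embedding model and sample passages below are placeholders):

```python
from sentence_transformers import SentenceTransformer, util  # pip install sentence-transformers

# Placeholder passages; in the real pipeline these are chunks
# split out of WHO guideline documents.
guideline_chunks = [
    "Placeholder WHO guideline passage about chest pain triage...",
    "Placeholder WHO guideline passage about hypertension thresholds...",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # embedding model is an assumption
chunk_embeddings = embedder.encode(guideline_chunks, convert_to_tensor=True)

def retrieve_guidelines(query: str, k: int = 3) -> list[str]:
    # Rank guideline chunks by semantic similarity to the user's question.
    query_emb = embedder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, chunk_embeddings, top_k=k)[0]
    return [guideline_chunks[hit["corpus_id"]] for hit in hits]

def build_grounded_prompt(question: str) -> str:
    # The LLM is told to answer only from the retrieved excerpts.
    context = "\n\n".join(retrieve_guidelines(question))
    return (
        "Answer using ONLY the WHO guideline excerpts below. "
        "If they do not cover the question, say so.\n\n"
        f"Guidelines:\n{context}\n\nQuestion: {question}"
    )
```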

Health metrics extracted from reports are stored in a database and visualized in the dashboard. For emergencies, a dedicated agent uses the Google Maps API to locate nearby hospitals. We also integrated ElevenLabs for natural doctor voice responses and built the interface using Gradio.
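As an illustration of the emergency lookup, a single call to the Places Nearby Search endpoint is enough (a rough sketch; key handling, error cases, and ranking are simplified):

```python
import os
import requests

def nearby_hospitals(lat: float, lng: float, radius_m: int = 5000) -> list[dict]:
    # Google Places Nearby Search; needs a Maps API key with the Places API enabled.
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/place/nearbysearch/json",
        params={
            "location": f"{lat},{lng}",
            "radius": radius_m,
            "type": "hospital",
            "key": os.environ["GOOGLE_MAPS_API_KEY"],
        },
        timeout=10,
    )
    resp.raise_for_status()
    hospitals = []
    for place in resp.json().get("results", []):
        loc = place["geometry"]["location"]
        hospitals.append({
            "name": place.get("name"),
            "address": place.get("vicinity"),
            # Ready-made directions link the UI can show next to contact details.
            "directions": "https://www.google.com/maps/dir/?api=1"
                          f"&destination={loc['lat']},{loc['lng']}",
        })
    return hospitals
```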

Challenges we ran into

Handling multiple input types in one workflow was challenging. We had to design a reliable intent routing system, extract structured data from different medical report formats, and maintain conversation memory while coordinating multiple AI agents.
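A stripped-down sketch of the routing idea (the keyword list is illustrative, and in practice the LLM itself can be asked to classify ambiguous intents):

```python
from dataclasses import dataclass

@dataclass
class UserInput:
    text: str | None = None
    audio_path: str | None = None
    image_path: str | None = None

# Illustrative trigger phrases; the real router can also use an LLM classifier.
EMERGENCY_KEYWORDS = ("chest pain", "can't breathe", "unconscious", "severe bleeding")

def route(user_input: UserInput) -> str:
    text = (user_input.text or "").lower()
    # Emergencies take priority over every other modality.
    if any(kw in text for kw in EMERGENCY_KEYWORDS):
        return "emergency_agent"
    if user_input.image_path:
        return "vision_agent"        # medical image analysis
    if user_input.audio_path:
        return "speech_agent"        # transcribe first, then re-route the text
    return "consultation_agent"      # plain text consultation
```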

Accomplishments that we're proud of

We’re proud that the project evolved beyond a simple chatbot into a complete AI-powered healthcare workflow.

Key features we successfully implemented include:

• Multimodal consultation with speech, chat, and image inputs
• AI-generated structured medical reports
• Persistent report history and user dashboard
• Automated extraction of health metrics from uploaded reports
• Severity classification for blood pressure, blood sugar, and cholesterol
• Health trend visualization through interactive graphs
• Emergency hospital discovery and navigation support
• Multilingual responses with realistic doctor voice output

These capabilities make the system feel like a practical healthcare assistant rather than just a prototype.

What we learned

Through this project we learned how to design AI agent workflows, integrate multimodal AI systems, build retrieval-based medical reasoning pipelines, and structure LLM outputs for real-world applications.

We also gained experience connecting AI models with backend systems, APIs, databases, and user interfaces to create a full end-to-end product.

What's next for Multimodal Healthcare AI Agent with Emergency Assistance

Moving forward, we plan to expand the system with more advanced healthcare features.

Future improvements include:

• Advanced patient severity scoring and risk prediction
• A doctor dashboard for monitoring multiple patients
• Support for additional regional languages
• Integration with wearable health devices for real-time monitoring
• Offline-capable versions for rural healthcare environments
• Expanded medical knowledge integration to further align responses with WHO clinical guidelines

Our long-term goal is to create a scalable AI healthcare assistant that improves accessibility, early detection, and patient awareness worldwide.

Built With

Groq, ElevenLabs, Gradio, Google Maps API


Updates


Private user posted an update

Added a Health Monitoring Dashboard to the Multimodal Healthcare AI Agent.

Users can now upload medical reports, and the system automatically extracts key health metrics such as Blood Pressure, Blood Sugar, and Cholesterol. The AI classifies severity levels and stores the data to help track health changes over time.

The dashboard also visualizes health trends through graphs, allowing users to monitor their condition across multiple reports. This turns the system from just a consultation assistant into a personal health monitoring tool.



Private user posted an update

Added multilingual support to the Multimodal Healthcare AI Agent. Users can now select their preferred language (English, Hindi, Tamil, Telugu, Odia, Bengali, Kannada, Malayalam, Marathi, Gujarati) before starting a consultation.

The AI doctor now generates both text responses and voice output in the selected language, making the system easier to use for people who are more comfortable communicating in regional languages. This helps reduce language barriers in healthcare access.

This update also improves the system’s real-world usability, especially for users in rural or non-English speaking regions who may struggle to explain symptoms in English.
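As a rough sketch, the voice side of this update can be a single ElevenLabs call, assuming a multilingual TTS model (the voice and model IDs below are placeholders, and language coverage varies by model):

```python
from elevenlabs.client import ElevenLabs  # pip install elevenlabs

client = ElevenLabs()  # reads ELEVENLABS_API_KEY from the environment

def speak(text: str) -> bytes:
    # A multilingual TTS model lets one doctor voice speak whichever language
    # the LLM replied in; voice_id and model_id here are placeholders.
    audio = client.text_to_speech.convert(
        text=text,
        voice_id="YOUR_VOICE_ID",
        model_id="eleven_multilingual_v2",
    )
    return b"".join(audio)
```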



Private user posted an update

Integrated WHO-based Retrieval-Augmented Generation (RAG) into the Multimodal Healthcare AI Agent to ground responses in official emergency care guidelines. This upgrade significantly reduces hallucinations and enhances reliability in critical, real-time medical assistance scenarios.



Private user posted an update

Project Update:

I focused on building the core flow of the Multimodal Healthcare AI Agent.

I implemented voice input so users can describe symptoms naturally instead of typing, and added medical image support: the system analyzes uploaded images using a multimodal model and generates doctor-style explanations.
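In sketch form, the image path boils down to one multimodal chat call (the model name below is an assumption, not necessarily what runs in production):

```python
import base64
from groq import Groq

client = Groq()

def analyze_medical_image(image_path: str, question: str) -> str:
    # Inline the image as a base64 data URL, as expected by
    # OpenAI-compatible vision endpoints.
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="meta-llama/llama-4-scout-17b-16e-instruct",  # assumed vision model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content
```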

Conversation memory is now working: the assistant remembers previous context, which makes follow-up questions feel like a real consultation.
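In simplified form, memory is just the running message history sent back on every turn (a real deployment would trim or summarize long conversations; the model name is illustrative):

```python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

# Running history sent back to the model on every turn.
history = [{"role": "system", "content": "You are a careful, empathetic AI doctor."}]

def ask(question: str) -> str:
    history.append({"role": "user", "content": question})
    resp = client.chat.completions.create(
        model="llama-3.3-70b-versatile",  # illustrative model name
        messages=history,
    )
    answer = resp.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer
```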

I also completed structured medical report generation with report history storage, allowing users to revisit previous consultations.

I also recently integrated the emergency agent, which detects critical symptoms and shows nearby hospitals with directions, making the system more actionable in urgent situations.

I'm currently improving UI clarity, reasoning accuracy, and the overall workflow before the final submission.
