MediFusion AI — Generative AI Voice-Driven Medical Assistant Powered by ElevenLabs + Multimodal AI
Inspiration
In many regions, patients face long waiting times and difficulty accessing timely medical guidance. We wanted to create a voice-first intelligent healthcare assistant that enables natural, stress-free medical interaction. MediFusion AI bridges the gap between patients and healthcare support using real-time speech understanding, AI reasoning, and disease prediction.
What it does
MediFusion AI is an AI Voice Doctor that lets users speak naturally and receive intelligent, context-aware, medically relevant responses. It:
- Listens to patient speech using Whisper + ElevenLabs STT with noise reduction
- Understands symptoms and medical context using multimodal reasoning models
- Responds like a real doctor using ElevenLabs TTS for natural medical voice output
- Predicts diseases using ML models for Diabetes, Tumor Detection, and Heart Disease
- Delivers a fully voice-based consultation through a Gradio conversational interface
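The listen → reason → speak loop above can be sketched as a single consultation turn. This is a hypothetical outline, not the project's exact code: the `stt`, `reason`, and `tts` callables stand in for Whisper/ElevenLabs STT, the multimodal LLM, and ElevenLabs TTS respectively.

```python
# Hypothetical sketch of one voice-consultation turn.
# stt / reason / tts are injected so the flow is testable without API keys.
from typing import Callable

def consultation_turn(
    audio_path: str,
    stt: Callable[[str], str],
    reason: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> tuple[str, str, bytes]:
    """Run one turn: transcribe patient speech, reason over it, synthesize a reply."""
    transcript = stt(audio_path)   # patient speech -> text
    reply = reason(transcript)     # text -> medically relevant answer
    speech = tts(reply)            # answer -> doctor-style audio
    return transcript, reply, speech
```

Keeping the three stages behind plain callables is also what makes it easy to swap engines (e.g. Whisper vs. ElevenLabs STT) without touching the rest of the pipeline.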
Key Features
- Fully voice-driven healthcare consultation
- Real-time voice + text understanding
- Medical image interpretation using Vision-enabled LLMs
- Disease prediction engine (Diabetes, Tumor, Heart Disease)
- Natural human-like doctor voice with ElevenLabs TTS
- Simple and responsive UI with Gradio
Tech Stack
| Category | Tools |
|---|---|
| Voice AI | ElevenLabs STT & TTS, Whisper, gTTS |
| AI & Reasoning | LLaMA 3 Vision / Multimodal AI |
| Frontend | Gradio |
| ML Models | Diabetes, Heart Disease, Tumor CNN |
| Deployment | Hugging Face / Streamlit |
| Languages & Tools | Python, NumPy, Pandas, OpenCV, ONNX |
How we built it
Phase 1 — AI Brain
- Integrated multimodal LLM for reasoning and medical analysis
Phase 2 — Voice of the Patient
- Real-time voice recording + speech transcription (Whisper + ElevenLabs STT)
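One simple form the noise reduction mentioned above can take is an energy-based noise gate applied before transcription. This is a minimal stand-in sketch; the frame size and threshold factor are illustrative, not the project's tuned values.

```python
# Minimal noise-gate sketch: zero out low-energy frames before STT.
# Frame length and threshold factor are illustrative assumptions.
import numpy as np

def noise_gate(samples: np.ndarray, frame: int = 512, factor: float = 1.5) -> np.ndarray:
    """Silence frames whose RMS energy falls below factor * median frame RMS."""
    out = samples.astype(np.float32).copy()
    n_frames = len(out) // frame
    rms = np.array([
        np.sqrt(np.mean(out[i * frame:(i + 1) * frame] ** 2))
        for i in range(n_frames)
    ])
    threshold = factor * np.median(rms)
    for i in range(n_frames):
        if rms[i] < threshold:
            out[i * frame:(i + 1) * frame] = 0.0  # treat as background noise
    return out
```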
Phase 3 — AI Doctor Voice
- ElevenLabs neural TTS for natural doctor-style responses
- gTTS for additional speech synthesis
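Running two synthesis engines suggests a primary/fallback pattern: try the ElevenLabs-style engine first and fall back to a gTTS-style one on failure. The sketch below assumes that pattern and takes both engines as callables, so no API keys are involved.

```python
# Hedged sketch of two-engine speech synthesis with fallback.
# Both engines are injected callables standing in for ElevenLabs TTS and gTTS.
from typing import Callable

def synthesize_with_fallback(
    text: str,
    primary: Callable[[str], bytes],
    fallback: Callable[[str], bytes],
) -> tuple[bytes, str]:
    """Return (audio_bytes, engine_used), using the fallback only on error."""
    try:
        return primary(text), "primary"
    except Exception:
        return fallback(text), "fallback"
```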
Phase 4 — Disease Prediction Engine
- ML & DL models for Diabetes, Heart Disease, and Tumor MRI detection
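As a flavor of the tabular prediction models, here is a toy risk classifier. Everything in it is a synthetic stand-in: the features, data, and model choice are illustrative, not the engine's real models or training data.

```python
# Toy disease-risk classifier: synthetic features, synthetic labels,
# logistic regression as an illustrative model choice.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic "glucose" and "BMI" features; label is 1 when their sum is high.
X = rng.normal(loc=[[100.0, 25.0]], scale=10.0, size=(200, 2))
y = (X[:, 0] + X[:, 1] > 130).astype(int)

model = LogisticRegression().fit(X, y)

def predict_risk(glucose: float, bmi: float) -> int:
    """Return 1 for elevated risk, 0 otherwise (toy model only)."""
    return int(model.predict([[glucose, bmi]])[0])
```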
Phase 5 — Real-Time UI
- Gradio VoiceBot interface for complete speech interaction
Challenges
- Latency management between STT, reasoning, and TTS
- Improving accuracy with noisy audio inputs
- Synchronizing ML predictions with interactive speech responses
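One common way to cut the STT → reasoning → TTS latency mentioned above is to stream the LLM output and hand complete sentences to TTS as they arrive, instead of waiting for the full reply. The sketch below (an assumption about the approach, not the project's code) splits a token stream into sentence-sized chunks ready for early playback.

```python
# Latency-reduction sketch: chunk a streaming LLM reply into sentences
# so TTS can start speaking before the full response is generated.
from collections.abc import Iterable, Iterator

def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences from a token stream for early TTS playback."""
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith((".", "!", "?")):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():          # flush any trailing partial sentence
        yield buffer.strip()
```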
Accomplishments
- Fully voice-based intelligent medical assistance
- Highly natural doctor-like voice via ElevenLabs
- Real-time disease prediction integration
What we learned
- Multimodal real-time conversational systems
- Voice technology and low-latency inference optimization
- Deploying scalable AI assistants
What’s next
- Multilingual voice consultation
- Patient history & digital medical report summarization
- Mobile-first app version
What Makes MediFusion AI Unique
MediFusion AI delivers a fully conversational healthcare experience using advanced voice technologies from ElevenLabs combined with multimodal AI reasoning. Users interact entirely through speech, making medical assistance fast, accessible, and natural. The system understands symptoms, provides guidance, and responds in real time—creating a seamless, human-like healthcare interaction for everyone.
Submission Link
🔗 GitHub Repository:
https://github.com/amit-sharma-ds/GenAIHeathcare
Built With
- deep-learning
- docker
- elevenlabs
- ffmpeg
- gradio
- groq
- groq-cloud
- gtts
- llama-3-vision
- machine-learning
- openai-whisper
- pyaudio
- python
- speech-to-text
- vs-code
- x