MediFusion AI — Generative AI Voice-Driven Medical Assistant Powered by ElevenLabs + Multimodal AI

Inspiration

In many regions, patients face long waiting times and difficulty accessing timely medical guidance. We wanted to create a voice-first intelligent healthcare assistant that enables natural, stress-free medical interaction. MediFusion AI bridges the gap between patients and healthcare support using real-time speech understanding, AI reasoning, and disease prediction.

What it does

MediFusion AI is an AI Voice Doctor that lets users speak naturally and receive intelligent, context-aware, medically relevant responses. It:

  • Listens to patient speech using Whisper + ElevenLabs STT with noise reduction
  • Understands symptoms and medical context using multimodal reasoning models
  • Responds like a real doctor using ElevenLabs TTS for natural medical voice output
  • Predicts diseases using ML models for Diabetes, Tumor Detection, and Heart Disease
  • Delivers a fully voice-based consultation through a Gradio conversational interface
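
The consultation loop above can be sketched as a three-stage pipeline. This is a minimal, self-contained illustration: the functions `transcribe_speech`, `reason_over_symptoms`, and `synthesize_voice` are hypothetical stubs standing in for the Whisper/ElevenLabs STT, multimodal LLM, and ElevenLabs TTS calls, not the project's actual API.

```python
# Minimal sketch of the voice-consultation loop: STT -> reasoning -> TTS.
# All three stages are placeholder stubs; the real system wires in
# Whisper/ElevenLabs STT, a multimodal LLM, and ElevenLabs TTS.

def transcribe_speech(audio_path: str) -> str:
    """Stub for the Whisper + ElevenLabs STT stage."""
    return "I have had a headache and blurred vision for two days."

def reason_over_symptoms(transcript: str) -> str:
    """Stub for the multimodal LLM reasoning stage."""
    return f"Noted: {transcript} Please consider a blood-pressure check."

def synthesize_voice(reply: str) -> bytes:
    """Stub for the ElevenLabs TTS stage; would return audio bytes."""
    return reply.encode("utf-8")

def consult(audio_path: str) -> bytes:
    """One full turn of the consultation: audio in, spoken reply out."""
    transcript = transcribe_speech(audio_path)
    reply = reason_over_symptoms(transcript)
    return synthesize_voice(reply)

audio_reply = consult("patient.wav")
```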

Key Features

  • Fully voice-driven healthcare consultation
  • Real-time voice + text understanding
  • Medical image interpretation using Vision-enabled LLMs
  • Disease prediction engine (Diabetes, Tumor, Heart Disease)
  • Natural human-like doctor voice with ElevenLabs TTS
  • Simple and responsive UI with Gradio

Tech Stack

  • Voice AI: ElevenLabs STT & TTS, Whisper, gTTS
  • AI & Reasoning: LLaMA 3 Vision / multimodal AI
  • Frontend: Gradio
  • ML Models: Diabetes, Heart Disease, Tumor CNN
  • Deployment: Hugging Face / Streamlit
  • Languages & Tools: Python, NumPy, Pandas, OpenCV, ONNX

How we built it

Phase 1 — AI Brain

  • Integrated multimodal LLM for reasoning and medical analysis

Phase 2 — Voice of the Patient

  • Real-time voice recording + speech transcription (Whisper + ElevenLabs STT)
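
A transcription step along these lines can be sketched with the open-source `openai-whisper` package (`pip install openai-whisper`). This is an assumption about one workable setup, not the project's exact code; the model is loaded lazily and the function degrades gracefully when the package is missing.

```python
# Sketch of Phase 2: transcribing recorded patient speech with Whisper.
# Assumes the open-source `openai-whisper` package; degrades gracefully
# when it is not installed so the rest of the pipeline keeps working.

_model = None

def transcribe(audio_path: str) -> str:
    """Return the transcript of a recorded audio file, or "" if STT is unavailable."""
    global _model
    try:
        import whisper  # optional dependency
    except ImportError:
        return ""  # caller should handle the empty transcript
    if _model is None:
        _model = whisper.load_model("base")  # small, CPU-friendly model
    return _model.transcribe(audio_path)["text"]
```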

Phase 3 — AI Doctor Voice

  • ElevenLabs neural TTS for natural doctor-style responses
  • gTTS for additional speech synthesis
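
A call to ElevenLabs TTS roughly follows its public REST API: a POST to the text-to-speech endpoint with an `xi-api-key` header. The sketch below only builds the request; the voice ID, API key, and model choice are placeholders, and sending is left to the caller (e.g. `requests.post(req["url"], headers=req["headers"], json=req["body"])`).

```python
# Sketch of Phase 3: assembling an ElevenLabs text-to-speech request.
# Endpoint shape follows ElevenLabs' public REST API; all credentials
# and IDs here are placeholders.

def build_tts_request(text: str, voice_id: str, api_key: str) -> dict:
    """Return a request spec (url/headers/body) for ElevenLabs TTS."""
    return {
        "url": f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        "headers": {
            "xi-api-key": api_key,
            "Content-Type": "application/json",
        },
        "body": {
            "text": text,
            "model_id": "eleven_multilingual_v2",  # placeholder model choice
        },
    }

req = build_tts_request("Please rest and stay hydrated.", "VOICE_ID", "API_KEY")
```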

Phase 4 — Disease Prediction Engine

  • ML & DL models for Diabetes, Heart Disease, and Tumor MRI detection
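
To make the prediction idea concrete, here is a toy logistic scorer for diabetes risk. The two features and all weights are made up for illustration; the actual engine uses trained ML/DL models (and a CNN for tumor MRIs), not these numbers.

```python
import math

# Toy sketch of the disease-prediction idea: a logistic model scoring
# diabetes risk from glucose and BMI. Weights are illustrative only --
# the real engine uses trained models.

def diabetes_risk(glucose_mg_dl: float, bmi: float) -> float:
    """Return a risk score in (0, 1); coefficients are made up for illustration."""
    z = 0.03 * (glucose_mg_dl - 110) + 0.08 * (bmi - 25)
    return 1.0 / (1.0 + math.exp(-z))

low_risk = diabetes_risk(90, 22)    # normal glucose, healthy BMI
high_risk = diabetes_risk(180, 32)  # elevated glucose, high BMI
```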

Phase 5 — Real-Time UI

  • Gradio VoiceBot interface for complete speech interaction
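
A Gradio voice interface of this shape can be sketched as below (assumes `pip install gradio`). The `answer` handler is a hypothetical stand-in for the full STT → LLM → TTS chain; here it simply echoes the input audio so the wiring is visible.

```python
# Sketch of Phase 5: a Gradio VoiceBot interface. `answer` is a placeholder
# standing in for the STT -> LLM -> TTS chain.

def answer(audio_path):
    """Placeholder handler: would transcribe, reason, and return synthesized audio."""
    return audio_path  # echo the input audio as a stand-in

def build_app():
    import gradio as gr  # imported lazily so the handler is testable without it
    return gr.Interface(
        fn=answer,
        inputs=gr.Audio(sources=["microphone"], type="filepath"),
        outputs=gr.Audio(type="filepath"),
        title="MediFusion AI VoiceBot",
    )

if __name__ == "__main__":
    build_app().launch()
```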

Challenges

  • Latency management between STT, reasoning, and TTS
  • Improving accuracy with noisy audio inputs
  • Syncing ML prediction and interactive speech responses
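
One simple way to attack the latency challenge is to time each hop of the STT → reasoning → TTS chain so the slowest stage is visible. The `timed` helper below is a generic sketch, not the project's actual instrumentation.

```python
import time

# Per-stage latency measurement for the voice pipeline: wrap each stage
# so elapsed seconds are recorded under a stage name.

def timed(stage_name, fn, *args, timings=None, **kwargs):
    """Run fn(*args, **kwargs), recording elapsed seconds under stage_name."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if timings is not None:
        timings[stage_name] = time.perf_counter() - start
    return result

timings = {}
text = timed("stt", lambda a: "transcript of " + a, "clip.wav", timings=timings)
reply = timed("llm", lambda t: t.upper(), text, timings=timings)
```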

Accomplishments

  • Fully voice-based intelligent medical assistance
  • Highly natural doctor-like voice via ElevenLabs
  • Real-time disease prediction integration

What we learned

  • Multimodal real-time conversational systems
  • Voice technology and low-latency inference optimization
  • Deploying scalable AI assistants

What’s next

  • Multilingual voice consultation
  • Patient history & digital medical report summarization
  • Mobile-first app version

What Makes MediFusion AI Unique

MediFusion AI delivers a fully conversational healthcare experience using advanced voice technologies from ElevenLabs combined with multimodal AI reasoning. Users interact entirely through speech, making medical assistance fast, accessible, and natural. The system understands symptoms, provides guidance, and responds in real time—creating a seamless, human-like healthcare interaction for everyone.


Submission Link

🔗 GitHub Repository:
https://github.com/amit-sharma-ds/GenAIHeathcare
