Inspiration
Understanding medical reports can be overwhelming for patients and even non-specialist healthcare providers. The dense language, abbreviations, and embedded imaging make it difficult to extract actionable insights. We were inspired to build an interface that demystifies these reports by combining sentence-level interaction and image interpretation powered by large language models.
What it does
MedExplain allows users to upload medical reports and interact with them sentence by sentence. It explains complex medical terms in natural language, offers contextual clarification, and answers user queries about specific sections. Users can also upload accompanying medical images (e.g., X-rays or scans), and the system will generate a multimodal interpretation that links text and image findings for better comprehension.
How we built it
Model: We used MedGemma, a vision-language model fine-tuned for medical data, to power both text and image understanding.
Backend & Deployment: We deployed the model using MLflow, ensuring reproducibility and scalable inference across modalities.
Frontend: A web interface built with Flask and HTML lets users upload documents and images, and click on individual sentences to get AI-driven explanations.
Text processing: We used document parsing (e.g., pdfplumber) and natural language techniques to segment and process the medical reports into meaningful units.
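To illustrate the text-processing step, here is a minimal sketch of how extracted report text might be segmented into clickable sentences. It is not our exact pipeline: the `split_sentences` helper and the `ABBREVIATIONS` list are hypothetical names introduced for illustration, and the punctuation heuristic is an assumption that a real clinical parser would need to extend.

```python
import re

# Hypothetical abbreviation list (an assumption for this sketch); real medical
# reports use many more abbreviations than these.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "vs.", "e.g.", "i.e.", "approx."}


def split_sentences(text: str) -> list[str]:
    """Split report text into sentences using a simple punctuation heuristic."""
    # Collapse line breaks and repeated spaces left over from PDF extraction.
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return []

    sentences, current = [], []
    for token in text.split(" "):
        current.append(token)
        ends_sentence = token.endswith((".", "?", "!"))
        # Do not break after a known abbreviation such as "Dr.".
        if ends_sentence and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Keep any trailing text that lacks closing punctuation.
        sentences.append(" ".join(current))
    return sentences


# The input text could come from pdfplumber, for example:
#   with pdfplumber.open("report.pdf") as pdf:
#       text = "\n".join(page.extract_text() or "" for page in pdf.pages)
# ("report.pdf" is a placeholder path.)
```

Each returned sentence can then be rendered as its own clickable element in the web interface, so that a click sends only that sentence, plus any surrounding context, to the model.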
Challenges we ran into
One of the major challenges was registering a quantized version of the MedGemma model that could run efficiently on a local machine. While quantization reduced the model size, MLflow did not offer a straightforward way to register the quantized weights directly. As a result, we had to load the full model pipeline during registration, which caused HP AI Studio to crash due to resource constraints. Our workaround was to test the MLflow deployment locally instead.
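A rough back-of-the-envelope estimate shows why loading the full pipeline is so much heavier than loading quantized weights. The sketch below computes the memory needed for model weights alone at different precisions. The parameter count is an assumption chosen for illustration, not a measured figure for our setup, and the estimate ignores activations, caches, and framework overhead.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GiB."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30


# Assumed parameter count, for illustration only.
ASSUMED_PARAMS = 4e9

for bits in (16, 8, 4):
    gib = weight_memory_gib(ASSUMED_PARAMS, bits)
    print(f"{bits}-bit weights: about {gib:.1f} GiB")
```

Under this assumption, 4-bit weights take a quarter of the memory of 16-bit weights, which is consistent with the full-precision pipeline exhausting resources during registration while the quantized model could still run locally.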
Accomplishments that we're proud of
- Successfully deploying a sophisticated multimodal model using MLflow.
- Building an end-to-end system that allows both text and image interaction in a single interface.
- Enabling more transparent and accessible understanding of complex medical information.
What we learned
- Best practices for deploying models with MLflow
- MLflow registration and deployment in HP AI Studio
What's next for MedExplain
- Introducing multilingual capabilities to expand accessibility
- Integrating a second model for image segmentation, so that users can hover over segmented regions of a medical image to receive contextual explanations, similar to how sentence-level interaction works for text. A beta version of this feature has been developed, but because of the high compute and memory requirements we encountered while registering even a single model, full integration is currently on hold.
