Inspiration
Medical educators and researchers often struggle to access high-quality diagnostic images for rare conditions, constrained by patient privacy and limited datasets. We envisioned a tool that could generate medically accurate visuals from text descriptions—giving students, clinicians, and researchers a new way to study pathology without relying on sensitive data.
What it does
MedDream transforms textual medical descriptions into diagnostic-quality images using fine-tuned generative AI. Users can input anatomical findings, symptoms, or diagnoses, and the system produces images that reflect the medical condition described. This enables training, simulation, and education without using patient-derived imagery.
How we built it
We fine-tuned the Stable Diffusion v1.5 model on the ROCO (Radiology Objects in COntext) dataset, which pairs radiology images with anatomical captions. Training ran on HP AI Studio using Hugging Face's diffusers library. We converted the dataset into the format the training scripts expect and applied LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning. All training, testing, and validation were done in a reproducible, containerized environment with GPU acceleration.
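To illustrate why LoRA makes fine-tuning so much cheaper, here is a minimal numerical sketch of the idea in pure NumPy. The dimensions and the rank are hypothetical examples, not the actual Stable Diffusion attention shapes or our training configuration: instead of updating a full weight matrix W, LoRA learns two small matrices A and B and applies W + (alpha / r) * B @ A.

```python
import numpy as np

# Hypothetical dimensions for illustration only.
d_in, d_out, rank, alpha = 768, 768, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection (zero-init)

def lora_forward(x):
    """Forward pass with the low-rank update folded into the weight."""
    return x @ (W + (alpha / rank) * B @ A).T

# Because B starts at zero, the adapted model initially matches the
# pretrained model exactly; only A and B receive gradient updates.
full_params = W.size           # what a full fine-tune would train
lora_params = A.size + B.size  # what LoRA trains (~2% of the above here)
print(full_params, lora_params)
```

Only the small A and B matrices are optimized, which is what let us train within hackathon-scale GPU budgets.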
Challenges we ran into
Navigating package conflicts within AI Studio's virtual environments.
Parsing and converting the ROCO dataset into a compatible format for training.
Debugging subtle issues in the data loading pipeline and Hugging Face’s training scripts.
Lack of direct documentation on medical image generation, requiring extensive experimentation.
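For the dataset-conversion challenge, the shape of the fix was turning caption files into the metadata.jsonl layout that Hugging Face Datasets' ImageFolder loader understands. The sketch below assumes a CSV with `name` and `caption` columns sitting next to the images; the actual ROCO release layout and our exact column names may differ.

```python
import csv
import json
import pathlib

def write_metadata(csv_path, image_dir, caption_column="text"):
    """Convert a caption CSV into metadata.jsonl alongside the images.

    Each output line maps an image's file_name to its caption, which is
    the convention the ImageFolder dataset builder picks up automatically.
    Column names here are illustrative assumptions.
    """
    rows = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            rows.append({"file_name": row["name"], caption_column: row["caption"]})

    out = pathlib.Path(image_dir) / "metadata.jsonl"
    with open(out, "w", encoding="utf-8") as f:
        for r in rows:
            f.write(json.dumps(r) + "\n")
    return out
```

Once the file exists, `load_dataset("imagefolder", data_dir=image_dir)` yields a dataset with `image` and `text` columns that diffusers' text-to-image training scripts can consume via their caption-column option.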
Accomplishments that we're proud of
Trained a working end-to-end text-to-image pipeline for medical image generation.
Created a reproducible workflow within HP AI Studio, from dataset prep to model export.
Generated meaningful medical imagery from plain-text inputs with no real patient data.
Overcame low-level library and environment issues that often block reproducibility.
What we learned
The importance of proper data formatting when training with Hugging Face Datasets and Diffusers.
Best practices for LoRA training to minimize compute time while retaining performance.
How to adapt general-purpose text-to-image pipelines for a domain-specific application like medicine.
Troubleshooting PyTorch and environment-level conflicts in managed cloud platforms.
What's next for MedDream: AI-Powered Visualizations for Medical Text
User Interface: Building a simple web-based frontend for medical professionals and students.
Multimodal Input: Allowing combined image + text prompts for more specific generation.
Clinical Fine-tuning: Adding pathology-specific datasets to improve realism and accuracy.
Deployment: Wrapping the model into an inference API for wider access, potentially integrating with educational platforms and medical training simulations.
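As a rough sketch of what that inference API could look like, the following stands up a tiny JSON endpoint using only the Python standard library. The `generate_image` function is a stub standing in for the fine-tuned pipeline; the route, payload fields, and response format are all assumptions, not a built feature.

```python
import base64
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate_image(prompt: str) -> bytes:
    # Stub: a real deployment would run the fine-tuned Stable Diffusion
    # pipeline here and return encoded image bytes.
    return f"IMAGE FOR: {prompt}".encode("utf-8")

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        image = generate_image(body["prompt"])
        payload = json.dumps(
            {"image_b64": base64.b64encode(image).decode()}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the demo server quiet

def serve(port=0):
    """Start the server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A production version would swap the stub for the actual pipeline and add authentication and request validation, but the request/response contract would stay this simple: prompt in, base64-encoded image out.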
Built With
- hp-ai-studio