Inspiration
Prostate cancer screening starts with a blood test. If the numbers look off, the patient gets an MRI. If the MRI looks suspicious, they get a biopsy (a needle procedure that's painful, stressful, and carries real medical risk). The hard part is that a lot of those biopsies come back negative: according to the NIH, up to 52% of prostate biopsies do not detect cancer. This is not because the radiologist made a mistake, but because reading prostate MRI consistently is genuinely difficult. Two experienced radiologists can look at the same scan and score it differently.
We came across the VERDICT MRI study by Singh et al. (2022), which reported that prostate MRI still creates a significant false-positive burden in clinical practice. That stuck with us. We wanted to explore whether deep learning could help make that review process more consistent, not by replacing the radiologist, but by giving them a clearer, more traceable starting point.
What We Built
A Gradio web app that takes in prostate MRI files and returns a PI-RADS-style risk score.
PI-RADS is the standard 1-5 scoring system radiologists use to describe how suspicious a prostate MRI looks:
- PI-RADS 1: very unlikely to be clinically significant cancer
- PI-RADS 2: unlikely
- PI-RADS 3: equivocal / borderline
- PI-RADS 4: likely
- PI-RADS 5: very likely
The app accepts three MRI image types: T2 (structural anatomy), ADC (water diffusion), and B1500 (diffusion-weighted). It groups matching slices from a patient scan, runs each slice group through the model, and selects the highest-risk slice to produce a patient-level score. The output includes the predicted PI-RADS category, the model confidence, a full probability table across all 5 classes, and a slice-level summary table showing exactly which files drove the prediction.
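The "highest-risk slice" step can be sketched as follows. This is a minimal illustration, not the app's actual code: the `SlicePrediction` type, the function names, and the choice of a probability-weighted mean category as the risk measure are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SlicePrediction:
    """Hypothetical container for one slice's model output."""
    filename: str
    probs: List[float]  # P(PI-RADS = 1), ..., P(PI-RADS = 5)


def expected_score(probs):
    """Probability-weighted mean category (an assumed risk measure)."""
    return sum(category * p for category, p in enumerate(probs, start=1))


def patient_level_prediction(slices):
    """Pick the slice with the highest assumed risk and report its top class."""
    if not slices:
        raise ValueError("need at least one slice prediction")
    worst = max(slices, key=lambda s: expected_score(s.probs))
    confidence = max(worst.probs)
    category = worst.probs.index(confidence) + 1  # categories are 1-based
    return {
        "pirads": category,
        "confidence": confidence,
        "driving_slice": worst.filename,
    }
```

Returning the file name of the driving slice alongside the score is what makes the patient-level number traceable back to a specific image.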
How We Built It
The model is a convolutional variational autoencoder (VAE) with a deterministic classifier head. The architecture looks like this:
$$\text{MRI slice} \xrightarrow{\text{CNN encoder}} (\mu, \log\sigma^2) \xrightarrow{\text{sample } z} \text{reconstructed slice}$$
$$\mu \xrightarrow{\text{classifier}} \text{PI-RADS prediction}$$
The key design choice was making the model do two jobs simultaneously: reconstruct the MRI slice from a compressed latent representation, and predict the PI-RADS score from that same representation. We use $\mu$ (the stable mean of the latent distribution) for classification rather than a sampled $z$, which keeps predictions deterministic at inference time.
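To make the split between the two paths concrete, here is a small pure-Python sketch. It assumes the standard reparameterization trick for sampling $z$; the function names and the list-based latent vectors are hypothetical and stand in for the real tensors.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def latent_paths(mu, log_var, rng):
    """Route a sampled z to the decoder and the mean mu to the classifier."""
    z = sample_latent(mu, log_var, rng)  # stochastic: changes between calls
    return {
        "decoder_input": z,
        "classifier_input": list(mu),  # deterministic: depends only on the input
    }
```

Because the classifier never sees the random draw, the same slice always yields the same PI-RADS probabilities, while the decoder still trains against a sampled latent.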
The training loss combines 4 terms:
$$L = L_{\text{recon}} + \beta \cdot D_{\text{KL}} + L_{\text{class}} + \lambda \cdot L_{\text{consistency}}$$
where $L_{\text{consistency}}$ is a small regularization penalty that encourages the ADC and B1500 channels to remain internally coherent within the latent space.
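A rough sketch of how the four terms combine is shown below. Several details are assumptions, since the writeup does not specify them: mean squared error for the reconstruction term, a standard normal prior for the KL term (which gives the usual closed form), cross-entropy for the classification term, and the default values of `beta` and `lam`. The consistency penalty is passed in as a precomputed number because its exact form is not described here.

```python
import math


def mse(x, x_hat):
    """Reconstruction term, assumed here to be mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ) for a diagonal Gaussian."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )


def cross_entropy(probs, label):
    """Classification term for a PI-RADS label in 1..5."""
    return -math.log(max(probs[label - 1], 1e-12))  # clamp avoids log(0)


def total_loss(recon, kl, cls, consistency, beta=1.0, lam=0.1):
    """L = L_recon + beta * D_KL + L_class + lam * L_consistency."""
    return recon + beta * kl + cls + lam * consistency
```

Raising `beta` pulls the latent distribution toward the prior, while `lam` controls how strongly the consistency penalty is felt relative to the other three terms.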
We trained on the NYU fastMRI prostate dataset (access requires accepting the NYU fastMRI Dataset Sharing Agreement; raw data is not included in this repository).
The frontend is built with Gradio, the preprocessing pipeline handles DICOM and PNG/JPG inputs, and the app is structured to run locally or deploy directly to Hugging Face Spaces.
Challenges
Getting multi-modal slice alignment right: Prostate MRI exams contain T2, ADC, and B1500 sequences as separate files. Matching the correct slices across 3 modalities, especially when file naming is inconsistent, required building a preprocessing pipeline that checks both folder structure and DICOM metadata before grouping slices.
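The grouping step might look roughly like this. It is only a sketch under stated assumptions: each file has already been reduced to a record with hypothetical `patient_id`, `modality`, `slice_index`, and `path` fields (in practice derived from folder structure or DICOM metadata), and incomplete groups are simply dropped.

```python
from collections import defaultdict

REQUIRED_MODALITIES = ("T2", "ADC", "B1500")


def group_slices(records):
    """Group file records by (patient, slice) and keep only complete triplets."""
    groups = defaultdict(dict)
    for rec in records:
        key = (rec["patient_id"], rec["slice_index"])
        groups[key][rec["modality"]] = rec["path"]
    # A group is usable only if all three modalities are present.
    complete = {
        key: paths
        for key, paths in groups.items()
        if all(m in paths for m in REQUIRED_MODALITIES)
    }
    return dict(sorted(complete.items()))
```

Dropping incomplete groups keeps the model input well defined; a real pipeline might instead report them to the user so a missing sequence does not go unnoticed.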
The loss-balancing act: Training a VAE to simultaneously reconstruct images and classify PI-RADS scores meant carefully tuning the weighting between reconstruction loss, KL divergence, and classification loss. With too much weight on reconstruction, the classifier wouldn't converge. With too much on classification, the latent space would collapse.
Responsible framing: This is a medical AI project, and we were deliberate about not overstating what it does. Building in confidence scores, probability tables, slice-level traceability, and clear disclaimers took real thought: it's easy to ship a number, harder to ship something a clinician could actually reason about.
What We Learned
- How prostate MRI is structured clinically and why multi-modal imaging matters for cancer detection
- The tradeoffs between generative and discriminative objectives when training a VAE with a classifier head
- How to design AI output for a high-stakes domain where explainability isn't optional
What's Next
With access to a larger labeled prostate MRI dataset, the next step would be training on real PI-RADS annotations from radiologists and validating performance against clinical reads. The app architecture is already set up for that: the preprocessing pipeline, model interface, and output format are all in place.
GitHub Link: https://github.com/sabya-chow/prostate-pirads-vae
Live app: https://huggingface.co/spaces/sabyachow/prostate-pirads-vae
YouTube Link: https://youtu.be/KXggjb2rBts