Diabetic Retinopathy Stage Prediction

Inspiration

Diabetic Retinopathy is one of the leading causes of preventable blindness globally, affecting millions of diabetic patients who often don't realize their vision is at risk until it's too late. Early and accurate detection can save sight but access to Specialist diagnosis is limited in many regions. We wanted to build an accessible, automated screening tool that puts early detection within reach of anyone with a retinal fundus image.

What We Learned

How to fine-tune Swin Transformers for medical image classification
Techniques for handling severe class imbalance in clinical datasets (WeightedRandomSampler + Focal Loss)
The importance of contrast enhancement and augmentation for retinal images, where subtle visual differences between severity stages are clinically significant
End-to-end ML deployment via Hugging Face Spaces

How We Built It

We trained a Swin Transformer (pretrained on ImageNet, fine-tuned on retinal data) to classify fundus images into 5 DR severity levels:

Label	Stage
0	No DR
1	Mild
2	Moderate
3	Severe
4	Proliferative DR

Dataset: APTOS 2019 Blindness Detection (Kaggle)

Preprocessing & Augmentation:

Image resizing and normalization
Contrast enhancement (CLAHE-style)
Random geometric transforms
ColorJitter for color-space robustness

Training Setup:

Transfer learning with pretrained Swin weights
Focal Loss to penalize misclassification of minority classes
WeightedRandomSampler for balanced mini-batches
Adaptive learning rate scheduling
15 epochs of fine-tuning

Result: 87% overall accuracy across all 5 severity classes

The model was then deployed as an interactive web app on Hugging Face Spaces, where users can upload any retinal fundus image and receive an instant prediction.

🔗 Try it live

Challenges We Faced

Class imbalance: The APTOS dataset is heavily skewed toward the "No DR" and "Moderate" classes. Naive training produced a model biased toward majority classes. We resolved this with Focal Loss and weighted sampling.
Subtle inter-class differences: Stages 1–3 are visually similar, making fine-grained classification difficult even for clinicians. Careful augmentation and contrast preprocessing helped the model learn discriminative features.
Training stability: Fine-tuning transformers on small medical datasets risks overfitting. We used adaptive LR scheduling and monitored validation metrics closely across all 15 epochs.

What's Next

Grad-CAM visualizations to highlight the retinal regions driving each prediction
Experiment with EfficientNet and larger ViT variants for comparison
Expand dataset with additional public DR datasets for better generalization
Optimize inference latency for mobile/edge deployment

Built With

Updates

Mahrukh Haseeb started this project — Mar 14, 2026 05:23 AM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.