Why This Approach Is Unique and Revolutionary

Vision Transformers are revolutionary here because their global self-attention captures whole-brain structural patterns simultaneously, unlike the local receptive fields of CNNs. This enables superior modeling of diffuse neurodegeneration and greater robustness under the extreme class imbalance common in medical imaging.

How to Use

Upload the train.parquet and test.parquet files (provided by the hackathon organizing team) to Google Colab.
Then run the notebook cells sequentially; no additional configuration is required.


Inspiration

Alzheimer’s disease affects over 55 million people globally, yet early detection remains a major challenge due to subjective clinical assessments and limited access to expert neuroimaging interpretation.

I was inspired to leverage Vision Transformers (ViTs), which have demonstrated state-of-the-art performance in natural image recognition, to automate brain MRI analysis.
The most critical challenge was handling an extreme 26:1 class imbalance, where traditional CNN-based approaches tend to fail, making this an ideal opportunity to showcase ViT’s superiority in medical imaging.


What It Does

ViT4Alzheimer is an automated Alzheimer’s disease detection system that analyzes brain MRI scans and classifies disease stages with 94.53% accuracy.

Key highlights:

  • Handles extreme class imbalance (26:1 ratio) using weighted cross-entropy loss
  • Achieves a perfect F1-score of 1.0 on the minority class, despite it representing only ~1% of the data
  • Provides fast inference along with confidence scores for clinical decision support
  • Designed to be deployment-ready for real-world hospital integration
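The confidence score in the third bullet can be read directly off the softmax over the model's output logits. A minimal sketch (the logit values below are made up for illustration; the real model emits one logit per disease stage):

```python
import numpy as np

def predict_with_confidence(logits):
    """Map raw logits to (predicted class index, softmax confidence)."""
    z = logits - logits.max()            # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return int(probs.argmax()), float(probs.max())

# Illustrative logits for a two-class (majority vs. minority) decision
pred, conf = predict_with_confidence(np.array([3.1, -0.4]))
```

Reporting `conf` alongside the prediction lets a clinician decide when to trust the model and when to escalate for manual review.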

How I Built It

I fine-tuned Google’s ViT-Large (304M parameters) by selectively unfreezing the last 6 transformer layers to enable medical domain adaptation.
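The selective unfreezing can be sketched as follows. A small randomly initialized stand-in encoder replaces the actual ViT-Large checkpoint here (in practice the pretrained model would be loaded from HuggingFace; only the 24-layer depth and the freeze/unfreeze pattern mirror the approach described):

```python
import torch.nn as nn

# Stand-in for ViT-Large's 24-layer transformer encoder (dimensions shrunk
# for illustration; the real model would come from a pretrained checkpoint).
block = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(block, num_layers=24)

# Freeze every parameter, then unfreeze only the last 6 layers
for p in encoder.parameters():
    p.requires_grad = False
for layer in encoder.layers[-6:]:
    for p in layer.parameters():
        p.requires_grad = True

trainable_layers = sum(
    any(p.requires_grad for p in layer.parameters()) for layer in encoder.layers
)
```

Keeping the early layers frozen preserves the generic visual features learned during pretraining while letting the top layers adapt to MRI-specific structure.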

Core technical components:

  1. Weighted loss function with a 26.12× weight for the minority class
  2. Mixed-precision FP16 training combined with gradient accumulation to fit GPU memory constraints
  3. Cosine annealing learning rate scheduler for stable convergence

The model was trained on 5,120 brain MRI scans using PyTorch.
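The three components above can be sketched as a single training-step loop. A tiny stand-in model and random tensors replace the real ViT and MRI batches; the 26.12× minority weight, FP16 autocast, gradient accumulation, and cosine schedule follow the description, while every other value is illustrative:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 2)                    # stand-in for the ViT classifier
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)

# 1) Weighted cross-entropy: 26.12x weight on the minority class
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 26.12]))

# 2) FP16 autocast + loss scaling (active only when a GPU is present),
#    with gradient accumulation to emulate a larger effective batch
use_cuda = torch.cuda.is_available()
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)
accum_steps = 4

for step in range(8):
    x, y = torch.randn(4, 16), torch.randint(0, 2, (4,))
    with torch.autocast("cuda" if use_cuda else "cpu", enabled=use_cuda):
        loss = criterion(model(x), y) / accum_steps  # scale for accumulation
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
        scheduler.step()                    # 3) cosine annealing per update
```

Dividing the loss by `accum_steps` keeps gradient magnitudes consistent with a single large batch, which is what makes accumulation a faithful substitute for more GPU memory.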


Challenges I Ran Into

  • The severe 26:1 class imbalance initially caused the model to collapse into majority-class predictions
  • Minority-class accuracy and F1-score were poor until I applied the 26.12× weighted loss, which ultimately achieved an F1 of 1.0
  • T4 GPU memory limitations required careful optimization using FP16 mixed-precision and gradient accumulation
  • Finding the right balance between model capacity and training time required systematic experimentation across three configurations, with the 6-layer unfreezing strategy proving optimal
  • I experimented with data augmentation, but performance degraded, likely because medical MRI scans are highly sensitive to artificial transformations
    • As a result, no data augmentation was used in the final submission

Accomplishments I’m Proud Of

  • Achieving perfect F1 = 1.0 on the minority class, with only 49 training samples (~1%), which is rare in medical AI under extreme imbalance
  • Outperforming ensemble CNN models using a single Vision Transformer
  • Conducting a systematic 3-configuration study, producing actionable insights for real-world deployment
  • Releasing the model as a fully open-source solution on HuggingFace, enabling adoption by resource-constrained hospitals worldwide

What I Learned

  • Vision Transformers’ global self-attention is fundamentally superior to CNNs’ local receptive fields for capturing whole-brain atrophy patterns
  • Weighted loss functions are essential for extreme class-imbalance scenarios in medical datasets
  • Transfer learning from 14M+ natural images significantly reduces overfitting, even with relatively small medical datasets (~5,000 samples)
  • Architectural decisions (e.g., how many layers to unfreeze) have a larger impact than brute-force scaling

What’s Next for ViT4Alzheimer

Immediate goals

  • Revisit carefully designed medical data augmentation strategies to further boost performance
  • Deploy a pilot study at a partner hospital for prospective clinical validation

Mid-term scaling

  • Expand the dataset via multi-center collaborations
    • Target: 50,000+ MRI scans from 10+ institutions

Technical roadmap

  • Multimodal fusion (T1/T2 MRI + PET + clinical metadata)
  • Attention visualization for clinician interpretability
  • Federated learning for privacy-preserving collaboration (GDPR/HIPAA compliant)

Long-term vision

  • Longitudinal disease progression modeling (1- to 5-year predictions)
  • Experiment with ViT-Huge (632M parameters) for potential +1 to 2% accuracy gains
  • Expand the framework to other neurodegenerative diseases such as Parkinson’s and Huntington’s disease

Built With

PyTorch, HuggingFace, Google Colab

Updates


Hello Judges,

In the "Try it out" section, I have uploaded the test.parquet predictions (the test-dataset predictions) to my Google Drive, so please check them out if you wish to evaluate the model on your own.

Thank You
