Model Card: Alzheimer's Disease MRI Classification System

  • Model Name: AlzheimerNet-ResNet18
  • Version: 1.0
  • Last Updated: December 20, 2025
  • Model Type: Convolutional Neural Network (Transfer Learning)
  • Task: Multi-class Image Classification (Medical Imaging)


Model Details

Overview

AlzheimerNet-ResNet18 is a deep learning model for automated classification of Alzheimer's disease stages from brain MRI scans. It is based on the ResNet18 architecture, uses transfer learning from ImageNet, and is adapted for grayscale medical imaging.

Intended Use

  • Primary Use: Automated screening and classification support for Alzheimer's disease diagnosis
  • Target Users: Healthcare professionals, radiologists, neurologists, clinical researchers
  • Use Context: Clinical decision support tool for analyzing structural brain MRI scans
  • NOT INTENDED FOR: Standalone diagnosis without physician oversight, replacement of clinical judgment, or use by non-medical professionals

Model Architecture

  • Base Architecture: ResNet18 (He et al., 2016)
  • Pre-training: ImageNet (natural images)
  • Modifications:
    • Input adapted from RGB (3 channels) to grayscale (1 channel)
    • Output layer modified for 4-class classification
    • All layers fine-tuned on medical data
  • Parameters: ~11 million trainable parameters
  • Input: 224×224 grayscale MRI images, normalized to [-1, 1]
  • Output: Probability distribution over 4 classes (softmax)
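
The ~11 million parameter figure can be checked by hand from the standard ResNet18 layout. The sketch below is plain Python arithmetic, not the training code. It assumes the usual layer widths (64, 128, 256, 512), bias-free convolutions, and that the single-channel stem and the 4-class head are the only changes from the ImageNet model. All function names are illustrative.

```python
def conv_params(c_in, c_out, k):
    """Weights of a bias-free k x k convolution."""
    return c_in * c_out * k * k


def bn_params(channels):
    """Learnable scale and shift of a batch-norm layer."""
    return 2 * channels


def basic_block_params(c_in, c_out):
    """Two 3x3 convs with batch norm, plus a 1x1 projection when the width changes."""
    total = conv_params(c_in, c_out, 3) + bn_params(c_out)
    total += conv_params(c_out, c_out, 3) + bn_params(c_out)
    if c_in != c_out:
        total += conv_params(c_in, c_out, 1) + bn_params(c_out)
    return total


def resnet18_params(in_channels, num_classes):
    """Trainable parameters of a ResNet18 with the given stem and head sizes."""
    total = conv_params(in_channels, 64, 7) + bn_params(64)  # stem
    c_prev = 64
    for width in (64, 128, 256, 512):
        total += basic_block_params(c_prev, width)  # first block of the stage
        total += basic_block_params(width, width)   # second block of the stage
        c_prev = width
    total += 512 * num_classes + num_classes  # fully connected head
    return total


imagenet = resnet18_params(in_channels=3, num_classes=1000)
adapted = resnet18_params(in_channels=1, num_classes=4)
size_mb = adapted * 4 / 1e6  # float32 weights only, 4 bytes each

print(imagenet, adapted, round(size_mb, 1))
```

Under these assumptions the adapted network has 11,172,292 trainable parameters, or about 44.7 MB as float32 weights, which is consistent with the ~11 million and ~45MB figures quoted in this card.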

Training Details

  • Training Data: 5,120 labeled brain MRI scans
  • Validation Data: 1,280 samples (20% holdout)
  • Test Data: 1,280 unlabeled samples
  • Data Augmentation: Rotation (±10°), horizontal flip, brightness/contrast jitter
  • Optimizer: Adam (learning rate: 1e-4, weight decay: 1e-5)
  • Training Duration: 20 epochs (~90 minutes on Tesla T4 GPU)
  • Batch Size: 32
  • Loss Function: Cross-entropy
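
A few derived quantities follow directly from the settings above. This is a minimal sketch of that arithmetic plus a reference cross-entropy for a single sample; it assumes one full pass over the training set per epoch with the last partial batch kept, and the variable and function names are illustrative rather than taken from the training script.

```python
import math

train_samples = 5120
val_samples = 1280
batch_size = 32
epochs = 20
total_minutes = 90  # approximate, as reported above

holdout_fraction = val_samples / (train_samples + val_samples)
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
minutes_per_epoch = total_minutes / epochs


def cross_entropy(logits, target):
    """Cross-entropy of one sample: log-sum-exp of the logits minus the target logit."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


print(holdout_fraction, steps_per_epoch, total_steps, minutes_per_epoch)
print(cross_entropy([0.0, 0.0, 0.0, 0.0], target=2))  # loss of a uniform 4-class guess
```

A uniform guess over 4 classes gives a loss of ln 4 (about 1.386), a useful baseline when reading early training logs.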

Performance Metrics

Metric | Value
--- | ---
Validation Accuracy | 97.27%
Training Accuracy | 99.19%
Precision (weighted avg) | 97%
Recall (weighted avg) | 97%
F1-Score (weighted avg) | 97%

Per-Class Performance:

  • Class 0 (Non-Demented): 98% F1-score
  • Class 1 (Very Mild): 100% F1-score
  • Class 2 (Mild): 98% F1-score
  • Class 3 (Moderate): 96% F1-score
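
For readers who want to reproduce these numbers from their own predictions, the sketch below shows how per-class and support-weighted metrics are derived from a confusion matrix. The matrix is hypothetical and is NOT the model's actual validation result; rows are true classes and columns are predicted classes.

```python
def per_class_metrics(cm):
    """Precision, recall, F1, and support for each class of a square confusion matrix."""
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        support = sum(cm[k])                     # samples whose true class is k
        predicted = sum(cm[r][k] for r in range(n))  # samples predicted as k
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall,
                        "f1": f1, "support": support})
    return metrics


def weighted_average(metrics, key):
    """Average a metric over classes, weighting each class by its support."""
    total = sum(m["support"] for m in metrics)
    return sum(m[key] * m["support"] for m in metrics) / total


# Hypothetical counts for illustration only.
cm = [
    [48, 0, 2, 0],
    [0, 10, 0, 0],
    [1, 0, 97, 2],
    [0, 0, 3, 37],
]
metrics = per_class_metrics(cm)
accuracy = sum(cm[k][k] for k in range(4)) / sum(map(sum, cm))

for k, m in enumerate(metrics):
    print(k, round(m["precision"], 3), round(m["recall"], 3), round(m["f1"], 3))
print(round(weighted_average(metrics, "f1"), 3), accuracy)
```

Support-weighted recall is always equal to overall accuracy, which is why the 97% weighted recall tracks the 97.27% validation accuracy. Weighted averages are dominated by the largest classes, so the per-class figures are the more informative view for small classes.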

Limitations

Data Limitations

  1. Limited Sample Size

    • Training set of 5,120 samples is modest for deep learning
    • May not capture full variability of real-world clinical presentations
    • Impact: Reduced generalization to rare or unusual cases
  2. Class Imbalance

    • Class 1 (Very Mild) severely underrepresented (only 10 validation samples)
    • Class 2 (Mild) dominates dataset (~50% of samples)
    • Impact: Model may be less reliable for detecting very mild cases; perfect validation performance (100%) on Class 1 may not generalize
  3. Single Data Source

    • All training data appears to come from a similar scanner and protocol
    • No documented diversity in acquisition parameters, scanner manufacturers, or imaging protocols
    • Impact: May not generalize well to different MRI scanners or imaging centers
  4. Temporal Snapshot

    • Uses single time-point imaging only
    • No longitudinal progression data
    • Impact: Cannot predict disease trajectory or rate of decline
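
To make the Class 1 caveat concrete, the sketch below computes a Wilson score interval for a proportion. It assumes, for illustration, that all 10 Class 1 validation samples were classified correctly, which is what a 100% score on 10 samples would imply; the function name is illustrative.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(successes=10, n=10)
print(round(low, 3), round(high, 3))
```

With only 10 samples, a perfect score is statistically compatible with a true recall as low as roughly 72%, so the 100% figure should not be read as evidence of near-perfect performance on very mild cases.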

Model Limitations

  1. Black Box Nature

    • Deep learning model lacks full interpretability
    • Difficult to explain specific predictions to clinicians
    • Impact: May reduce trust and clinical adoption
  2. Overfitting Risk

    • 99.19% training accuracy suggests potential memorization
    • The train-validation gap is small (1.92 percentage points), which is a positive sign, but vigilance is still needed
    • Impact: Performance may degrade on truly novel cases
  3. Discrete Class Outputs

    • Provides discrete class predictions rather than continuous severity scores
    • Real disease progression is continuous
    • Impact: May miss subtle transitions between stages
  4. No Uncertainty Quantification

    • Model outputs softmax probabilities but no calibrated confidence intervals or other uncertainty estimates
    • No mechanism separates high-confidence from low-confidence predictions; all are reported the same way
    • Impact: Cannot flag ambiguous cases for manual review
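
One lightweight way to start flagging ambiguous cases is to threshold the softmax output itself. The sketch below is a heuristic, not a substitute for proper uncertainty quantification: softmax probabilities are not guaranteed to be calibrated, and the thresholds `max_entropy` and `min_top_prob` are hypothetical values that would need tuning on validation data.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy scaled to [0, 1]; 1 means a uniform distribution."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def needs_review(logits, max_entropy=0.5, min_top_prob=0.8):
    """Flag a prediction for manual review when the output is diffuse."""
    probs = softmax(logits)
    return normalized_entropy(probs) > max_entropy or max(probs) < min_top_prob


print(needs_review([8.0, 0.0, 0.0, 0.0]))  # one class clearly dominates
print(needs_review([1.0, 0.9, 0.0, 0.0]))  # two classes nearly tied
```

A rule like this only routes cases to a human reviewer; it says nothing about whether a confident prediction is correct.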

Technical Constraints

  1. Computational Requirements

    • Requires GPU for inference (CPU too slow for clinical deployment)
    • Model size (~45MB) manageable but not edge-deployable
    • Impact: Limits deployment to well-resourced facilities
  2. Input Format Constraints

    • Requires specific image preprocessing (224×224 resize, normalization)
    • Sensitive to image quality and artifacts
    • Impact: May fail on low-quality or corrupted scans
  3. Single Modality

    • Uses structural MRI only
    • Ignores functional MRI, PET, CSF biomarkers, genetics, cognitive scores
    • Impact: Misses complementary diagnostic information
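
The input constraints above can be enforced with a small validation step before inference. The sketch below assumes the scan has already been resized to 224×224 and is given as nested lists of 8-bit intensities (0-255); both are assumptions, since the card does not state the raw pixel format. The checks shown are illustrative and far from a full image-quality assessment.

```python
EXPECTED_SIZE = 224


def normalize_pixel(value):
    """Map an 8-bit intensity in [0, 255] to [-1, 1]."""
    return value / 255.0 * 2.0 - 1.0


def validate_and_normalize(image):
    """Reject malformed inputs, then scale pixel values to [-1, 1]."""
    if len(image) != EXPECTED_SIZE or any(len(row) != EXPECTED_SIZE for row in image):
        raise ValueError("expected a 224x224 image")
    flat = [p for row in image for p in row]
    if min(flat) < 0 or max(flat) > 255:
        raise ValueError("pixel values outside [0, 255]")
    if min(flat) == max(flat):
        raise ValueError("image is constant; likely blank or corrupted")
    return [[normalize_pixel(p) for p in row] for row in image]


# Synthetic gradient image standing in for a resized scan.
image = [[(r + c) % 256 for c in range(EXPECTED_SIZE)] for r in range(EXPECTED_SIZE)]
normalized = validate_and_normalize(image)
print(min(map(min, normalized)), max(map(max, normalized)))
```

Failing loudly on malformed input is preferable to letting the classifier produce a confident-looking prediction on a blank or mis-sized image.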

Bias & Fairness Considerations

Potential Sources of Bias

  1. Demographic Bias (Unknown)

    • Issue: Dataset demographics (age, sex, race, ethnicity, socioeconomic status) not documented
    • Risk: Model may perform differently across demographic groups
    • Example: If training data over-represents Caucasian populations, may underperform on other ethnicities
    • Mitigation Needed: Demographic analysis of training data and subgroup performance evaluation
  2. Selection Bias

    • Issue: Dataset may not represent general population (e.g., clinical trial participants vs. real-world patients)
    • Risk: Higher prevalence of severe cases or younger patients in research datasets
    • Impact: May misclassify community-dwelling, less severe cases
    • Mitigation: Validate on diverse, real-world clinical populations
  3. Scanner & Protocol Bias

    • Issue: Training data likely from limited scanner types/imaging protocols
    • Risk: Performance degradation on scans from different equipment or settings
    • Impact: Model may favor specific MRI characteristics over disease features
    • Mitigation: Multi-site validation with heterogeneous scanners
  4. Labeling Bias

    • Issue: Ground truth labels based on clinical diagnosis, which has inherent subjectivity
    • Risk: Model learns clinician biases rather than objective disease features
    • Impact: May perpetuate diagnostic disparities
    • Mitigation: Multiple expert consensus labels, neuropathological confirmation
  5. Socioeconomic Bias

    • Issue: Access to MRI scans correlates with socioeconomic status
    • Risk: Underrepresentation of lower-income populations in training data
    • Impact: May not generalize to underserved communities
    • Mitigation: Diverse data collection from community health centers

Fairness Metrics

Current Status: ⚠️ Not Evaluated

Required Analysis:

  • [ ] Stratified performance by age groups (60-69, 70-79, 80+)
  • [ ] Stratified performance by biological sex (if known)
  • [ ] Stratified performance by race/ethnicity (if known)
  • [ ] Error rate disparity across subgroups
  • [ ] False positive/negative rate parity
  • [ ] Equal opportunity metrics

Recommendation: Before clinical deployment, conduct comprehensive fairness audit with demographic-stratified evaluation.
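
A stratified evaluation of the kind listed above could be sketched as follows. The records are invented for illustration, since no demographic metadata is documented for this dataset. To define false positives and negatives in a 4-class setting, the sketch treats Class 0 as negative and Classes 1-3 as positive; that binarization is an assumption, not something the card specifies.

```python
from collections import defaultdict


def subgroup_rates(records):
    """Per-group accuracy, false negative rate, and false positive rate.

    Each record is (group, true_class, predicted_class).
    """
    counts = defaultdict(lambda: {"n": 0, "correct": 0, "pos": 0,
                                  "fn": 0, "neg": 0, "fp": 0})
    for group, y_true, y_pred in records:
        c = counts[group]
        c["n"] += 1
        c["correct"] += int(y_true == y_pred)
        if y_true != 0:                 # any dementia stage counts as positive
            c["pos"] += 1
            c["fn"] += int(y_pred == 0)
        else:
            c["neg"] += 1
            c["fp"] += int(y_pred != 0)
    rates = {}
    for group, c in counts.items():
        rates[group] = {
            "accuracy": c["correct"] / c["n"],
            "fnr": c["fn"] / c["pos"] if c["pos"] else None,
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,
        }
    return rates


def disparity(rates, key):
    """Largest gap in a metric between any two groups."""
    values = [r[key] for r in rates.values() if r[key] is not None]
    return max(values) - min(values)


# Hypothetical records for illustration only.
records = [
    ("60-69", 0, 0), ("60-69", 2, 2), ("60-69", 3, 0), ("60-69", 0, 0),
    ("80+", 0, 1), ("80+", 2, 2), ("80+", 3, 3), ("80+", 0, 0),
]
rates = subgroup_rates(records)
print(rates)
print(disparity(rates, "fnr"), disparity(rates, "fpr"))
```

In this toy data both groups have the same accuracy but opposite error profiles, which is why false positive and false negative parity are listed separately from overall error rate. Equal opportunity corresponds to equal true positive rates across groups, that is, equal values of 1 - FNR.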

Ethical Considerations

  1. False Positives (Type I Error)

    • Impact: Unnecessary patient anxiety, costly follow-up testing
    • Current Rate: Not measured separately; overall validation error is ~2.7% (100% - 97.27%), combining all error types and classes
    • Clinical Consequence: Mild - requires confirmatory testing anyway
  2. False Negatives (Type II Error)

    • Impact: Missed early diagnosis, delayed treatment
    • Current Rate: Not measured separately; overall validation error is ~2.7% (100% - 97.27%), combining all error types and classes
    • Clinical Consequence: SEVERE - early intervention critical for AD
    • Mitigation: Tune threshold to favor sensitivity over specificity if used for screening
  3. Automation Bias

    • Risk: Clinicians may over-rely on model predictions
    • Impact: Reduced clinical judgment, missed complex cases
    • Mitigation: Emphasize model as decision support, not replacement
  4. Data Privacy

    • Risk: MRI scans are protected health information (PHI)
    • Impact: HIPAA violations, patient privacy breaches
    • Mitigation: De-identification, secure storage, limited access
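
The threshold tuning mentioned under False Negatives could look like the sketch below. It assumes a screening rule in which a scan is flagged for follow-up unless the Non-Demented probability reaches a chosen threshold; this rule, the function names, and the example probabilities are all hypothetical.

```python
def screen_positive(probs, threshold):
    """Flag for follow-up unless P(Non-Demented) is at least `threshold`."""
    return probs[0] < threshold


def sensitivity_specificity(cases, threshold):
    """Sensitivity and specificity of the screening rule on labeled cases.

    Each case is (class_probabilities, has_dementia).
    """
    tp = fn = tn = fp = 0
    for probs, has_dementia in cases:
        flagged = screen_positive(probs, threshold)
        if has_dementia:
            tp += int(flagged)
            fn += int(not flagged)
        else:
            fp += int(flagged)
            tn += int(not flagged)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical softmax outputs for illustration only.
cases = [
    ([0.10, 0.05, 0.60, 0.25], True),
    ([0.55, 0.30, 0.10, 0.05], True),   # an argmax rule would miss this case
    ([0.90, 0.05, 0.03, 0.02], False),
    ([0.65, 0.20, 0.10, 0.05], False),
]
for threshold in (0.5, 0.7):
    print(threshold, sensitivity_specificity(cases, threshold))
```

Raising the threshold catches the borderline positive case at the cost of one extra false alarm, which is the trade-off a screening deployment would need to set deliberately on validation data.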

Interpretability

Current Interpretability: ⚠️ Limited (Black Box)

What We Can Interpret:

  1. Class Predictions

    • Model outputs clear class labels (0-3)
    • Softmax probabilities indicate relative confidence
    • Limitation: Doesn't explain why
  2. Confusion Patterns

    • Most errors between Class 2 ↔ Class 3 (adjacent stages)
    • Clinically plausible confusion (subtle differences)
    • Insight: Suggests the model may be learning clinically relevant feature boundaries, though this has not been verified
  3. Feature Learning (Abstract)

    • Early layers are expected to detect edges and textures (brain structure)
    • Middle layers are expected to detect anatomical patterns (ventricles, cortex)
    • Late layers are expected to detect disease signatures (atrophy, enlargement)
    • Limitation: Specific features not directly visible
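
The confusion-pattern observation above can be quantified directly from a confusion matrix. The sketch below uses hypothetical counts, not the model's actual validation matrix, and assumes the class indices 0-3 are ordered by severity so that adjacent indices are adjacent stages.

```python
def adjacent_error_share(cm):
    """Fraction of all misclassifications that fall between adjacent classes."""
    errors = adjacent = 0
    for i, row in enumerate(cm):
        for j, count in enumerate(row):
            if i != j:
                errors += count
                if abs(i - j) == 1:
                    adjacent += count
    return adjacent / errors if errors else 0.0


def most_confused_pair(cm):
    """Unordered class pair with the most errors, counting both directions."""
    n = len(cm)
    pairs = {(i, j): cm[i][j] + cm[j][i]
             for i in range(n) for j in range(i + 1, n)}
    return max(pairs, key=pairs.get)


# Hypothetical counts; rows are true classes, columns are predictions.
cm = [
    [48, 0, 2, 0],
    [0, 10, 0, 0],
    [1, 0, 97, 2],
    [0, 0, 3, 37],
]
print(adjacent_error_share(cm))
print(most_confused_pair(cm))
```

Reporting this share alongside accuracy makes it easier to judge whether errors are mostly near-misses between neighboring stages or larger jumps such as Class 0 versus Class 2.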

What We CANNOT Interpret:

  1. Spatial Attribution

    • Which brain regions drive each prediction?
    • Are decisions based on hippocampus, cortex, ventricles, or multiple areas?
    • Missing: Saliency maps, attention weights, GradCAM visualizations
  2. Decision Boundaries

    • What specific features distinguish Class 2 from Class 3?
    • How much atrophy is "enough" for a Moderate (Class 3) classification?
    • Missing: Feature importance scores, counterfactual examples
  3. Individual Predictions

    • Why was this specific patient classified as Class 3?
    • Missing: Case-by-case explanations

Recommended Interpretability Enhancements:

High Priority:

  1. GradCAM/GradCAM++ - Highlight influential brain regions
  2. Attention Mechanisms - Built-in interpretability through attention weights
  3. Saliency Maps - Pixel-level importance visualization

Medium Priority:

  1. Feature Visualization - Show what specific neurons detect
  2. Layer-wise Relevance Propagation (LRP) - Trace predictions back to inputs
  3. SHAP Values - Local feature importance

Low Priority (Research):

  1. Concept Activation Vectors - High-level semantic concepts
  2. Prototypical Examples - Show similar training cases
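
None of the methods listed above is implemented yet. As a simpler, model-agnostic starting point, occlusion sensitivity masks one patch of the image at a time and records how much the target-class probability drops. Note that this is a different technique from GradCAM, not an implementation of it. The `predict` callable and the toy model below are hypothetical stand-ins; a real use would wrap the trained network.

```python
def occlusion_map(image, predict, target_class, patch=2, fill=0.0):
    """Heatmap of probability drops when each patch is replaced by `fill`."""
    height, width = len(image), len(image[0])
    baseline = predict(image)[target_class]
    heat = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            occluded = [row[:] for row in image]  # copy so the input is untouched
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = baseline - predict(occluded)[target_class]
            for r in rows:
                for c in cols:
                    heat[r][c] = drop
    return heat


def toy_predict(image):
    """Toy 2-class model: class 1 score is the mean of the top-left 2x2 region."""
    score = sum(image[r][c] for r in range(2) for c in range(2)) / 4
    return [1.0 - score, score]


image = [[1.0] * 4 for _ in range(4)]
heat = occlusion_map(image, toy_predict, target_class=1)
for row in heat:
    print(row)
```

In the toy example only the top-left patch produces a drop, matching the region the toy model actually uses. On real scans the cost is one forward pass per patch, so patch size trades resolution against runtime.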

Clinical Interpretability Requirements:

For clinical adoption, we need to provide:

  • ✅ Prediction confidence scores (currently available as raw softmax probabilities)
  • ❌ Brain region heatmaps (NOT IMPLEMENTED)
  • ❌ Comparison to "typical" cases (NOT IMPLEMENTED)
  • ❌ Uncertainty quantification (NOT IMPLEMENTED)
  • ❌ Explanation of decision (NOT IMPLEMENTED)

Status: Model currently unsuitable for clinical deployment without interpretability enhancements.


Out-of-Scope Use Cases

Explicitly NOT INTENDED FOR:

  1. ❌ Standalone Clinical Diagnosis

    • Model must be used as decision support ONLY
    • Requires confirmation by qualified healthcare professionals
    • Not a replacement for comprehensive clinical evaluation
  2. ❌ Predictive Prognosis

    • Cannot predict future disease progression or survival
    • Not trained on longitudinal outcome data
  3. ❌ Treatment Recommendation

    • Does not suggest specific treatments or interventions
    • Clinical management decisions require physician expertise
  4. ❌ Non-MRI Modalities

    • Trained exclusively on structural MRI
    • Will fail on CT, PET, ultrasound, or X-ray images
  5. ❌ Pediatric or Non-AD Dementia

    • Trained on adult Alzheimer's disease only
    • Not applicable to frontotemporal dementia, Lewy body dementia, vascular dementia, etc.
  6. ❌ Real-Time Critical Decisions

    • Not validated for emergency or time-sensitive scenarios
    • Requires proper quality control and validation
  7. ❌ Consumer/Direct-to-Patient Use

    • Requires medical expertise to interpret
    • Not designed for self-diagnosis

Caveats & Recommendations

Deployment Considerations

  1. Regulatory Approval Required

    • Not FDA-cleared or CE-marked
    • Requires validation for medical device classification
    • Must comply with local healthcare regulations
  2. Clinical Validation Needed

    • External validation on independent datasets
    • Prospective clinical trial to assess real-world performance
    • Comparison to radiologist performance
  3. Quality Control

    • Implement input validation (image quality checks)
    • Monitor prediction drift over time
    • Regular re-validation as new data emerges
  4. Human Oversight Mandatory

    • All predictions require physician review
    • System should flag uncertain predictions
    • Maintain audit trail of predictions vs. final diagnoses
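
Monitoring prediction drift can begin with something as simple as comparing the distribution of predicted classes over time. The sketch below uses total variation distance between a baseline window and a recent window; the choice of metric, the `tolerance` value, and the example data are assumptions for illustration.

```python
from collections import Counter


def class_distribution(predictions, num_classes=4):
    """Fraction of predictions falling in each class."""
    if not predictions:
        raise ValueError("no predictions to summarize")
    counts = Counter(predictions)
    return [counts.get(k, 0) / len(predictions) for k in range(num_classes)]


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def drift_alert(baseline_preds, recent_preds, tolerance=0.1):
    """Return (alert, distance); alert is True when the distance exceeds tolerance."""
    distance = total_variation(class_distribution(baseline_preds),
                               class_distribution(recent_preds))
    return distance > tolerance, distance


# Hypothetical predicted labels for illustration only.
baseline = [0] * 30 + [1] * 5 + [2] * 50 + [3] * 15
recent = [0] * 10 + [1] * 5 + [2] * 50 + [3] * 35
alert, distance = drift_alert(baseline, recent)
print(alert, round(distance, 3))
```

A shift in predicted classes can reflect a change in the patient population, a scanner or protocol change, or model degradation, so an alert should trigger investigation rather than automatic retraining.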

Safe Use Guidelines

DO:

  • ✅ Use as screening tool to prioritize cases
  • ✅ Validate predictions with clinical assessment
  • ✅ Monitor performance on your local population
  • ✅ Retrain periodically with new data
  • ✅ Document all model decisions

DON'T:

  • ❌ Use without physician oversight
  • ❌ Apply to populations not represented in training data
  • ❌ Ignore model uncertainty or low confidence predictions
  • ❌ Deploy without local validation
  • ❌ Use for legal or financial decisions

Model Versioning & Updates

Current Version: 1.0 (Baseline)

  • Release Date: December 20, 2025
  • Training Data Version: Kaggle MRI Alzheimer's Dataset (Dec 2025)
  • Performance: 97.27% validation accuracy

Planned Updates:

Version 1.1 (Proposed - Q1 2026)

  • Implement GradCAM interpretability
  • Add uncertainty quantification
  • Address Class 1 imbalance with synthetic augmentation

Version 2.0 (Proposed - Q2 2026)

  • Multi-site validation
  • Ensemble model for improved robustness
  • Demographic fairness audit and mitigation

Contact & Feedback

  • Model Developers: [Your Name/Team]
  • Institution/Organization: [Your Organization]
  • Email: [Contact Email]
  • Issues & Feedback: [GitHub Issues / Email]

Reporting Errors or Concerns

If you encounter:

  • Unexpected predictions or errors
  • Bias or fairness issues
  • Safety concerns
  • Technical bugs

Please contact us immediately with:

  • Anonymized case details
  • Input image characteristics
  • Expected vs. actual output
  • Your use context

Acknowledgments

  • AI for Alzheimer's Hackathon organizers
  • Dataset providers and contributors
  • Open-source PyTorch and torchvision communities
  • Medical imaging research community

License & Terms of Use

License: [To Be Determined - specify open-source or proprietary]

Terms:

  • Research and educational use permitted
  • Clinical use requires additional validation and regulatory approval
  • Commercial use requires separate licensing agreement
  • No warranties provided - use at your own risk
  • Users assume all liability for clinical decisions

Changelog

Version 1.0 (December 20, 2025)

  • Initial release
  • ResNet18 baseline model
  • 97.27% validation accuracy
  • 4-class Alzheimer's classification
  • Known limitations documented

This model card follows guidelines from Mitchell et al. (2019) "Model Cards for Model Reporting" and the EU AI Act technical documentation requirements.

  • Last Updated: December 20, 2025
  • Next Review: March 20, 2026 (quarterly review)
