About AI4Alzheimers

Inspiration

Dementia, of which Alzheimer's disease is the most common cause, affects more than 55 million people worldwide, and the World Health Organization projects that figure will reach about 139 million by 2050. Behind these statistics are real people (grandparents, parents, friends) slowly losing their memories and independence. What struck me most was learning that early detection can significantly improve treatment outcomes, yet many regions lack access to specialized neurologists who can accurately interpret brain MRI scans.

I was inspired by the potential of artificial intelligence to democratize healthcare. If a CNN model could learn to recognize patterns in MRI images with near-human accuracy, we could:

Reduce diagnosis time from hours to seconds
Make screening accessible in underserved areas
Support clinicians with objective, data-driven insights
Enable earlier intervention when treatments are most effective

The Hack4Health hackathon presented the perfect opportunity to tackle this critical problem at the intersection of AI and healthcare.

What it does

AI4Alzheimers is a deep learning system that automatically classifies brain MRI scans into different stages of Alzheimer's disease progression. The system:

Processes raw MRI images from parquet-format datasets
Extracts visual features using convolutional neural networks
Classifies disease stage with 98.83% accuracy
Provides interpretable results with confidence scores and visualizations
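
As a rough illustration of how a confidence score could accompany each prediction, the sketch below applies a softmax to four raw output scores and reports the top class with its probability. It assumes the confidence score is simply the largest softmax probability, which this write-up does not confirm; the name `predict_with_confidence` and the example scores are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_confidence(logits):
    """Return (predicted class index, confidence) for one image.

    Assumption: confidence is the largest softmax probability.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]

# Hypothetical raw scores for the 4 disease stages (not real model output).
example_logits = [0.2, -1.0, 4.1, 0.5]
label, confidence = predict_with_confidence(example_logits)
print(f"Predicted class {label} with confidence {confidence:.1%}")
```

A maximum-softmax score is easy to compute but tends to be overconfident, so it is best read as a relative indicator rather than a calibrated probability.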

Key Capabilities

Multi-class classification: Distinguishes between 4 disease stages
High accuracy: 98.83% on held-out test set
Fast inference: Processes images in milliseconds
Robust performance: Handles class imbalance effectively
Production-ready: Includes saved models and complete pipeline

The model achieves precision and recall scores above 93% for all classes, including a remarkable 100% recall on Class 2 (the majority class with 634 test samples).

How we built it

Architecture Design

I designed a Sequential Convolutional Neural Network with three main components:

1. Feature Extraction Layers

Three convolutional blocks with progressively increasing filters:

$$ \text{Block}_i: \text{Conv2D}(f_i) \rightarrow \text{BatchNorm} \rightarrow \text{ReLU} \rightarrow \text{MaxPool} \rightarrow \text{Dropout}(0.25) $$

where $ f_1 = 32, f_2 = 64, f_3 = 128 $ filters.

2. Classification Layers

Dense layers with regularization:

$$ \text{Flatten} \rightarrow \text{Dense}(256) \rightarrow \text{Dense}(128) \rightarrow \text{Dense}(4) $$
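
The two formulas above can be sketched in Keras roughly as follows. This is a minimal reconstruction, not the project's actual code: the 128x128 grayscale input size, 3x3 kernels, "same" padding, 2x2 pooling, ReLU activations in the dense layers, and the sparse categorical cross-entropy loss are all assumptions, and `conv_block` and `build_model` are illustrative names. The regularization used on the dense layers is not specified in the text, so it is left out here.

```python
from tensorflow import keras
from tensorflow.keras import layers

def conv_block(filters):
    """Conv2D -> BatchNorm -> ReLU -> MaxPool -> Dropout(0.25), per the block formula."""
    return [
        layers.Conv2D(filters, kernel_size=3, padding="same"),  # kernel size assumed
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.MaxPooling2D(pool_size=2),                       # pool size assumed
        layers.Dropout(0.25),
    ]

def build_model(input_shape=(128, 128, 1), num_classes=4):
    """Assemble three conv blocks (32, 64, 128 filters) and the dense classifier head."""
    model = keras.Sequential(
        [keras.Input(shape=input_shape)]          # input size is an assumption
        + conv_block(32) + conv_block(64) + conv_block(128)
        + [
            layers.Flatten(),
            layers.Dense(256, activation="relu"),
            layers.Dense(128, activation="relu"),
            layers.Dense(num_classes, activation="softmax"),
        ]
    )
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-3),
        loss="sparse_categorical_crossentropy",   # assumes integer-encoded labels
        metrics=["accuracy"],
    )
    return model

model = build_model()
model.summary()
```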

3. Training Strategy

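Trained with the Adam optimizer:

$$ \theta_{t+1} = \theta_t - \alpha \cdot \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} $$

where $ \alpha = 0.001 $ is the initial learning rate (reduced dynamically on plateau) and $ \hat{m}_t $, $ \hat{v}_t $ are the bias-corrected first- and second-moment estimates of the gradient.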
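
To make the update rule concrete, here is a minimal pure-Python sketch of one Adam step for a single scalar parameter. It includes the bias correction of the moment estimates that standard Adam applies. The decay rates `beta1 = 0.9` and `beta2 = 0.999` and the value `eps = 1e-8` are commonly used defaults assumed here, not values stated in this write-up, and `adam_step` is an illustrative name.

```python
import math

def adam_step(theta, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad    # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                 # bias-corrected second moment
    theta = theta - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy objective f(theta) = theta^2 with gradient 2 * theta (illustration only).
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
    print(f"step {t}: theta = {theta:.6f}")
```

Because the step is normalized by the root of the second moment, each early update moves the parameter by roughly the learning rate, regardless of the gradient's scale.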

Implementation Stack

# Core Technologies
- TensorFlow/Keras  # Deep learning framework
- NumPy             # Numerical computing
- Pandas            # Data manipulation
- Scikit-learn      # Preprocessing & metrics
- Matplotlib/Seaborn # Visualization

Data Pipeline

Data Loading: Read parquet files containing image bytes and labels
Preprocessing:
- Convert bytes $ \rightarrow $ numpy arrays
- Normalize: $ x' = \frac{x}{255} $ where $ x \in [0, 255] $
- Reshape: Add channel dimension for grayscale
Splitting:
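- Train: 4,352 images (85% of the 5,120-image training pool)
- Validation: 768 images (15% of the training pool)
- Test: 1,280 images (held-out)
Augmentation: No explicit data augmentation was applied; batch normalization in the network acts as a regularizer, not as augmentation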
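
The preprocessing steps above can be sketched as follows. This is a minimal example assuming the stored bytes are standard encoded images (such as PNG or JPEG) that Pillow can decode; the function name `preprocess` is illustrative, and the synthetic 8x8 image only stands in for a real scan.

```python
import io

import numpy as np
from PIL import Image

def preprocess(image_bytes):
    """Decode encoded image bytes into a normalized (H, W, 1) float32 array."""
    img = Image.open(io.BytesIO(image_bytes)).convert("L")  # force 8-bit grayscale
    arr = np.asarray(img, dtype=np.float32) / 255.0          # scale [0, 255] -> [0, 1]
    return arr[..., np.newaxis]                               # add the channel dimension

# Self-contained demo: encode a synthetic 8x8 gradient as PNG, then decode it.
pixels = np.arange(64, dtype=np.uint8).reshape(8, 8) * 4
buffer = io.BytesIO()
Image.fromarray(pixels).save(buffer, format="PNG")
x = preprocess(buffer.getvalue())
print(x.shape, x.dtype, float(x.min()), float(x.max()))
```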

Training Process

# Key hyperparameters
batch_size = 64        # Optimized for speed
epochs = 20            # Max (early stopped at 6)
learning_rate = 1e-3   # Initial LR

# Callbacks
- EarlyStopping(patience=5)      # Prevent overfitting
- ReduceLROnPlateau(patience=3)  # Dynamic LR adjustment
- ModelCheckpoint()              # Save best model

The model converged in 6 epochs (~17 minutes), achieving validation accuracy of 98.70% and test accuracy of 98.83%.
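
For reference, the three callbacks could be configured in Keras along these lines. Only the patience values come from this write-up; the monitored metric (`val_loss`), the reduction factor, `restore_best_weights`, and the checkpoint filename are assumptions chosen for illustration.

```python
from tensorflow import keras

callbacks = [
    # Stop when validation loss has not improved for 5 epochs.
    keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=5, restore_best_weights=True
    ),
    # Lower the learning rate after 3 stagnant epochs (factor is assumed).
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
    # Keep only the best model seen so far (filename is hypothetical).
    keras.callbacks.ModelCheckpoint(
        filepath="best_model.keras", monitor="val_loss", save_best_only=True
    ),
]

# Typical usage (model, X_train, y_train, X_val, y_val are assumed to exist):
# history = model.fit(
#     X_train, y_train,
#     validation_data=(X_val, y_val),
#     batch_size=64, epochs=20, callbacks=callbacks,
# )
```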

Challenges we ran into

1. Data Format Complexity

Challenge: The dataset stored images as binary blobs within parquet files, sometimes wrapped in dictionaries.

Solution: Created flexible extraction functions that handle multiple formats:

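def extract_bytes(blob):
    """Return the raw image bytes from a parquet cell, unwrapping dict-wrapped blobs."""
    if isinstance(blob, dict):
        # Some rows wrap the payload in a dict; check the candidate keys in order.
        for key in ("bytes", "data", "image"):
            if key in blob and isinstance(blob[key], (bytes, bytearray)):
                return blob[key]
    # Otherwise return the cell unchanged (raw bytes, or an unrecognized wrapper).
    return blob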

2. Class Imbalance

Challenge: Class 1 had only 15 test samples versus 634 for Class 2, a ratio of roughly 42:1.

Solution:

Used stratified splitting to preserve class distribution
Applied dropout and batch normalization for better generalization
Result: Still achieved 93% recall on Class 1
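
Stratified splitting can be sketched with scikit-learn as shown below. The labels here are synthetic and only mimic a heavily imbalanced four-class dataset. The sketch also assumes a single 6,400-image pool from which the test set is carved out, whereas the original dataset may ship with a predefined test file.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Synthetic, imbalanced labels standing in for the real dataset (illustration only).
y = np.repeat([0, 1, 2, 3], [900, 60, 3200, 2240])
X = rng.random((len(y), 4))  # placeholder features instead of image arrays

# Hold out 20% for testing, then 15% of the remainder for validation.
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X_pool, y_pool, test_size=0.15, stratify=y_pool, random_state=42
)

# Each split should show nearly identical class proportions.
for name, labels in [("train", y_train), ("val", y_val), ("test", y_test)]:
    share = np.bincount(labels, minlength=4) / len(labels)
    print(name, len(labels), np.round(share, 3))
```

Passing `stratify` keeps the rare class present in every split, which matters when a class has only a handful of samples.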

3. Training Time Optimization

Challenge: Initial training with 50 epochs and batch size 32 was taking 40+ minutes.

Solution:

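Reduced the maximum number of epochs to 20 (early stopping ends training sooner anyway)
Doubled batch size to 64, halving the number of gradient steps per epoch
Reduced callback patience values so training stops sooner once validation loss stalls
Final time: 17 minutes (a 57% reduction)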
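
The arithmetic behind these numbers can be checked in a few lines. The sketch uses the 4,352-image training set size from the data pipeline and treats "40+ minutes" as exactly 40.

```python
import math

train_images = 4352             # training-set size reported in the data pipeline

for batch_size in (32, 64):
    steps = math.ceil(train_images / batch_size)
    print(f"batch_size={batch_size}: {steps} steps per epoch")

before_min, after_min = 40, 17  # reported wall-clock times ("40+" treated as 40)
reduction = (before_min - after_min) / before_min
print(f"Training time reduction: {reduction:.1%}")
```

Halving the step count does not guarantee a twofold speedup, because each step processes twice as many images; the actual gain depends on how well the hardware parallelizes the larger batch.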

4. Type Compatibility Issues

Challenge: Label encoder produced numpy.int64 objects that caused errors in visualization functions.

Solution: Explicit type conversion:

class_names = [str(c) for c in le.classes_]

5. Overfitting Prevention

Challenge: Medical imaging models often overfit due to limited diversity in training data.

Solution:

Implemented triple regularization: Dropout + BatchNorm + Early Stopping
Monitored train/val gap throughout training
Result: Minimal overfitting (val_loss plateaued, not increasing)

Accomplishments that we're proud of

Technical Achievements

98.83% Test Accuracy - 1,265 of 1,280 held-out test images classified correctly
100% Recall on Class 2 - Perfect detection on majority class
Robust to Imbalance - 93% recall even with 15 samples (Class 1)
Fast Training - 17 minutes, down from 40+ minutes before optimization
Minimal Overfitting - Validation performance remained stable

Statistical Excellence

Our confusion matrix shows outstanding performance:

$$ \text{Accuracy} = \frac{\text{Correct predictions}}{\text{Total predictions}} = \frac{1265}{1280} \approx 0.9883 $$

The macro-averaged F1-score across all classes is 0.98.
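
As an illustration of how these metrics follow from a confusion matrix, the sketch below computes accuracy, per-class precision and recall, and macro F1. The 3-class matrix is made up for the example and is not the project's confusion matrix; `metrics_from_confusion` is an illustrative name.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Compute accuracy and per-class precision/recall/F1 from a confusion matrix.

    Rows are true classes, columns are predicted classes. Assumes every class
    is predicted at least once, so no division by zero occurs.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                      # correct predictions per class
    precision = tp / cm.sum(axis=0)       # column sums = predicted counts
    recall = tp / cm.sum(axis=1)          # row sums = true counts
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision,
        "recall": recall,
        "macro_f1": f1.mean(),            # unweighted mean over classes
    }

# Made-up 3-class matrix for illustration; NOT the project's confusion matrix.
toy_cm = [[48, 2, 0],
          [1, 14, 0],
          [0, 0, 35]]
result = metrics_from_confusion(toy_cm)
print(f"accuracy={result['accuracy']:.4f}, macro F1={result['macro_f1']:.4f}")

# The reported test accuracy follows from the counts given in the text.
print(f"reported accuracy: {1265 / 1280:.4f}")
```

Macro averaging weights every class equally, so a rare class with 15 samples influences the score as much as the majority class does.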

Research Quality

Reproducible: Fixed random seeds, documented all hyperparameters
Well-documented: 400+ lines of comprehensive documentation
Production-ready: Saved models, requirements.txt, proper gitignore
Scientifically rigorous: Proper train/val/test splits, multiple metrics

Innovation

Optimized architecture specifically for medical imaging
Efficient training pipeline with intelligent callbacks
Comprehensive analysis including confidence intervals
Ready for extension to interpretability (Grad-CAM)
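
The write-up does not say how its confidence intervals were computed. One common choice for an accuracy estimate is the Wilson score interval, sketched here for the reported 1,265 correct predictions out of 1,280; the function name and the 95% level are assumptions.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

low, high = wilson_interval(1265, 1280)
print(f"95% CI for test accuracy: [{low:.4f}, {high:.4f}]")
```

The Wilson interval behaves better than the simple normal approximation when accuracy is close to 100%, which is the case here.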

What we learned

Technical Skills

Medical Image Processing
- Handling image bytes embedded in parquet files
- Preprocessing grayscale medical images
- Dealing with high-dimensional sparse data
CNN Architecture Design
- Layer stacking strategies for feature extraction
- Balancing model capacity vs overfitting
- Importance of batch normalization in deep networks
Training Optimization
- Early stopping as first-line overfitting prevention
- Learning rate scheduling for fine-tuning
- Batch size impact on training speed vs convergence
Model Evaluation
- Looking beyond accuracy (precision, recall, F1)
- Confusion matrix interpretation
- Confidence intervals for reliability

Domain Knowledge

Healthcare AI Ethics
- Data de-identification and privacy
- Importance of interpretability in clinical settings
- Regulatory considerations (FDA approval, etc.)
Real-world Constraints
- Class imbalance in medical datasets
- Need for reproducibility in healthcare
- Trade-offs between accuracy and inference speed

Project Management

Documentation Best Practices
- README structure for technical projects
- Importance of reproducibility statements
- Clear communication for non-technical stakeholders
Version Control
- Proper .gitignore for ML projects
- Organizing code, data, and documentation
- Preparing for open-source collaboration

Key Insight

The biggest lesson: Simplicity + optimization beats complexity. Rather than building an overly complex architecture, focusing on:

Clean data preprocessing
Proven CNN patterns
Smart regularization
Efficient training

...delivered exceptional results in minimal time.

What's next for AI4Alzheimers

Immediate Next Steps (Science Fair Ready)

Interpretability with Grad-CAM
- Visualize which brain regions the model focuses on
- Validate that model learns clinically relevant features
- Create heatmap overlays for presentations
Cross-Validation
- Implement k-fold cross-validation (k=5)
- Report mean ± std accuracy for robustness
- Ensure results generalize beyond single train/test split
Interactive Demo
- Build Streamlit web app for live predictions
- Allow upload of new MRI images
- Display confidence scores and explanations
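
The planned k-fold cross-validation step could look roughly like this. It is a sketch under stated assumptions: `cross_validate` and `majority_baseline` are hypothetical helpers, the labels are synthetic, and a trivial majority-class scorer stands in for training the CNN on each fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(X, y, train_and_score, k=5, seed=42):
    """Run stratified k-fold CV and return (mean, std) of the fold scores.

    train_and_score(X_tr, y_tr, X_va, y_va) is a caller-supplied function that
    trains a fresh model on the training fold and returns validation accuracy.
    """
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    scores = [
        train_and_score(X[tr], y[tr], X[va], y[va]) for tr, va in skf.split(X, y)
    ]
    return float(np.mean(scores)), float(np.std(scores))

def majority_baseline(X_tr, y_tr, X_va, y_va):
    """Stand-in scorer: always predict the most frequent training label."""
    majority = np.bincount(y_tr).argmax()
    return float(np.mean(y_va == majority))

# Synthetic labels and placeholder features (illustration only).
y = np.repeat([0, 1, 2, 3], [90, 10, 320, 220])
X = np.zeros((len(y), 1))
mean_acc, std_acc = cross_validate(X, y, majority_baseline)
print(f"accuracy: {mean_acc:.3f} ± {std_acc:.3f}")
```

Swapping `majority_baseline` for a function that builds, fits, and evaluates the CNN would yield the mean ± std accuracy described in the plan.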

Research Extensions

External Validation
- Test on ADNI dataset (Alzheimer's Disease Neuroimaging Initiative)
- Evaluate cross-dataset generalization
- Identify domain shift challenges
Multi-Modal Learning
- Incorporate clinical data (age, APOE genotype, cognitive scores)
- Fusion architectures combining imaging + tabular data
- Expected accuracy gain: at most about 1 percentage point, given that 98.83% leaves only 1.17 points of headroom
Longitudinal Prediction
- Predict disease progression over time
- Time-series analysis of sequential scans
- Risk stratification for clinical trials

Clinical Translation

Regulatory Pathway
- FDA 510(k) submission preparation
- Clinical validation studies
- Integration with PACS systems
Federated Learning
- Privacy-preserving distributed training
- Collaborate across hospitals without sharing data
- Improve model diversity and robustness

Impact & Deployment

Mobile Deployment
- Model quantization for edge devices
- TensorFlow Lite conversion
- Telemedicine integration
Global Health Initiative
- Partner with NGOs in underserved regions
- Low-cost screening programs
- Training programs for local healthcare workers

Dissemination

Publications
- ISEF (International Science & Engineering Fair) submission
- Preprint on medRxiv
- Potential journal publication (e.g., Nature Medicine)
Open Science
- Release pretrained models on Hugging Face
- Contribute to Alzheimer's research community
- Educational tutorials for students

Long-term Vision

Mission: Make early Alzheimer's detection accessible to every person on Earth, regardless of geographic or economic barriers.

2026 Goals:

Validate on 10,000+ diverse patients
Achieve FDA breakthrough device designation
Deploy in 10+ pilot clinics
Publish peer-reviewed research

2030 Vision:

Global screening program in 50+ countries
Integration with standard healthcare workflows
Real-time decision support for clinicians
Contribute to cure research through early detection data

Built With

jupyter-notebook
matplotlib
numpy
pandas
pillow
pyarrow
python
scikit-learn
seaborn
tensorflow/keras

Updates

TASNIM CHAOUCH started this project — Dec 31, 2025 05:50 AM EST
