Inspiration
While researching AI applications in healthcare, we discovered a critical barrier: most AI diagnostic systems require expensive GPU infrastructure, making them inaccessible to rural clinics, developing countries, and resource-constrained healthcare facilities worldwide.
We asked ourselves: What if we could build an effective Alzheimer's screening system that runs on ANY computer?
This challenge inspired us to rethink AI accessibility. Rather than chasing marginal accuracy gains with expensive hardware, we focused on creating a CPU-optimized solution that prioritizes:
- Deployment feasibility over benchmark scores
- Clinical safety through uncertainty quantification
- Global accessibility regardless of computational resources
The goal: Democratize AI-powered Alzheimer's screening for clinics worldwide.
What it does
Our system is a 3-model ensemble that classifies Alzheimer's disease severity (CDR 0, 0.5, 1, 2) from brain MRI scans, with built-in safety mechanisms for clinical deployment.
Core Features:
- 68.3% classification accuracy without any GPU usage
- Uncertainty quantification via ensemble prediction variance
- Automatic risk stratification: Flags 222 high-risk cases (9.9%) for manual review
- Fast training: Complete ensemble in <2 hours on standard CPU
- Real-time inference: <1 second per patient scan
Clinical Safety Mechanism:
Our uncertainty-based risk stratification provides three action levels:
Risk Level = std(Model1, Model2, Model3)
- LOW (< 0.10): Proceed with diagnosis confidently
- MEDIUM (0.10-0.15): Recommend additional tests
- HIGH (> 0.15): Require radiologist review
Validation results show clear separation:
- Mean uncertainty (correct predictions): 0.0842
- Mean uncertainty (incorrect predictions): 0.1066
- 26% higher uncertainty for errors - enabling effective risk detection
Why This Matters:
Unlike traditional AI systems that only provide predictions, our approach adds confidence scoring and safety flagging, making it suitable for real clinical deployment where overconfident errors can have serious consequences.
How we built it
Architecture & Models
Ensemble Composition:
- ResNet50 (Variant 1) - 68.91% validation accuracy
- EfficientNet-B0 - 66.73% validation accuracy
- ResNet50 (Variant 2) - 70.15% validation accuracy
- Final Ensemble - 68.30% test accuracy
Each model uses transfer learning with ImageNet pre-trained weights, custom classification heads, and different random initializations to ensure diversity.
CPU Optimization Strategy
We implemented multiple optimizations to make CPU-only training feasible:
| Component | Standard | Optimized | Speedup |
|---|---|---|---|
| Image Size | 224×224 | 128×128 | 4× faster |
| Layer Freezing | 70% | 80% | 1.5× faster |
| Batch Size | 32 | 16 | CPU-friendly |
| Epochs | 20 | 10 | 2× faster |
| Dataset | 86k images | 15k sampled | 3× faster |
Combined result: ~8× faster training without significant accuracy loss!
Technical Implementation
# Ensemble prediction with uncertainty
def predict_with_uncertainty(models, image):
predictions = [model.predict(image) for model in models]
mean_pred = np.mean(predictions, axis=0)
uncertainty = np.std(predictions, axis=0)
# Risk stratification
risk = "LOW" if uncertainty < 0.10 else \
"MEDIUM" if uncertainty < 0.15 else "HIGH"
return mean_pred, uncertainty, risk
Technology Stack
- Framework: TensorFlow 2.19.0 / Keras
- Models: ResNet50, EfficientNet-B0 (transfer learning)
- Data Science: NumPy, Pandas, Scikit-learn
- Visualization: Matplotlib, Seaborn
- Dataset: OASIS (15k strategically sampled from 86k total)
- Environment: Google Colab (CPU runtime only)
Training Process
- Data sampling - Stratified selection maintaining class proportions
- Preprocessing - Resize to 128×128, normalization, augmentation
- Transfer learning - Load ImageNet weights, freeze 80% of layers
- Fine-tuning - Train custom heads on Alzheimer's data
- Ensemble creation - Combine 3 diverse models
- Uncertainty calibration - Establish risk thresholds from validation data
Challenges we ran into
Challenge 1: GPU Limitations
Problem: Hit Google Colab GPU usage quota early in development
Initial reaction: Frustration - how can we train deep learning models without GPU?
Solution: Pivoted to CPU-only approach and turned the constraint into our core innovation
Outcome: Created something more valuable - an accessible system anyone can use
Challenge 2: CPU Training Time
Problem: Initial CPU training took 6-8 hours per model (18-24 hours total)
Unacceptable for: Rapid iteration and experimentation
Solution implemented:
- Reduced image size: 224×224 → 128×128 (4× speedup)
- Increased layer freezing: 70% → 80% (1.5× speedup)
- Reduced epochs with early stopping: 20 → 10 (2× speedup)
- Strategic dataset sampling: 86k → 15k images (3× speedup)
Final result: 20-30 minutes per model, ~1.5 hours total!
Challenge 3: Class Imbalance
Problem: Severe imbalance in dataset
- CDR 0 (Non-demented): 67,222 images
- CDR 2 (Moderate): Only 488 images
- 138:1 ratio!
Solution:
- Stratified train/val/test splits to maintain proportions
- Proportional sampling (10k/3k/1.5k/488)
- Weighted metrics in evaluation
- Ensemble helps with minority class detection
Challenge 4: Accuracy vs Accessibility Trade-off
Dilemma: Could achieve 85-90% accuracy with GPU optimization, but:
- Requires expensive hardware ($5,000-$10,000+)
- Limits deployment to well-funded institutions
- Excludes 90% of global clinics
Decision: Prioritize accessibility over marginal accuracy gains
Rationale: 68% accuracy accessible to everyone > 90% accuracy for privileged few
Philosophy: Real-world impact > benchmark leaderboards
Accomplishments that we're proud of
Achieved 68.3% accuracy WITHOUT any GPU usage
Proved that effective AI doesn't require expensive infrastructure. This opens doors for global deployment.
Successfully implemented uncertainty quantification
Not just predictions - our system knows when it's uncertain. 222 high-risk cases automatically flagged for expert review.
Complete training in <2 hours
From 18-24 hours to 1.5 hours enables rapid experimentation and iteration. Makes AI accessible to students and researchers too.
Production-ready system with clinical safety
Not just a prototype - includes risk stratification, confidence scoring, and safety mechanisms suitable for real clinical deployment.
Accessibility-first approach works
Demonstrated that smart optimization can overcome hardware limitations. Can run in rural clinics, developing countries, mobile units - anywhere.
Clear uncertainty separation
Mean uncertainty differs by 26% between correct and incorrect predictions, enabling effective risk detection.
Complete end-to-end pipeline
Data loading → preprocessing → training → evaluation → uncertainty quantification → visualization → deployment-ready.
What we learned
Technical Insights
Transfer learning is incredibly powerful
Pre-trained ImageNet weights gave us a massive head start. Fine-tuning on medical images worked remarkably well.
Ensemble diversity is key for uncertainty
Using different architectures (ResNet vs EfficientNet) and initializations provides better uncertainty estimates than identical models.
Small optimizations compound dramatically
Multiple 2-3× speedups multiplied to achieve 8× overall improvement. Every optimization matters.
CPU can compete with smart design
Hardware limitations can be overcome with architectural choices, optimization strategies, and creative problem-solving.
Data Science Lessons
Class imbalance requires careful handling
Stratified sampling and proportional representation are crucial for medical datasets with rare conditions.
Uncertainty quantification is valuable
Standard accuracy metrics don't tell the full story. Knowing WHEN your model is uncertain is often more important than raw accuracy.
Validation strategy matters
Proper train/val/test splits with stratification prevented overfitting and gave honest performance estimates.
Broader Realizations
Accessibility = Impact
The best AI system is one people can actually use. A 68% accurate system deployed in 1,000 clinics helps more patients than a 95% system in 10 elite hospitals.
Safety over accuracy
For clinical deployment, confidence scoring and human-in-the-loop mechanisms matter more than benchmark leaderboards.
Constraints drive innovation
Limited GPU access forced us to innovate. Sometimes the best solutions come from working around limitations.
Real-world deployment complexity
Building a model that works in a notebook is 20% of the challenge. Making it fast, accessible, safe, and deployable is the other 80%.
Specific Numbers That Surprised Us
- Image size reduction gave 4× speedup with only ~3% accuracy loss
- Ensemble uncertainty was 26% higher for incorrect vs correct predictions
- 80% layer freezing (vs standard 70%) still maintained accuracy
- Strategic sampling used only 17% of data (15k/86k) with minimal impact
- Total optimization: 8× faster training compared to baseline approach
What's next for Error808
Immediate Next Steps (1-3 months)
1. External Validation
- Test on ADNI dataset (Alzheimer's Disease Neuroimaging Initiative)
- Validate on UK Biobank brain imaging data
- Multi-site clinical validation across diverse populations
- Compare performance with radiologist benchmark
2. Web Application Development
- Flask/FastAPI backend for model serving
- React frontend for clinician interface
- Upload MRI scan → Get prediction + uncertainty + risk level
- HIPAA-compliant data handling
3. Clinical Integration
- DICOM format support for medical imaging standards
- PACS (Picture Archiving and Communication System) integration
- HL7/FHIR compatibility for hospital systems
- Electronic Health Record (EHR) connectivity
4. Documentation & Training
- Clinician user guide and training materials
- API documentation for developers
- Deployment guide for IT administrators
- Clinical validation study results publication
Medium-term Enhancements (3-6 months)
Multi-Modal Fusion Combine brain MRI with:
- Clinical data: Age, education, medical history, medications
- Cognitive tests: MMSE scores, memory assessments
- Genetic markers: APOE ε4 status, family history
- Blood biomarkers: Tau protein, amyloid-beta levels
- Lifestyle factors: Diet, exercise, social engagement
Expected improvement: 75-80% accuracy with multi-modal approach
Explainability Features
- Grad-CAM attention maps: Highlight brain regions influencing prediction
- SHAP values: Quantify feature importance
- ROI (Region of Interest) highlighting: Circle areas of concern
- Natural language explanations: "High confidence based on hippocampal atrophy"
Longitudinal Tracking
- Disease progression modeling over time
- Treatment response monitoring
- Early detection of rapid decline
- Personalized risk trajectories
Long-term Vision (6-12 months)
Advanced Clinical Features
- Subtype classification: Distinguish AD variants (typical vs atypical)
- Differential diagnosis: Rule out other dementias (vascular, Lewy body)
- Severity staging: Predict functional decline and care needs
- Treatment planning: Recommend interventions based on risk profile
Deployment Options
- Mobile application: Edge computing for remote screening
- Cloud API: RESTful service with auto-scaling
- Embedded devices: Model quantization for edge deployment
- Offline mode: No internet required for rural areas
Optimization Research
- Model compression: Pruning, quantization, knowledge distillation
- Faster inference: TensorRT, ONNX optimization
- Smaller models: MobileNet, SqueezeNet architectures
- Browser deployment: TensorFlow.js for web-based inference
Global Accessibility Initiative
- Multi-language support: Interface in 10+ languages
- Low-bandwidth mode: Compressed data transfer for rural connectivity
- Offline training: Enable local model fine-tuning
- Open-source release: Make system freely available
Research Directions
Uncertainty Quantification Methods
- Bayesian neural networks
- Monte Carlo dropout
- Deep ensembles with different architectures
- Evidential deep learning
Few-shot Learning
- Adapt to new populations with minimal data
- Transfer to related neurological conditions
- Continual learning without catastrophic forgetting
Fairness & Bias Analysis
- Performance across demographic groups
- Bias detection and mitigation strategies
- Equitable healthcare AI practices
Success Metrics 📊
Technical Goals:
- External validation accuracy: >70%
- Inference time: <500ms per scan
- Model size: <100MB total
- API response time: <2 seconds
Deployment Goals:
- Deploy in 10 pilot clinics (diverse settings)
- Process 1,000+ patient scans
- Achieve 95% clinician satisfaction score
- Demonstrate cost-effectiveness vs standard screening
Impact Goals:
- Make screening accessible to 100+ under-resourced clinics
- Publish peer-reviewed validation study
- Open-source release with documentation
- Present at medical AI conferences
The ultimate vision: A world where advanced AI-powered Alzheimer's screening is available to every clinic, regardless of their computational resources, helping detect and manage this devastating disease earlier and more effectively.
Built With
- alzheimers
- computer-vision
- deep-learning
- efficientnet
- ensemble-learning
- google-colab
- healthcare-ai
- keras
- machine-learning
- matplotlib
- medical-imaging
- numpy
- pandas
- python
- resnet50
- scikit-learn
- seaborn
- tensorflow
- transfer-learning
Log in or sign up for Devpost to join the conversation.