AlzDetect-OASIS: Democratizing Alzheimer's Early Detection
The Inspiration
"What if we could detect Alzheimer's using only the medical data that's already routinely collected?"
This question haunted me after watching my grandmother slowly lose her memories to Alzheimer's. I saw how late diagnosis limited treatment options and how expensive, invasive tests like genetic screening and spinal taps created barriers to early detection.
The tragedy isn't just the disease itself, but that we might already have the data to catch it earlier - in the routine brain scans and memory tests that millions receive every year. We just needed better ways to read the subtle patterns.
What We Learned
The Hard Truth About Medical AI
Data limitations are real: The perfect dataset with genetic markers and advanced biomarkers doesn't exist for most hospitals
Clinical practicality matters: Fancy models are useless if they require tests that doctors don't routinely order
Interpretability is crucial: Doctors need to understand why the AI makes its predictions, not just trust a black box
Technical Insights
Feature engineering beats data volume: With smart preprocessing, we made 15 features work like 50
Brain shrinkage tells a powerful story: Normalized brain volume (nWBV) emerged as our strongest predictor
Age and education matter: Adjusting cognitive scores for these factors dramatically improved accuracy
🛠️ How We Built It
Phase 1: Facing Reality
We started with grand ambitions of multi-modal AI combining genetic data, CSF biomarkers, and advanced imaging. Then we opened the OASIS dataset and faced reality: we had MRI volumes, basic cognitive scores, and demographics. That's it.
Pivot moment: Instead of complaining about missing data, we asked "How can we maximize what we DO have?"
Phase 2: Smart Feature Engineering
We created:
Age-adjusted brain volumes (brains naturally shrink with age)
Education-normalized cognitive scores (accounting for baseline differences)
Composite risk scores (combining multiple weak signals into strong predictors)
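A minimal sketch of how features like these could be computed, using only the Python standard library. It is an illustration under stated assumptions, not the project's actual pipeline: the helper names, the use of MMSE as the cognitive score, the linear age trend, the within-group z-scoring, and the equal-weight composite are all assumptions, and the values are made up.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def age_adjusted(ages, volumes):
    """Residual brain volume after removing the linear age trend."""
    slope, intercept = fit_line(ages, volumes)
    return [v - (slope * a + intercept) for a, v in zip(ages, volumes)]

def zscores(values):
    """Standardize values; returns zeros if there is no spread."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

def education_normalized(scores, educ_levels):
    """Z-score each cognitive score within its own education group."""
    groups = {}
    for i, level in enumerate(educ_levels):
        groups.setdefault(level, []).append(i)
    out = [0.0] * len(scores)
    for idx in groups.values():
        for i, z in zip(idx, zscores([scores[i] for i in idx])):
            out[i] = z
    return out

def composite_risk(adj_volume, norm_score):
    """Equal-weight composite (an assumption): higher means more risk,
    i.e. lower age-adjusted volume and lower normalized cognitive score."""
    zv = zscores(adj_volume)
    return [-(v + s) / 2 for v, s in zip(zv, norm_score)]

# Toy, made-up values for illustration only (not OASIS data)
ages = [65, 70, 75, 80, 85, 90]
nwbv = [0.78, 0.76, 0.75, 0.70, 0.71, 0.66]
mmse = [30, 29, 27, 22, 28, 20]
educ = [1, 1, 1, 2, 2, 2]

adj = age_adjusted(ages, nwbv)
norm = education_normalized(mmse, educ)
risk = composite_risk(adj, norm)
```

With a real dataset, the age trend and the group statistics would need to be fitted on the training split only, so that no information from held-out subjects leaks into the features.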
Phase 3: Building Clinically Useful AI
We chose Random Forest not because it's the fanciest algorithm, but because:
Doctors can understand feature importance
It's robust with small datasets
It provides probability scores, not just binary predictions
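The three properties above can be illustrated with scikit-learn's RandomForestClassifier. This is a sketch on synthetic data, not the project's trained model: the feature names, the hyperparameters, the 75/25 split, and the use of class_weight="balanced" are assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT OASIS): 400 subjects, 3 hypothetical features
rng = np.random.default_rng(0)
n = 400
feature_names = ["nwbv_age_adj", "mmse_edu_norm", "age"]
X = rng.normal(size=(n, 3))
# Made-up rule: low adjusted volume and low cognitive score raise the risk
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n) < -0.5).astype(int)

# Stratify so both splits keep the same class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(
    n_estimators=200,
    class_weight="balanced",  # assumption: one way to offset class imbalance
    random_state=0,
)
model.fit(X_train, y_train)

# Probability of the positive class for each test subject
proba = model.predict_proba(X_test)[:, 1]

# Impurity-based importance per feature (the values sum to 1)
importances = dict(zip(feature_names, model.feature_importances_))
```

Impurity-based importances are easy to read but can be skewed when features are correlated, so a check such as permutation importance may be worth adding before presenting them to clinicians.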
Phase 4: Validation & Honest Assessment
We achieved 85% accuracy using only routinely available data, but we're completely transparent about:
What we CAN detect (patterns in brain volume and cognitive scores)
What we CAN'T detect (without genetic or biomarker data)
Where we're most confident (and where we're not)
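One way to make this kind of assessment concrete is to report sensitivity and specificity next to accuracy and to flag predictions whose probability falls in a middle band. The sketch below is illustrative only: the function name, the 0.5 threshold, the 0.35 to 0.65 "uncertain" band, and the toy labels and probabilities are assumptions, not results from the project.

```python
def evaluate(y_true, proba, threshold=0.5, band=(0.35, 0.65)):
    """Summarize predictions: accuracy, sensitivity, specificity,
    and the indices of cases whose probability lies in the uncertain band."""
    y_pred = [1 if p >= threshold else 0 for p in proba]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    uncertain = [i for i, p in enumerate(proba) if band[0] <= p <= band[1]]
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "uncertain_indices": uncertain,
    }

# Toy, made-up labels and predicted probabilities (1 = Alzheimer's case)
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
proba = [0.9, 0.7, 0.4, 0.1, 0.2, 0.6, 0.3, 0.05, 0.45, 0.15]

report = evaluate(y_true, proba)
```

Because healthy subjects outnumber cases, accuracy on its own can look good while missed cases stay hidden; sensitivity shows how many true cases were actually caught.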
Challenges We Faced
Data Limitations
Missing genetic markers: APOE ε4 status - the strongest known genetic risk factor - wasn't available
No biomarker data: CSF amyloid and tau levels would have dramatically improved accuracy
Small dataset: Only 400 subjects meant we had to be clever with feature engineering
Technical Hurdles
Class imbalance: Far more healthy subjects than Alzheimer's cases
Missing data: Education levels, socioeconomic status often incomplete
Feature correlation: MRI measurements often correlated, requiring careful selection
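Each of these three hurdles has a common, simple mitigation, sketched below with the standard library only. This is a hedged illustration rather than the project's actual preprocessing: the helper names, the median fill, the 0.9 correlation cutoff, and all values are assumptions.

```python
from collections import Counter
from statistics import mean, median

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def drop_correlated(columns, threshold=0.9):
    """Greedy filter: keep a column only if its |r| with every
    already-kept column is below the threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Toy, made-up values for illustration only (not OASIS data)
labels = [0, 0, 0, 0, 0, 0, 1, 1]   # far more healthy (0) than cases (1)
ses = [2, None, 3, 1, None, 4]      # socioeconomic status with gaps
mri = {
    "etiv": [1400, 1500, 1600, 1700],
    "asf": [1.25, 1.17, 1.10, 1.03],   # moves almost inversely with etiv here
    "nwbv": [0.75, 0.70, 0.78, 0.72],
}

weights = balanced_class_weights(labels)
ses_filled = impute_median(ses)
kept = drop_correlated(mri)
```

The greedy filter depends on column order, so it helps to list the clinically most meaningful measurements first.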
Clinical Translation
Interpretability: Making sure doctors could understand our predictions
Practicality: Ensuring our model used only tests that are routinely ordered
Honesty: Being transparent about limitations while still demonstrating value
The Breakthrough
Our key insight wasn't technical - it was philosophical: Perfection is the enemy of progress.
We could have waited for the "perfect" dataset with all the biomarkers and genetic data. Instead, we built the best possible solution with what's available today in most clinics.
The real innovation? Showing that 85% accuracy is possible with just routine clinical data - making early Alzheimer's detection accessible to millions more people.
The Impact
Our project shows that you don't need expensive, invasive tests to make a meaningful difference in Alzheimer's detection. By working with the data that's already being collected, we've created a pathway to:
Earlier detection in primary care settings
More accessible screening for underserved populations
Better utilization of existing medical resources
Foundation for enhancement as more data becomes available
Looking Ahead
This isn't the final solution for Alzheimer's detection, but it's an important step toward democratizing access to early diagnosis. Our framework is designed to grow - when genetic and biomarker data become more widely available, we can easily incorporate them.
But for now, we're proud to have built something that can help real people with the data that's already available. Because in the race against Alzheimer's, good today is better than perfect tomorrow.