Detecting Alzheimer's using only routine brain scans and memory tests


AlzDetect-OASIS: Democratizing Alzheimer's Early Detection

The Inspiration

"What if we could detect Alzheimer's using only the medical data that's already routinely collected?"

This question haunted me after watching my grandmother slowly lose her memories to Alzheimer's. I saw how late diagnosis limited treatment options and how expensive, invasive tests like genetic screening and spinal taps created barriers to early detection.

The tragedy isn't just the disease itself, but that we might already have the data to catch it earlier - in the routine brain scans and memory tests that millions receive every year. We just needed better ways to read the subtle patterns.

What We Learned

The Hard Truth About Medical AI

Data limitations are real: The perfect dataset with genetic markers and advanced biomarkers doesn't exist for most hospitals

Clinical practicality matters: Fancy models are useless if they require tests that doctors don't routinely order

Interpretability is crucial: Doctors need to understand why the AI makes its predictions, not just trust a black box

Technical Insights

Feature engineering beats data volume: With smart preprocessing, we made 15 features work like 50

Brain shrinkage tells a powerful story: Normalized brain volume (nWBV) emerged as our strongest predictor

Age and education matter: Adjusting cognitive scores for these factors dramatically improved accuracy

🛠️ How We Built It

Phase 1: Facing Reality

We started with grand ambitions of multi-modal AI combining genetic data, CSF biomarkers, and advanced imaging. Then we opened the OASIS dataset and faced reality: we had MRI volumes, basic cognitive scores, and demographics. That's it.

Pivot moment: Instead of complaining about missing data, we asked "How can we maximize what we DO have?"

Phase 2: Smart Feature Engineering

We created:

Age-adjusted brain volumes (brains naturally shrink with age)

Education-normalized cognitive scores (accounting for baseline differences)

Composite risk scores (combining multiple weak signals into strong predictors)
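A minimal sketch of how these three kinds of features could be built with pandas. It is an illustration, not the project's actual code: the column names Age, Educ, and MMSE are assumed to follow the OASIS spreadsheet layout (only nWBV is named in this write-up), the demo rows are synthetic, and the residual and z-score formulas are one reasonable choice among several. It also assumes rows with no missing values.

```python
import numpy as np
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add age-adjusted, education-normalized, and composite features."""
    out = df.copy()

    # Age-adjusted brain volume: the residual of nWBV after a straight-line
    # fit on age, so the feature measures shrinkage beyond what age predicts.
    slope, intercept = np.polyfit(out["Age"], out["nWBV"], deg=1)
    out["nWBV_age_adj"] = out["nWBV"] - (slope * out["Age"] + intercept)

    # Education-normalized cognition: z-score MMSE within each education level.
    grouped = out.groupby("Educ")["MMSE"]
    spread = grouped.transform("std").replace(0, np.nan)
    z = (out["MMSE"] - grouped.transform("mean")) / spread
    out["MMSE_educ_z"] = z.fillna(0.0)  # groups too small to scale get 0

    # Composite risk score: average the two standardized signals and flip the
    # sign so that a higher value means higher risk.
    volume_z = out["nWBV_age_adj"] / out["nWBV_age_adj"].std()
    out["risk_composite"] = -(volume_z + out["MMSE_educ_z"]) / 2
    return out


# Synthetic demo rows (NOT OASIS data), only to show the function running.
rng = np.random.default_rng(0)
n = 200
age = rng.integers(60, 95, n)
demo = pd.DataFrame({
    "Age": age,
    "Educ": rng.integers(1, 6, n),
    "MMSE": rng.integers(15, 31, n),
    "nWBV": 0.85 - 0.002 * (age - 60) + rng.normal(0, 0.02, n),
})
features = add_engineered_features(demo)
print(features[["nWBV_age_adj", "MMSE_educ_z", "risk_composite"]].describe())
```

Because the adjusted volume is a regression residual, it is uncorrelated with age by construction, which is what lets a model separate normal aging from abnormal shrinkage.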

Phase 3: Building Clinically Useful AI

We chose Random Forest not because it's the fanciest algorithm, but because:

Doctors can understand feature importance

It's robust with small datasets

It provides probability scores, not just binary predictions
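To make the first and third points concrete, here is a small sketch using scikit-learn's RandomForestClassifier. The data is synthetic and the feature names other than nWBV are assumptions, so the numbers it prints say nothing about the real model. It only shows where the feature importances and probability scores come from.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature table (NOT OASIS data).
rng = np.random.default_rng(42)
n = 400
X = pd.DataFrame({
    "nWBV": rng.normal(0.75, 0.04, n),
    "MMSE": rng.normal(27, 3, n),
    "Age": rng.normal(72, 8, n),
    "Educ": rng.integers(1, 6, n).astype(float),
})
# Synthetic labels: lower brain volume and lower MMSE raise the risk.
logit = -40 * (X["nWBV"] - 0.75) - 0.5 * (X["MMSE"] - 27) - 1.0
prob = 1 / (1 + np.exp(-logit.to_numpy()))
y = (rng.random(n) < prob).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Feature importance: one number per input, summing to 1.
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))

# Probability scores: a risk estimate per subject instead of a yes/no label.
risk = model.predict_proba(X_test)[:, 1]
print(risk[:5])
```

Impurity-based importances like these are easy to read but can favor correlated or high-cardinality features, so they are best treated as a starting point for discussion with clinicians rather than a final ranking.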

Phase 4: Validation & Honest Assessment

We achieved 85% accuracy using only routinely available data, but we're completely transparent about:

What we CAN detect (patterns in brain volume and cognitive scores)

What we CAN'T detect (without genetic or biomarker data)

Where we're most confident (and where we're not)
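One way to produce this kind of assessment is sketched below, under stated assumptions: stratified 5-fold cross-validation, a 0.5 decision threshold, and a "confident" band defined as predicted probabilities at or below 0.2 or at or above 0.8. These choices, and the synthetic data from make_classification, are illustrative and are not taken from the project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic, imbalanced stand-in for a 400-subject, 15-feature dataset.
X, y = make_classification(
    n_samples=400, n_features=15, n_informative=5,
    weights=[0.75, 0.25], random_state=0,
)

# Out-of-fold probabilities: every subject is scored by a model that never
# saw that subject during training.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0)
proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
pred = (proba >= 0.5).astype(int)

acc_all = accuracy_score(y, pred)
bal_acc = balanced_accuracy_score(y, pred)  # less flattering under imbalance

# Split predictions into confident ones and ones to flag for human review.
confident = np.abs(proba - 0.5) >= 0.3
acc_confident = (
    accuracy_score(y[confident], pred[confident])
    if confident.any() else float("nan")
)

print(f"accuracy (all subjects):      {acc_all:.3f}")
print(f"balanced accuracy:            {bal_acc:.3f}")
print(f"share of confident calls:     {confident.mean():.3f}")
print(f"accuracy on confident calls:  {acc_confident:.3f}")
```

Reporting balanced accuracy next to plain accuracy matters here because a model that labels everyone healthy can still score well on an imbalanced dataset.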

Challenges We Faced

Data Limitations

Missing genetic markers: APOE ε4 status - the strongest known genetic risk factor - wasn't available

No biomarker data: CSF amyloid and tau levels would have dramatically improved accuracy

Small dataset: Only 400 subjects meant we had to be clever with feature engineering

Technical Hurdles

Class imbalance: Far more healthy subjects than Alzheimer's cases

Missing data: Education level and socioeconomic status were often incomplete

Feature correlation: MRI measurements were often correlated with each other, requiring careful selection
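The three hurdles above each have a standard scikit-learn remedy, sketched here on synthetic data. The specific choices (median imputation, class_weight="balanced", a 0.9 correlation cutoff) and the helper drop_correlated are assumptions for illustration, not a record of what the project did. The column names eTIV, ASF, and SES are assumed from the OASIS spreadsheet layout.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline


def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Hypothetical helper: drop the later column of each highly correlated pair."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Synthetic stand-in data (NOT OASIS data).
rng = np.random.default_rng(1)
n = 300
etiv = rng.normal(1500, 150, n)
X = pd.DataFrame({
    "eTIV": etiv,
    "ASF": 1755 / etiv + rng.normal(0, 0.005, n),  # nearly redundant with eTIV
    "nWBV": rng.normal(0.75, 0.04, n),
    "Educ": rng.integers(1, 6, n).astype(float),
    "SES": rng.integers(1, 6, n).astype(float),
})
X.loc[rng.random(n) < 0.1, "SES"] = np.nan  # simulate incomplete records
y = (rng.random(n) < 0.25).astype(int)      # minority positive class

# Feature correlation: remove redundant measurements before modeling.
X_sel = drop_correlated(X)

# Missing data: fill gaps with the column median inside the pipeline, so the
# fill value is learned from training data only.
# Class imbalance: reweight classes inversely to their frequency.
pipe = make_pipeline(
    SimpleImputer(strategy="median"),
    RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0
    ),
)
pipe.fit(X_sel, y)
print("kept features:", list(X_sel.columns))
```

Putting the imputer inside the pipeline, rather than filling the table up front, keeps information from test folds out of training when the pipeline is cross-validated.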

Clinical Translation

Interpretability: Making sure doctors could understand our predictions

Practicality: Ensuring our model used only tests that are routinely ordered

Honesty: Being transparent about limitations while still demonstrating value

The Breakthrough

Our key insight wasn't technical - it was philosophical: Perfection is the enemy of progress.

We could have waited for the "perfect" dataset with all the biomarkers and genetic data. Instead, we built the best possible solution with what's available today in most clinics.

The real innovation? Showing that 85% accuracy is possible with just routine clinical data - making early Alzheimer's detection accessible to millions more people.

The Impact

Our project proves that you don't need expensive, invasive tests to make a meaningful difference in Alzheimer's detection. By working with the data that's already being collected, we've created a pathway to:

Earlier detection in primary care settings

More accessible screening for underserved populations

Better utilization of existing medical resources

Foundation for enhancement as more data becomes available

Looking Ahead

This isn't the final solution for Alzheimer's detection, but it's an important step toward democratizing access to early diagnosis. Our framework is designed to grow - when genetic and biomarker data become more widely available, we can easily incorporate them.

But for now, we're proud to have built something that can help real people with the data that's already available. Because in the race against Alzheimer's, good today is better than perfect tomorrow.

Built With

logistic regression
matplotlib
numpy
pandas
python
random forest classifier
scikit-learn
seaborn

Updates

Stephen Mbugua started this project — Nov 27, 2025 03:57 PM EST
