Multimodal AI for Early Detection of Alzheimer’s Disease

Figures: confusion matrix from the base model; feature importance; confusion matrix from the final model.

Inspiration

Alzheimer’s disease is often described as the long goodbye, but the tragedy isn't just the disease itself; it's the timing of the diagnosis. By the time a patient is clinically diagnosed with dementia, the brain damage is often irreversible. We were inspired by the concept of the "Gray Zone": Mild Cognitive Impairment (MCI). This is the critical window where intervention matters most, yet it is notoriously difficult to distinguish from normal aging. Current diagnostic models often act like fire alarms, alerting you when the house is already burning. We wanted to build a digital smoke detector: an AI capable of spotting the subtle mismatches between brain structure and function before clinical symptoms become obvious.

What it does

Our project is a Multimodal AI Screening Tool that predicts whether a patient is Healthy, has Mild Cognitive Impairment (MCI), or has Dementia. Unlike standard models that rely on a single data source, our system acts as a "Hybrid Intelligence":

For every patient: It analyzes non-invasive data (MRI brain volumes, demographics, and cognitive test scores) to form a baseline prediction.

For complex cases: It automatically incorporates "Gold Standard" biomarkers (Amyloid & FDG PET scans) if they are available, even if those scans were taken on different dates.

Context-Aware Analysis: It doesn't just look at test scores; it estimates "Cognitive Reserve" by checking whether a patient's cognitive performance is lower than expected given their education level and brain volume.

How we built it

The "Hub-and-Spoke" Data Pipeline: Real-world medical data is messy. Patients rarely get MRI and PET scans on the same day. Standard merging resulted in only 8 usable rows out of thousands. We built a custom Nearest-Date Algorithm using Python/Pandas that treated the MRI visit as the "Hub" and pulled in the temporally closest Amyloid and FDG PET scans. This allowed us to retain 4,168 patients without losing a single clinical record.
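The hub-and-spoke idea can be sketched in a few lines of pandas. This is not our original merging code: it is a minimal stand-in that uses pandas.merge_asof, and the column names (patient_id, scan_date, hippocampus_vol, amyloid_suvr), the attach_nearest helper, and the sample rows are all hypothetical.

```python
import pandas as pd

def attach_nearest(hub, spoke, value_cols, label):
    """Left-join each hub row to the same patient's spoke row with the closest date."""
    spoke = spoke[["patient_id", "scan_date"] + list(value_cols)].copy()
    spoke[f"{label}_date"] = spoke["scan_date"]  # keep the spoke's own scan date
    merged = pd.merge_asof(
        hub.sort_values("scan_date"),    # merge_asof needs both sides sorted on the key
        spoke.sort_values("scan_date"),
        on="scan_date",
        by="patient_id",                 # only match scans from the same patient
        direction="nearest",             # closest date, before or after the MRI
    )
    gap = (merged[f"{label}_date"] - merged["scan_date"]).abs()
    merged[f"{label}_gap_days"] = gap.dt.days
    return merged

# Made-up example data: MRI visits are the hub, amyloid PET scans are one spoke.
mri = pd.DataFrame({
    "patient_id": [1, 1, 2],
    "scan_date": pd.to_datetime(["2020-01-10", "2021-01-10", "2020-06-01"]),
    "hippocampus_vol": [7000.0, 6800.0, 7500.0],
})
amyloid = pd.DataFrame({
    "patient_id": [1, 1],
    "scan_date": pd.to_datetime(["2020-03-01", "2021-02-01"]),
    "amyloid_suvr": [1.1, 1.3],
})

merged = attach_nearest(mri, amyloid, ["amyloid_suvr"], label="amyloid")
print(merged)
```

Every MRI row survives the join; a patient with no PET scan simply gets NaN in the PET columns. merge_asof also accepts a tolerance argument (a pd.Timedelta) if scans that are too far apart should be left unmatched.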

Domain-Driven Feature Engineering: We engineered features to mimic a neurologist’s intuition. We created interaction variables like the Structure-Function Gap (ratio of Hippocampal volume to MMSE score) and Education Normalization, which helps detect decline in high-functioning individuals who might otherwise "cheat" the test.
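As a rough illustration, the two interaction features could be computed as below. Only the MMSE_per_Educ name comes from our feature set; the input column names (hippocampus_vol, mmse, education_years), the add_reserve_features helper, and the zero-handling are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

def add_reserve_features(df):
    """Add ratio features; zero denominators become NaN instead of inf."""
    out = df.copy()
    mmse = out["mmse"].replace(0, np.nan)
    educ = out["education_years"].replace(0, np.nan)
    # Structure-Function Gap: hippocampal volume relative to the MMSE score.
    out["structure_function_gap"] = out["hippocampus_vol"] / mmse
    # Education normalization: MMSE points per year of education.
    out["MMSE_per_Educ"] = out["mmse"] / educ
    return out

# Made-up patients: same MMSE score, very different education levels.
patients = pd.DataFrame({
    "hippocampus_vol": [7000.0, 6000.0],
    "mmse": [28, 28],
    "education_years": [20, 12],
})
features = add_reserve_features(patients)
print(features[["structure_function_gap", "MMSE_per_Educ"]])
```

The two made-up patients score the same on the MMSE, but the one with 20 years of education gets a lower MMSE_per_Educ value, which is the kind of signal the normalization is meant to expose.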

Risk-Averse Modeling: We used XGBoost because it handles sparse PET scan data natively: missing values (NaN) are sent down a default branch learned at each split, so no imputation is needed. Instead of optimizing for overall accuracy, we tuned the model for recall (sensitivity). We applied class weights to penalize missed MCI cases and a custom probability threshold: if the model wasn't at least 60% sure a patient was healthy, it flagged them for screening.
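The 60% rule can be sketched on top of any classifier that outputs class probabilities. The class order (Healthy, MCI, Dementia), the screen helper, and the choice to label a flagged patient with the most likely non-healthy class are assumptions for illustration; only the 0.60 threshold comes from the description above.

```python
import numpy as np

CLASSES = np.array(["Healthy", "MCI", "Dementia"])  # assumed column order

def screen(proba, healthy_threshold=0.60):
    """Flag every patient whose Healthy probability is below the threshold."""
    proba = np.asarray(proba, dtype=float)
    flagged = proba[:, 0] < healthy_threshold
    # Assumption: a flagged patient is labeled with the likelier non-healthy class.
    fallback = CLASSES[1 + proba[:, 1:].argmax(axis=1)]
    labels = np.where(flagged, fallback, "Healthy")
    return labels, flagged

# Made-up probabilities, e.g. the output of a model's predict_proba.
proba = [
    [0.70, 0.20, 0.10],  # confidently healthy: not flagged
    [0.55, 0.35, 0.10],  # Healthy is the top class, but below 60%: flagged
    [0.20, 0.30, 0.50],  # clearly not healthy: flagged
]
labels, flagged = screen(proba)
print(labels, flagged)
```

The second row is the interesting one: a plain argmax would call the patient Healthy, while the threshold sends them to screening.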

Challenges we ran into

The Temporal Mismatch: Our biggest hurdle was that the Amyloid, FDG, and MRI datasets had different timestamps. Overcoming the "0 matches found" error required writing custom temporal merging logic rather than relying on standard library functions.

The "Invisible" MCI: Distinguishing MCI from Normal aging is mathematically difficult. Our initial baseline models kept classifying MCI patients as Healthy because they looked similar on paper. It wasn't until we engineered the "Cognitive Reserve" features that the model started catching these subtle cases.

The Imbalance: Our dataset had far more healthy patients than sick ones. The model initially tried to be "lazy" by guessing "Normal" for everyone. We had to force it to pay attention to the minority classes using weighted loss functions.
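One common way to build such weights is the "balanced" heuristic, where each class gets weight n_samples / (n_classes * class_count). The sketch below assumes that formula and a made-up label vector; it is not necessarily the exact weighting we used.

```python
import numpy as np

def balanced_sample_weights(y):
    """Return per-sample weights and the per-class weight table."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    class_weight = {
        c: len(y) / (len(classes) * n) for c, n in zip(classes, counts)
    }
    sample_weight = np.array([class_weight[label] for label in y])
    return sample_weight, class_weight

# Made-up labels: 0 = Healthy, 1 = MCI, 2 = Dementia.
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2])
sample_weight, class_weight = balanced_sample_weights(y)
print(class_weight)
```

Rare classes receive larger weights, so each class contributes equally to the total loss. Weights like these are usually passed to the model through a sample_weight argument at fit time.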

Accomplishments that we're proud of

Solving the MCI Puzzle: We increased the detection rate (Recall) of Mild Cognitive Impairment from a baseline of 42% to 64%. In a medical screening context, catching these extra cases is a massive win.

Validating "Cognitive Reserve": When we analyzed Feature Importance, our engineered features (like MMSE_per_Educ) ranked higher than many raw biomarkers. This suggests that teaching the AI "medical context" can be more powerful than just feeding it raw numbers.

Data Integrity: We successfully merged three disparate datasets (Clinical, MRI, PET) into a single master dataset of over 4,000 patients, creating a valuable resource for future research.

What we learned

Context is King: A memory score in the normal range isn't necessarily normal if the patient has a Ph.D.; it may represent a decline from a higher baseline. Incorporating education levels into the scoring was the key to unlocking model performance.

The Accuracy Trap: In data science, we are often taught to maximize Accuracy. In this project, we learned that for medicine, Sensitivity is far more important. A model that misses a sick patient is a failure, even if it has 90% accuracy.
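A small worked example makes the trap concrete. The confusion matrix below is invented for illustration (it is not our model's output) and assumes rows are true classes and columns are predicted classes, in the order Healthy, MCI, Dementia.

```python
import numpy as np

# Made-up confusion matrix: rows = true class, columns = predicted class.
cm = np.array([
    [870, 25,  5],   # true Healthy
    [ 50, 18,  2],   # true MCI
    [ 10,  8, 12],   # true Dementia
])

accuracy = np.trace(cm) / cm.sum()        # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis=1)     # per class: caught cases / true cases

print(f"accuracy: {accuracy:.2f}")
for name, r in zip(["Healthy", "MCI", "Dementia"], recall):
    print(f"recall {name}: {r:.2f}")
```

Here overall accuracy is 0.90, yet only 18 of the 70 MCI patients (about 26%) are caught, because the large Healthy class dominates the accuracy figure.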

Handling Sparse Data: We learned that missing values aren't always errors to be fixed. Sometimes, the absence of a test is a signal in itself. Leveraging XGBoost's native NaN handling was superior to imputing artificial data.

What's next for Multimodal AI for Early Detection of Alzheimer’s Disease

Deployment: We plan to wrap the model in a Web App where a clinician can input a patient's MRI stats and demographics to get an instant risk assessment.

Explainability: We plan to integrate SHAP (SHapley Additive exPlanations) values to give doctors a "Reason Code" for every prediction (e.g., "Flagged because Hippocampal volume is low relative to Age").

Longitudinal Analysis: Currently, we look at a single snapshot in time. The next iteration will analyze the rate of change between visits to detect rapid decliners even earlier.
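A rate-of-change feature of this kind might look like the sketch below. Nothing here is implemented yet: the column names (patient_id, visit_date, hippocampus_vol), the add_annual_change helper, and the sample visits are hypothetical.

```python
import pandas as pd

def add_annual_change(df, col):
    """Add the change in `col` per year between consecutive visits of a patient."""
    out = df.sort_values(["patient_id", "visit_date"]).copy()
    grouped = out.groupby("patient_id")
    delta = grouped[col].diff()                            # change since previous visit
    years = grouped["visit_date"].diff().dt.days / 365.25  # time since previous visit
    out[f"{col}_per_year"] = delta / years                 # NaN on each first visit
    return out

# Made-up visits: patient 1 declines much faster than patient 2.
visits = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "visit_date": pd.to_datetime(
        ["2020-01-01", "2021-01-01", "2022-01-01", "2020-06-01", "2021-06-01"]
    ),
    "hippocampus_vol": [7000.0, 6900.0, 6600.0, 7500.0, 7480.0],
})
trend = add_annual_change(visits, "hippocampus_vol")
print(trend)
```

Each patient's first visit has no previous scan to compare against, so its rate is NaN, which a model with native NaN handling can consume directly.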

Built With

googlecolab
jupyternotebook
matplotlib
numpy
pandas
python
scikitlearn
seaborn

Updates

Aishat Abubakar started this project — Dec 20, 2025 10:56 AM EST
