What it does

You enter the patient's year of birth, their normalized brain volume, and whether they have more than 4 years of post-secondary education, and the app outputs whether Alzheimer's was detected along with a confidence level. Normalized brain volume is a measure of brain size that can be extracted from MRI scans with the right technology. That extraction is exceedingly complicated, so I decided to focus on processing the already-extracted data.

How I built it

First, I went through a lot of datasets looking for something real and useful for predicting Alzheimer's. Once I found one, I had to clean it up and combine the data from a longitudinal study and a cross-sectional study to increase the reliability of my model. I tested different combinations of variables until I landed on age + normalized brain volume + whether the patient was highly educated. I used cross-validation to tune the k value and tweaked the model's parameters to increase recall.
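A minimal sketch of what tuning k by cross-validation could look like, assuming that k is the number of neighbours in a k-nearest-neighbours classifier and that recall on the positive class is the selection criterion. The data is synthetic, and the column names (`age`, `brain_vol`, `high_educ`) and the helper `recall_for_k` are hypothetical; none of this is taken from the project's actual code.

```r
# Hypothetical sketch: synthetic data stands in for the real study data.
library(class)  # ships with R; provides knn.cv() (leave-one-out CV)

set.seed(42)
n <- 200
demented <- rep(c(0, 1), each = n / 2)
patients <- data.frame(
  age       = rnorm(n, mean = 72 + 5 * demented, sd = 6),
  brain_vol = rnorm(n, mean = 0.75 - 0.04 * demented, sd = 0.03),
  high_educ = rbinom(n, 1, 0.5)
)
labels <- factor(demented, levels = c(0, 1), labels = c("no", "yes"))

# kNN is distance-based, so put all predictors on a common scale first.
features <- scale(patients)

# Leave-one-out recall for one value of k: the share of true "yes" cases
# that the classifier also labels "yes".
recall_for_k <- function(k) {
  pred <- knn.cv(features, labels, k = k)
  sum(pred == "yes" & labels == "yes") / sum(labels == "yes")
}

candidate_k <- seq(1, 15, by = 2)  # odd values avoid most voting ties
recalls <- vapply(candidate_k, recall_for_k, numeric(1))
best_k <- candidate_k[which.max(recalls)]

print(data.frame(k = candidate_k, recall = round(recalls, 3)))
```

Selecting on recall rather than accuracy favours catching more true cases at the cost of more false alarms, which fits a screening tool.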

Challenges I ran into

I needed a UI that could load the model I had built in R. I had never done this before, but I learned how to use a library called Shiny to create a simple UI that passes the user's input to a model loaded from a .rds file.
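A rough sketch of how a Shiny app can hand user input to a model loaded from a .rds file. To keep it runnable on its own, it first fits and saves a stand-in model (a logistic regression on synthetic data, not the project's actual model); the input IDs, column names, and the way the confidence figure is derived are all assumptions made for illustration.

```r
library(shiny)

# Stand-in model so the sketch runs on its own: a logistic regression on
# synthetic data, saved to a temporary .rds file. A real app would point
# readRDS() at its own saved model instead.
set.seed(1)
train <- data.frame(
  age       = round(runif(100, 60, 95)),
  brain_vol = runif(100, 0.65, 0.85),
  high_educ = rbinom(100, 1, 0.5)
)
train$demented <- rbinom(
  100, 1, plogis(0.1 * (train$age - 75) - 20 * (train$brain_vol - 0.75))
)
model_path <- tempfile(fileext = ".rds")
saveRDS(
  glm(demented ~ age + brain_vol + high_educ, data = train, family = binomial),
  model_path
)

model <- readRDS(model_path)  # load once at startup, not on every request

ui <- fluidPage(
  titlePanel("Alzheimer's Early Detection"),
  numericInput("birth_year", "Year of birth", value = 1950),
  numericInput("brain_vol", "Normalized brain volume", value = 0.75, step = 0.01),
  checkboxInput("high_educ", "More than 4 years of post-secondary education"),
  textOutput("result")
)

server <- function(input, output) {
  output$result <- renderText({
    # Convert the form fields into the columns the model was trained on.
    new_patient <- data.frame(
      age       = as.numeric(format(Sys.Date(), "%Y")) - input$birth_year,
      brain_vol = input$brain_vol,
      high_educ = as.integer(input$high_educ)
    )
    p <- predict(model, newdata = new_patient, type = "response")
    sprintf("Alzheimer's %s (confidence %.0f%%)",
            if (p >= 0.5) "detected" else "not detected",
            100 * max(p, 1 - p))
  })
}

app <- shinyApp(ui, server)  # start it with runApp(app)
```

Reading the .rds file outside the server function means the model is loaded once per R process rather than once per user session.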

Accomplishments that I am proud of

The model has 81% recall and 76% accuracy without using any cognitive test information, relying only on simple demographic information and a metric that measures brain shrinkage. I thought it would defeat the point of the project to use Mini-Mental State Examination (MMSE) or similar test results, since a patient who has already been tested likely already has the necessary information about their cognitive decline.
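For reference, recall is the share of actual Alzheimer's cases the model flags, and accuracy is the share of all predictions that are correct. The sketch below computes both on a small made-up set of labels; the numbers are purely illustrative and are not the project's results, and `classification_metrics` is a hypothetical helper.

```r
# Recall = TP / (TP + FN); accuracy = correct predictions / all predictions.
classification_metrics <- function(actual, predicted, positive = "yes") {
  tp <- sum(predicted == positive & actual == positive)
  fn <- sum(predicted != positive & actual == positive)
  c(recall = tp / (tp + fn), accuracy = mean(predicted == actual))
}

# Illustrative labels only (10 made-up patients).
actual    <- c("yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "no")
predicted <- c("yes", "yes", "yes", "no",  "no", "no", "no", "no", "yes", "yes")

metrics <- classification_metrics(actual, predicted)
print(metrics)  # recall 0.75, accuracy 0.70
```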

What I learned

Using real-world data can be more challenging because it often includes a lot of missing values and inconsistent metrics. For example, one of the studies measured education level while the other measured years of education, but both columns were labeled "EDUC".
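One way to reconcile two columns that share a name but not a scale is to derive the same yes/no feature from each before combining the tables. The sketch below does that for "EDUC" under loudly stated assumptions: the level coding (1 to 5, with 5 meaning education beyond a college degree), the 16-year cutoff (12 years of school plus 4 post-secondary), and the sample rows are all hypothetical, and dropping rows with missing values is just one possible way to handle them.

```r
# Hypothetical rows: EDUC is a 1-5 level in one study and years in the other.
cross_sectional <- data.frame(id = c("c1", "c2", "c3"), EDUC = c(2, 5, NA))
longitudinal    <- data.frame(id = c("l1", "l2", "l3"), EDUC = c(12, 18, 16))

# Derive the same flag from each scale (cutoffs are assumptions).
cross_sectional$high_educ <- cross_sectional$EDUC >= 5  # level: beyond college
longitudinal$high_educ    <- longitudinal$EDUC > 16     # years: more than 12 + 4

# Stack only the harmonized columns, then drop rows where the flag is missing.
combined <- rbind(
  cross_sectional[, c("id", "high_educ")],
  longitudinal[, c("id", "high_educ")]
)
combined <- combined[!is.na(combined$high_educ), ]

print(combined)
```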

What's next for Alzheimer's Early Detection

This basic model is pretty effective considering its limitations. If datasets with other patient information (such as genetic data or more MRI data) were available to train on, they might help increase recall to something more like 95% or higher.

Built With

  • dplyr
  • r
  • shiny
  • tidyverse