Classification of Alzheimer’s Disease from Speech Data

Sierra Rowley posted an update — Nov 23, 2020 09:51 PM EST

Introduction: Alzheimer’s disease (AD) is a leading cause of death worldwide, and the growing elderly population will only exacerbate this issue. Because so many people are affected by AD, many, including the members of this group, understand the devastating toll it can take on a family. The paper An Automatic Assessment System for Alzheimer’s Disease Based on Speech Using Feature Sequence Generator and Recurrent Neural Network (Yi-Wei Chien, Sheng-Yi Hong, Wen-Ting Cheah, Li-Hung Yao, Yu-Ling Chang, and Li-Chen Fu) focuses on creating an automatic assessment system that uses a neural network model to classify a patient as either having or not having Alzheimer’s disease. Catching AD in its early stages can be key to successful treatment, but it has been difficult to make a quick diagnosis from human analysis alone. Machine learning is seen as a potential tool for improving on this. Because many early signs of AD, such as confusion or a smaller vocabulary, show up in a patient’s speech patterns, language datasets provide an ideal opportunity to apply machine learning. Collecting these datasets is highly feasible, and analysis can be done with techniques such as support vector machines, random forests, and neural networks.

The paper describes its data collection process, a feature sequence generator model applied to the dataset, and an RNN model used for classification. During data collection, participants were tested in three categories: fluency, picture description, and logical memory. All three datasets were combined and tokenized by syllable, with an additional silence token. The feature sequence generator took in audio data and produced tokenized sequences using a Convolutional Recurrent Neural Network (CRNN). Lastly, the Alzheimer’s disease assessment engine uses the generated feature sequence as input to an RNN. The model outputs a probability between 0 and 1 that indicates whether a patient has AD (1) or does not have AD (0). Three RNN cells were tried to see if results differed: GRU, LSTM, and the simple cell. Both the GRU and the LSTM performed well, with AUROC scores of over 0.9. When we build our model, we plan on using a different dataset from the one collected in the paper. Specifically, we will be using TalkBank’s Pitt Dementia

Challenges: What has been the hardest part of the project you’ve encountered so far? Preprocessing the data has been difficult. Our data came as one file per interview, and each file contained the patient’s lines, the interviewer’s lines, and other information about the grammar and parts of speech in the patient’s lines. First, we had to pull out only the words from the patient’s lines in each file, since that is the data we will be feeding our model. Every patient line ended in a time stamp, and it took a long time to figure out how to strip these from the data. The data had also already been annotated: shortened terms were rewritten as the full word with the missing part in parentheses (e.g. “havin” became “hav(ing)”), and certain sounds were given their own symbols (e.g. “(.)” represents a pause and “(..)” a longer pause). Since we did not want an annotated word to be treated as a different entry from the plain word in our vocab dictionary, we had to look through the data and strip the punctuation used for these annotations. Overall, it took some time to get the data into the format we wanted. Lastly, we UNKed, padded, and tokenized the data using the Keras text and sequence preprocessing functions.
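To make the steps above concrete, here is a minimal sketch of the extraction, UNKing, and padding. It is not our actual code. It assumes (the post does not spell this out) that patient lines start with "*PAR:", interviewer lines with "*INV:", annotation tiers with "%", and that each time stamp looks like two numbers joined by an underscore and wrapped in \x15 control characters. The function names are made up for illustration, and the tokenizing part is a plain-Python stand-in rather than the Keras functions we used.

```python
import re
from collections import Counter

# Assumed time stamp shape: "\x15<start>_<end>\x15" at the end of a line.
TIMESTAMP = re.compile(r"\x15\d+_\d+\x15")

PAD_ID, UNK_ID = 0, 1  # ids reserved for padding and unknown words


def patient_utterances(transcript):
    """Keep only the patient's lines and strip their time stamps."""
    utterances = []
    for line in transcript.splitlines():
        if line.startswith("*PAR:"):  # assumed patient-line prefix
            text = TIMESTAMP.sub("", line[len("*PAR:"):])
            utterances.append(text.strip())
    return utterances


def build_vocab(utterances, max_words):
    """Map the max_words most frequent words to ids starting at 2."""
    counts = Counter(word for u in utterances for word in u.split())
    return {word: i + 2 for i, (word, _) in enumerate(counts.most_common(max_words))}


def encode(utterance, vocab, length):
    """Turn words into ids (UNK if unseen), then cut or pad to a fixed length."""
    ids = [vocab.get(word, UNK_ID) for word in utterance.split()][:length]
    return ids + [PAD_ID] * (length - len(ids))
```

Calling patient_utterances on the text of one interview file gives a list of cleaned patient lines, which can then go through build_vocab and encode. Padding here is added at the end of each sequence; that is a choice made for this sketch only.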

Insights: Are there any concrete results you can show at this point? There are no concrete results yet because we are still figuring out the best way to train the embeddings. We spent a lot of time preprocessing the data so that the embeddings we create in our model will be as useful as possible. As a result, we have not finished the model.

Plan: Are you on track with your project? Currently we are a little behind on the project. At this point we hoped to have more of the model written, but we underestimated how long preprocessing the data would take. We need to spend more time writing our model and then tweaking the preprocessing. We may change which characters we strip from our data. The dataset we are using has a lot of special characters that mark things like slang, pausing, crying, drifting off, etc. Currently we are leaving a lot of these characters in the data, but in the future we might try to remove them. That way, “about” and “(a)bout” (which represents someone saying ‘bout) will not end up as two different embeddings.
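As a rough sketch of what removing these markers could look like, the snippet below merges parenthesized word parts back into the word and either drops pause markers or swaps them for a single token. It only handles the two kinds of marker mentioned in this update; the function name, the regular expressions, and the "<pause>" token are our own assumptions for illustration, and the other special characters in the dataset would need their own rules.

```python
import re

PAUSE = re.compile(r"\(\.+\)")            # "(.)", "(..)", and longer pauses
ELIDED = re.compile(r"\(([A-Za-z']+)\)")  # "(ing)" in "hav(ing)", "(a)" in "(a)bout"


def normalize(utterance, pause_token=None):
    """Merge elided word parts and drop (or replace) pause markers."""
    # Handle pauses first so only letter-filled parentheses are left.
    replacement = f" {pause_token} " if pause_token else " "
    text = PAUSE.sub(replacement, utterance)
    # Keep the letters and remove the parentheses: "(a)bout" -> "about".
    text = ELIDED.sub(r"\1", text)
    # Collapse any runs of whitespace left behind.
    return " ".join(text.split())
```

For example, normalize("(a)bout") returns "about", so both spellings would share one vocabulary entry. Passing pause_token="<pause>" keeps pauses as their own token, similar in spirit to the silence token used in the paper.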
