TDA + ML and Its Effects on Correct Classification of ASD

Inspiration

A year ago I attempted a research project on Topological Data Analysis (TDA) with one of my professors and failed miserably. A few weeks ago I saw that Georgia Tech was hosting a hackathon. Since it was my first hackathon and it was data-analysis themed, I decided to work on something I was comfortable-ish with, but that would still be special and challenging.

I chose the dataset because there is a lot of research on the psychological aspects of autism, but far less that I could find on the physical aspects, such as how an autistic person's brain differs from a non-autistic person's and whether there are any significant markers that could be used to diagnose autism.

What it does

The project is meant to classify a hypothetical patient as either on or not on the autism spectrum, and to create a visual representation of the fMRI data in the form of a persistent homology map.

How we built it

The entire project was built in Jupyter Notebook and later transferred to a TypeScript-based React frontend so that it could be displayed in the style of a research paper.

The raw data came as correlation matrices between different brain regions. It was preprocessed by analyzing it topologically and constructing topological features from it, such as Persistence Entropy and the Wasserstein Distance.

Two separate XGBoost models were then trained: one on the raw data alone, and the other on the raw data plus the derived TDA features. Both models were tested and tuned based on how they performed, while keeping their parameters comparable. Once tuning was done, the two models were compared by accuracy.

A persistent homology mapping was also constructed from the raw data, to see whether autistic and non-autistic samples can be clustered by similarity.
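As a rough illustration of one of the TDA features mentioned above, here is a minimal sketch of computing a persistence entropy value from a small correlation matrix. This is not the project's notebook code. The conversion d = 1 - r from correlation to distance, the restriction to 0-dimensional homology (H0), and all function names are assumptions made for this example. It uses the fact that H0 death times in a Vietoris-Rips filtration equal the edge weights of a minimum spanning tree, so it needs only the standard library.

```python
import math


def correlation_to_distance(corr):
    """Assumed conversion: dissimilarity d = 1 - r, with a zero diagonal."""
    n = len(corr)
    return [[0.0 if i == j else 1.0 - corr[i][j] for j in range(n)]
            for i in range(n)]


def h0_death_times(dist):
    """Finite H0 death times of the Vietoris-Rips filtration.

    Every point is born at 0; components merge at the edge weights of a
    minimum spanning tree, found here with Prim's algorithm. The single
    component that never dies is left out.
    """
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each node
    best[0] = 0.0
    deaths = []
    for step in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:
            deaths.append(best[u])
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
    return sorted(deaths)


def persistence_entropy(lifetimes):
    """Shannon entropy of the lifetimes, normalized to sum to 1."""
    positive = [length for length in lifetimes if length > 0]
    total = sum(positive)
    if total == 0:
        return 0.0
    return -sum((length / total) * math.log(length / total)
                for length in positive)


# Hypothetical 3-region correlation matrix, for illustration only.
example_corr = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
# Births are all 0 in H0, so each lifetime equals its death time.
example_deaths = h0_death_times(correlation_to_distance(example_corr))
example_entropy = persistence_entropy(example_deaths)
print(example_deaths, example_entropy)
```

A value like this, computed once per subject, is the kind of extra column that could be appended to the raw features before training the second model. A full pipeline would more likely use a dedicated TDA library and include higher-dimensional homology as well.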

Challenges we ran into

With no data science background beyond very basic introductory skills, and no practical experience in anything related to ML, figuring out how to do everything efficiently within 36 hours was quite the challenge.

Accomplishments that we're proud of

I am honestly proud of going from knowing absolutely nothing about what I was doing to actually having some idea. While the accuracy of both models is nothing to be proud of, I believe that with a larger dataset and more time spent optimizing the models, this approach could eventually help supplement the formal diagnosis of autism.

What we learned

Absolutely everything. Apart from the frontend, I don't think I spent a minute of these 36 hours not actively learning something new.

What's next for TDA + ML and its effects on correct classification of ASD

I would love to try other types of models, such as deep learning models like a CNN. These 36 hours awakened a real passion for the topic, so I plan to keep optimizing the models and to study more TDA in order to find further enhancements and applications of combining TDA with ML algorithms.

Updates

Jan Kaltenegger started this project — Feb 22, 2025 08:32 PM EST
