Inspiration
Mental health affects millions, yet diagnosing disorders remains subjective and complex. Growing up in a world where mental well-being is crucial, we aimed to make a difference. At the 2025 Rice Datathon, we leveraged EEG data and machine learning to improve psychiatric disorder detection, creating models that could lead to faster, more objective diagnoses. By tackling this challenge, we hope to contribute to a future where mental health care is more accessible, data-driven, and effective.
What it does
Given a set of EEG data from patients, we developed two machine learning models to identify psychiatric disorders, aiming to enhance mental health diagnostics. Our main model is a multi-class classifier that assigns a single diagnosis from the six main categories of mental disorders. We also developed a binary classification model that decides, with greater accuracy, between a given disorder category and a negative diagnosis.
How we built it
For the multi-class classification task, we compared gradient-boosted trees, random forests, and support vector machines. We ultimately chose gradient-boosted trees since they consistently obtained the highest accuracy and avoided biases toward overrepresented categories. For the binary classification task, we compared gradient boosting and logistic regression with an elastic-net penalty, finding that logistic regression yielded better results.
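A minimal sketch of that comparison, assuming scikit-learn; the feature matrix here is a synthetic stand-in for our actual EEG features, and the hyperparameters are illustrative defaults, not our tuned values.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

# Synthetic stand-in for the EEG feature matrix (assumption for this sketch).
X, y = make_classification(n_samples=300, n_features=50, n_classes=6,
                           n_informative=20, random_state=0)

# The three multi-class candidates we compared.
candidates = {
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "svm": SVC(kernel="rbf", random_state=0),
}

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy {scores.mean():.3f}")

# Binary task: elastic-net logistic regression needs the saga solver
# and an l1_ratio mixing the L1 and L2 penalties.
binary_model = LogisticRegression(penalty="elasticnet", solver="saga",
                                  l1_ratio=0.5, max_iter=5000)
```

Cross-validated accuracy on a held-out split is what let us compare the models on equal footing before committing to one per task.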
Challenges we ran into
One of the issues we ran into was feature engineering. We tried summing PSD values across different frequency bands and electrodes, expecting it to improve our model's accuracy. After extensive testing, however, we found it was actually hurting accuracy, so we scrapped it.
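For context, the scrapped feature looked roughly like the sketch below: summing Welch PSD estimates within canonical EEG frequency bands for one electrode's signal. The band edges, sampling rate, and Welch parameters here are illustrative assumptions, not the dataset's actual values.

```python
import numpy as np
from scipy.signal import welch

# Canonical EEG band edges in Hz (assumed for this sketch).
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def band_powers(signal, fs=250):
    """Return summed PSD per frequency band for one electrode's signal."""
    freqs, psd = welch(signal, fs=fs, nperseg=fs * 2)
    return {band: psd[(freqs >= lo) & (freqs < hi)].sum()
            for band, (lo, hi) in BANDS.items()}

rng = np.random.default_rng(0)
powers = band_powers(rng.standard_normal(2500))  # 10 s of fake signal
```

Aggregating like this collapses fine-grained spectral information into five numbers per electrode, which in our case discarded more signal than noise.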
Another major challenge was discovering that coherence values were actually hurting our model's performance. At first, we assumed they would be useful, but once we dug deeper, we saw they were negatively impacting accuracy. Removing them made a huge difference and was a key turning point in improving our results. This may be partly due to the high feature count relative to the small sample size.
We also struggled with finding models that could perform well despite having a high number of parameters and a relatively small amount of training data. Balancing complexity without overfitting was tricky, and it took some trial and error to get it right.
On top of that, we had to address imbalances in the dataset—some psychiatric disorders were significantly underrepresented. Without adjustments, our models leaned too heavily toward the more common disorders. To fix this, we weighted each class inversely to its frequency in the training data, so that all disorders had a fair chance of being recognized, which helped improve overall accuracy.
Accomplishments that we're proud of
Developing four working models, and achieving binary-classification accuracy scores comparable to those reported in the research paper that originally used this dataset: https://www.frontiersin.org/journals/psychiatry/articles/10.3389/fpsyt.2021.707581/full
What we learned
We learned that more features are not always better and can introduce noise. We also learned about many of the advantages and disadvantages of different ML models through all the trial and error of picking the best ones for this dataset.
What's next for Neurotech - Team DSFIT
Next, we plan to explore ways to incorporate coherence data without compromising model accuracy. We’ll also focus on improving binary classification, as it has greater real-world usability than multi-class approaches.