SpeechReclaim

Inspiration

Finding inspiration for this project was initially a hurdle, as none of us had any experience in neuroscience, but a simple thought set the course of the whole project. A close friend of one of our team members stutters, so we decided to research that topic. That research led us to the realm of neurological voice disorders.

What it is

SpeechReclaim is software that takes in user input and predicts whether the user has a particular neurological speech disorder. The disorders it currently focuses on are Parkinson's speech disorder, stuttering, and aphasia. Building on that prediction, a key feature we aim to implement is an AI chatbot that suggests a long-term training plan, which can potentially help the user manage or overcome their disorder.

How it works

In this project, we developed a web-based diagnostic tool to assist in identifying Parkinson's disease using acoustic analysis and machine learning. The tool leverages features derived from vocal recordings, which are known to correlate with Parkinson's disease, such as fundamental frequency variations (e.g., MDVP frequencies), jitter, shimmer, and noise-to-harmonics ratio (NHR). These features capture subtle irregularities in speech that are indicative of neurological impairments.

Dataset Preparation and Feature Engineering

We used a publicly available dataset containing 20 acoustic features extracted from vocal samples of individuals with and without Parkinson's disease. The features include frequency-based measures (e.g., MDVP metrics), amplitude-based measures (e.g., shimmer metrics), and noise-related and dynamic-complexity features such as HNR, RPDE, and DFA. We preprocessed the dataset by imputing missing values with mean substitution and standardizing the numerical features to ensure uniform scaling.
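
The preprocessing described above can be sketched with scikit-learn. This is a minimal illustration, not the project's actual code: the three-column toy array and its values are assumptions standing in for the 20-feature dataset, and `SimpleImputer` plus `StandardScaler` are one common way to do mean substitution and standardization.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the acoustic feature table (assumed values):
# 4 recordings x 3 features, with one missing entry marked as NaN.
X = np.array([
    [119.9, 0.0078, 21.0],
    [122.4, np.nan, 19.1],
    [116.7, 0.0105, 20.7],
    [197.1, 0.0029, 26.8],
])

# Step 1: replace each missing value with its column mean.
# Step 2: rescale every column to zero mean and unit variance.
preprocess = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())
X_scaled = preprocess.fit_transform(X)
```

In a real training run the pipeline would normally be fitted on the training split only and then reused to transform the test split and any new recordings.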

Model Training and Optimization

To classify individuals as either healthy or affected by Parkinson's disease, we employed a Random Forest classifier due to its robustness in handling high-dimensional data and its ability to provide feature importance insights. Hyperparameter tuning was performed using GridSearchCV to optimize the number of estimators and tree depth, ensuring maximal predictive accuracy. The model achieved a classification accuracy of 96% on the test set, alongside high precision, recall, and F1 scores, demonstrating its efficacy in distinguishing between the two classes.
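
A rough sketch of this tuning step follows. The synthetic data from `make_classification`, the grid values, and the 3-fold cross-validation are all assumptions chosen to keep the example small; the write-up only states that the number of estimators and the tree depth were tuned with GridSearchCV.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the 20-feature dataset (assumption, not the real data).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Search over the two hyperparameters named in the text; grid values are illustrative.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="accuracy"
)
search.fit(X_train, y_train)

# GridSearchCV refits the best configuration on the full training split.
best_model = search.best_estimator_
test_accuracy = best_model.score(X_test, y_test)
importances = best_model.feature_importances_  # one weight per feature
```

`search.best_params_` holds the selected combination, and `importances` is the kind of per-feature ranking that makes a Random Forest convenient for explaining predictions.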

Real-Time Audio Analysis

In addition to leveraging pre-existing data, we implemented a feature extraction pipeline to analyze user-uploaded audio files. This pipeline, built using Librosa, extracts the same 20 acoustic features from the vocal recording of the user. For instance, fundamental frequency features are derived using Librosa's pitch tracking capabilities, while jitter and shimmer metrics are calculated from temporal and amplitude variations in the waveform. Noise-related metrics such as NHR and HNR are extracted by separating the harmonic and percussive components of the signal.
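
To make the jitter and shimmer step concrete, here is a small sketch of one common definition ("local" jitter and shimmer: the mean absolute cycle-to-cycle change divided by the mean). The write-up does not say which variant the project computes, so treat the formulas as an assumption. The function names are hypothetical, and the inputs (per-cycle periods and peak amplitudes) are assumed to come from an earlier pitch-tracking step that is not shown.

```python
import numpy as np

def local_jitter(periods):
    """Mean absolute difference between consecutive pitch periods,
    divided by the mean period (relative cycle-to-cycle timing variation)."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def local_shimmer(amplitudes):
    """Mean absolute difference between consecutive cycle peak amplitudes,
    divided by the mean amplitude (relative cycle-to-cycle loudness variation)."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

# Hypothetical per-cycle measurements: periods in seconds, amplitudes unitless.
jitter = local_jitter([0.010, 0.011, 0.010, 0.011])
shimmer = local_shimmer([0.50, 0.50, 0.50, 0.50])
```

A perfectly steady voice gives 0 for both measures; larger values indicate more irregular vocal fold vibration.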

Web Application Development

The diagnostic tool is deployed as an interactive web application built with Reflex. Users can upload an audio recording, which is processed to extract relevant acoustic features. These features are then standardized using the pre-trained scaler, and the Random Forest model predicts the likelihood of Parkinson's disease. The application provides a clear visual representation of the prediction, alongside metrics such as accuracy and feature importance, enhancing user understanding.
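
The scale-then-predict step can be sketched as below. The helper name `predict_parkinsons_probability` is hypothetical, and the scaler and model here are fitted on random toy data purely so the example runs; in the application they would be the saved artifacts from training (for example loaded with `joblib.load`). The Reflex upload handling is not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

def predict_parkinsons_probability(features, scaler, model):
    """Scale one feature vector with the fitted scaler and return the
    model's estimated probability for the positive class (label 1)."""
    row = np.asarray(features, dtype=float).reshape(1, -1)
    scaled = scaler.transform(row)
    positive_index = list(model.classes_).index(1)
    return float(model.predict_proba(scaled)[0, positive_index])

# Toy stand-ins for the pre-trained scaler and model (assumed, random data).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 20))
y_train = (X_train[:, 0] > 0).astype(int)
scaler = StandardScaler().fit(X_train)
model = RandomForestClassifier(n_estimators=25, random_state=0)
model.fit(scaler.transform(X_train), y_train)

# One extracted 20-feature vector in, one probability out.
probability = predict_parkinsons_probability(X_train[0], scaler, model)
```

Reusing the scaler fitted at training time matters here: the uploaded recording's features must be scaled with the same means and variances the model saw during training.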

Conclusion

By combining robust machine learning techniques with real-time feature extraction, our application demonstrates the potential to serve as an accessible, non-invasive diagnostic aid for Parkinson's disease. Its accuracy and ease of use position it as a promising tool for both clinical and research settings.

Challenges faced

A few of the challenges that we ran into along the way:

The dataset we used to train the Parkinson's speech disorder model was recorded using the Multi-Dimensional Voice Program (MDVP), which is part of the Visi-pitch hardware-software package. We didn't have access to this package, so we had to fall back on a general-purpose Python library that analyzes speech and splits it into its constituent features, and that library offers a smaller variety of audio features than MDVP.
The initial raw datasets were in text format, so we had to convert them to a usable CSV format.
A large portion of the dataset for one disorder came as plain audio files, so we had to extract voice features from them with a Python library and use the result as the initial dataset for training the AI model.
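
As an illustration of the text-to-CSV conversion mentioned in the list above, here is a minimal sketch using only the standard library. The layout of the raw files is not described, so whitespace-delimited rows with a header line are an assumption, as are the function name `text_to_csv` and the sample values.

```python
import csv
import io

def text_to_csv(text_stream, csv_stream):
    """Copy whitespace-delimited rows into CSV form, skipping blank lines.
    Returns the number of rows written (header included)."""
    writer = csv.writer(csv_stream)
    rows = 0
    for line in text_stream:
        fields = line.split()
        if fields:
            writer.writerow(fields)
            rows += 1
    return rows

# In-memory example; real use would pass open file objects instead.
raw = io.StringIO(
    "name jitter shimmer\nrec01 0.0078 0.0437\n\nrec02 0.0105 0.0613\n"
)
out = io.StringIO()
row_count = text_to_csv(raw, out)
```

When writing to a real file, opening it with `newline=""` avoids doubled line endings on some platforms, as the `csv` module documentation recommends.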

What we're proud of

We're especially proud of training a model that predicts with high accuracy whether the user has a particular voice disorder. We're also proud of building an interactive front-end that lets users upload audio recordings, learn about neurological voice disorders, and customize the website to their liking, all within 60 hours!

Information gained

We learned a lot about neurological concepts, about how various speech disorders affect the daily lives of the people who have them, and about how the brain functions in these disorders. From a technical standpoint, we learnt how audio data is collected to build large datasets, how that audio is broken down into acoustic features, and how those features change with each disorder.

What the future looks like

The current version of the project is not yet fully functional. We aim to train models that detect more neurological speech disorders and to develop a larger-scale model that uses all the datasets and predicts the disorder from a single audio input. We also aim to introduce an AI chatbot that can generate a long-term training plan to help the user combat and potentially overcome their condition.

Potential Business Model

Click here to view

Built With

joblib
librosa
matplotlib
numpy
pandas
python
reflex
scikit-learn

Submitted to

natHACKS 2024

Created by

I developed a machine learning pipeline to predict whether a person is healthy or unhealthy based on speech analysis. This involved data preprocessing, feature scaling, and using SMOTE to handle class imbalance. I fine-tuned a Gradient Boosting Classifier, saved the best model, and evaluated it using metrics like accuracy, F1 score, precision, and recall. I also created a feature extraction function with Librosa to compute relevant speech features for accurate predictions. Additionally, I proposed a business solution to use this technology for early screening and monitoring of Parkinson's disease, offering accessible healthcare solutions.

Sashreek Addanki
I worked on data collection for training the machine learning model and served as the creative lead for this project.

Ammaar Mohammed
Yuta Takenaka
Shira Li
Divy Vaghasiya

Updates

Sashreek Addanki started this project — Nov 17, 2024 07:28 AM EST
