TOScreen
Inspiration
The COVID-19 pandemic is stressing health care systems worldwide, negatively impacting the economy, and changing social interactions. One of the most effective strategies to fight the pandemic is massive testing of the population. Nonetheless, tests for medical diagnosis are expensive and require a processing time that can range from minutes to a couple of days. We believe that the use of technology can be an efficient, cost-effective tool to control the pandemic. In this regard, we implemented an Android-based smartphone app powered by AI techniques, which analyses users' coughs and provides a near real-time diagnostic result.
What it does
The app allows users to record their own voice and transfer it to a dedicated server for analysis and diagnosis.
How we built it
Android app
The app implements functionality for users to record their own voice. At the back end, once the recording is completed, the file is transferred to a dedicated server, which receives the audio file and processes it with the approach described below. The prediction of the trained model is finally transferred back to the app, so it can be displayed on the user's smartphone.
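For illustration, here is a minimal sketch of what such a server-side endpoint could look like, assuming a Flask back-end; the endpoint name, the `predict_from_audio` helper, and the response format are placeholders rather than the actual implementation.

```python
# Minimal sketch of a server-side endpoint (assumption: a Flask back-end;
# predict_from_audio is a hypothetical placeholder for the trained model).
from flask import Flask, request, jsonify

app = Flask(__name__)

def predict_from_audio(path):
    # Placeholder: in the real back-end this would run feature extraction
    # and the trained classifier on the uploaded recording.
    return "negative", 0.5

@app.route("/analyze", methods=["POST"])
def analyze():
    # The Android app uploads the recording as multipart form data.
    audio = request.files["audio"]
    path = "/tmp/upload.wav"
    audio.save(path)
    label, confidence = predict_from_audio(path)
    # The JSON response is sent back and displayed on the user's smartphone.
    return jsonify({"prediction": label, "confidence": confidence})

if __name__ == "__main__":
    app.run()
```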
Dataset
We use the "Labeled Audio" part of the provided dataset. For all experiments, we first balanced the number of positive and negative examples, i.e., we removed part of the negative examples to match the number of available positive examples. Our working dataset has 90 positive examples and 90 negative examples. By example we refer to one participant, for which we have more than one audio file. Then, we divided our dataset into train and test subsets, which we fixed at the beginning. The seed the pseudorandom generator used for the splitting for reproducibility purposes.
Data preparation
The first step of data preparation consisted of filtering out silent parts of the audio files in the dataset. Using an implementation of the pseudo spectral flux, we detect abrupt changes in the spectrum of the signal (e.g. the beginning and end of a cough) and cut the audio accordingly to remove the silent parts.
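A simplified version of this trimming step could look as follows; the sketch assumes librosa for loading and STFT computation, and the threshold value is purely illustrative.

```python
# Minimal sketch of silence trimming driven by spectral flux
# (assumption: librosa is used; the hackathon code may differ in detail).
import numpy as np
import librosa

def trim_silence(path, flux_threshold=0.1):
    y, sr = librosa.load(path, sr=None)
    # Magnitude spectrogram and frame-to-frame (pseudo) spectral flux:
    # only positive spectral increases are counted.
    S = np.abs(librosa.stft(y))
    flux = np.sqrt(np.sum(np.diff(S, axis=1).clip(min=0) ** 2, axis=0))
    flux = flux / (flux.max() + 1e-9)
    # Keep the span between the first and last abrupt spectral change.
    active = np.where(flux > flux_threshold)[0]
    if len(active) == 0:
        return y, sr
    start = librosa.frames_to_samples(active[0])
    end = librosa.frames_to_samples(active[-1] + 1)
    return y[start:end], sr
```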
In the second step, we extracted a set of standard audio features using openSMILE, a cross-platform and open-source toolkit commonly used for speech and music analysis.
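For reference, extracting utterance-level features with the opensmile Python package could look like the snippet below; the ComParE_2016 feature set shown here is an assumption (openSMILE can also be run via its SMILExtract command-line tool), so the actual feature set we used may differ.

```python
# Minimal sketch of feature extraction with the opensmile Python package
# (assumption: ComParE_2016 functionals; the exact set used may differ).
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)

# Returns one row of functional (file-level) features per audio file.
features = smile.process_file("trimmed_cough.wav")
```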
With the extracted features, we trained several machine learning models (see next section).
Machine learning (ML) models
Using the set of audio features we extracted, we trained and evaluated several machine learning models with Scikit-learn. We experimented with different parameters for each of them and also performed a grid search in some cases. These are the main ML methods we tested (some supervised, some unsupervised); a minimal training sketch follows the list:
- Support vector machines
- K-Means Clustering (K=2)
- K-Nearest Neighbors classifiers (K=5)
- Multi-layer Perceptron (MLP)
- Random Forest Classifier
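The sketch below illustrates how two of these models could be trained and evaluated with Scikit-learn; the hyper-parameter grids are illustrative, and `X_train`, `y_train`, `X_test`, `y_test` are assumed to come from the split and feature extraction steps above.

```python
# Minimal sketch of model training and evaluation with Scikit-learn
# (assumption: X_train/X_test/y_train/y_test come from the steps above;
# the hyper-parameter grids are illustrative, not the exact ones we used).
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

models = {
    "svm": GridSearchCV(
        make_pipeline(StandardScaler(), SVC()),
        param_grid={"svc__C": [0.1, 1, 10], "svc__kernel": ["rbf", "linear"]},
        cv=5,
    ),
    "random_forest": GridSearchCV(
        RandomForestClassifier(random_state=42),
        param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
        cv=5,
    ),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, model.predict(X_test)))
```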
Challenges we ran into
One of the big challenges has been ensuring the quality of the training data. Using non-cough segments might just add noise to the model, as we hypothesise that these segments from tested positive or negative participants contain information that is most likely not relevant to the task. Although the first filtering step removes part of these segments (at the beginning and end of the audio files), the silence between two consecutive coughs remains.
Accomplishments that we're proud of
The implementation of a full proof-of-concept demonstrator, from the user interface to the back-end, to showcase the potential of using new technologies to control the spread of COVID-19.
What's next for TOScreen
The next steps mainly involve further research on the models, experimenting with different feature sets and neural network architectures. The next goal is to maximise the performance of the models and, specifically, to reduce the number of false negatives they produce.