Inspiration

Many infectious diseases, including COVID-19, are not detected early enough to prevent them from spreading to other people. Symptoms often appear only after the person is already infectious. That is why we want to find a way to detect infections earlier and stop them from spreading. A company called AI4medicine has already approached this problem and developed an algorithm that predicts infections from an increased heart rate. However, the accuracy of their algorithm is not perfect (80%), so we want to improve it.

What it does

It takes raw data and preprocesses it into a clean dataset, then augments that dataset to deal with class imbalance. The resulting dataset is used to train a model that analyzes each subject's heart rate over a period of time and classifies each day as either the symptom onset date or not.
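
To illustrate what "analyzing heart rate over a period of time" can mean at the feature level, the Python sketch below scores each day's resting heart rate against the same subject's preceding days. It is not the AI4medicine algorithm and not our trained model; the function name `daily_hr_zscores`, the 7-day baseline window and the choice of a z-score are assumptions made for illustration.

```python
from statistics import mean, stdev

def daily_hr_zscores(daily_rhr, baseline_days=7):
    """Score each day's resting heart rate against the subject's own
    preceding `baseline_days` days (hypothetical feature).

    Returns a list aligned with `daily_rhr`; entries are None where fewer
    than two baseline days are available to estimate a spread.
    """
    scores = []
    for i, value in enumerate(daily_rhr):
        window = daily_rhr[max(0, i - baseline_days):i]  # days before day i
        if len(window) < 2:
            scores.append(None)
            continue
        sd = stdev(window)
        # A flat baseline has no spread; treat the day as unremarkable.
        scores.append((value - mean(window)) / sd if sd > 0 else 0.0)
    return scores

# Example: a clear jump on the last day stands out against the baseline.
print(daily_hr_zscores([60, 61, 59, 60, 62, 61, 60, 70]))
```

A feature like this could be fed to a classifier alongside the raw values; a large positive score marks a day whose heart rate is unusually high for that particular person.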

How we built it

Data preprocessing was done in Python, merging the data and its labels into a single CSV file. Data augmentation of the preprocessed data was then also done in Python, by adding noise and shuffling rows.
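
A minimal sketch of the merging step, using only the Python standard library. The column names (`subject_id`, `timestamp`, `heart_rate`, `symptom_onset`) and the aggregation of raw readings into a daily mean heart rate are hypothetical; the real files may be organized differently. In practice the input rows would come from `csv.DictReader` over the raw files.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from statistics import mean

def merge_hr_with_labels(hr_rows, label_rows):
    """Aggregate raw heart-rate readings to one row per subject and day,
    then attach a binary label: 1 on the symptom onset date, else 0."""
    # Symptom onset date per subject, as an ISO string (hypothetical column).
    onset = {r["subject_id"]: r["symptom_onset"] for r in label_rows}
    # Group readings by (subject, calendar day) derived from the timestamp.
    readings = defaultdict(list)
    for r in hr_rows:
        day = datetime.fromisoformat(r["timestamp"]).date().isoformat()
        readings[(r["subject_id"], day)].append(float(r["heart_rate"]))
    merged = []
    for (subject, day), values in sorted(readings.items()):
        if subject not in onset:
            continue  # skip subjects that have no label
        merged.append({
            "subject_id": subject,
            "date": day,
            "mean_hr": round(mean(values), 2),
            "label": int(day == onset[subject]),
        })
    return merged

def write_csv(rows, fh):
    """Write the merged rows as a single CSV with a header line."""
    writer = csv.DictWriter(fh, fieldnames=["subject_id", "date", "mean_hr", "label"])
    writer.writeheader()
    writer.writerows(rows)

# Example with made-up readings for one subject.
hr = [
    {"subject_id": "A", "timestamp": "2020-03-01T08:00:00", "heart_rate": "60"},
    {"subject_id": "A", "timestamp": "2020-03-01T20:00:00", "heart_rate": "64"},
    {"subject_id": "A", "timestamp": "2020-03-02T08:00:00", "heart_rate": "75"},
]
labels = [{"subject_id": "A", "symptom_onset": "2020-03-02"}]
buf = io.StringIO()
write_csv(merge_hr_with_labels(hr, labels), buf)
print(buf.getvalue())
```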

Challenges we ran into

  • The provided datasets were disorganized, which made preprocessing complex. Labels were stored in separate files, making it hard to build a structured dataset for training the model.
  • The dataset was highly imbalanced.
  • The required output was a date prediction rather than a classification output.
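
One way to bridge the gap between a per-day binary classifier and the required date output is sketched below: for each subject, pick the day with the highest predicted probability, and evaluate the prediction as an error in days. The helper names, the 0.5 threshold and the example scores are hypothetical and are not part of the project's code or results.

```python
from datetime import date, timedelta

def predict_onset_date(days, probabilities, threshold=0.5):
    """Turn per-day classifier scores for one subject into a single date:
    the day with the highest score, or None if no day reaches `threshold`."""
    if len(days) != len(probabilities):
        raise ValueError("days and probabilities must have the same length")
    if not days:
        return None
    best = max(range(len(days)), key=lambda i: probabilities[i])
    return days[best] if probabilities[best] >= threshold else None

def onset_error_days(predicted, actual):
    """Absolute difference in days between predicted and actual onset."""
    return abs((predicted - actual).days)

# Example: five consecutive days for one subject, with made-up scores.
start = date(2020, 3, 1)
days = [start + timedelta(days=i) for i in range(5)]
probs = [0.05, 0.10, 0.72, 0.40, 0.08]
predicted = predict_onset_date(days, probs)
print(predicted, onset_error_days(predicted, date(2020, 3, 4)))
```

Reporting the error in days, rather than plain classification accuracy, matches the date-shaped output the task asks for.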

Accomplishments that we're proud of

Cleaning the dataset into an organized, labeled one that is suitable for training a model. Augmenting the data to obtain a new dataset three times (300%) the size of the original. Training a binary classification model that outputs 1 if the evaluated day is the symptom onset and 0 otherwise.
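
The augmentation step (adding noise and shuffling rows) might look like the sketch below, assuming each row is a dict with a numeric `mean_hr` and a binary `label` (hypothetical column names). The Gaussian noise, its standard deviation and the `extra_positive_copies` option are assumptions. With the defaults, two noisy copies of every original row are added, giving a dataset three times the original size; because every row is copied equally, that alone does not change the class balance, which is why the optional extra copies of positive (onset) rows are included.

```python
import random

def augment(rows, copies=2, noise_sd=1.0, extra_positive_copies=0, seed=0):
    """Return the original rows plus noisy copies, in shuffled order.

    Each row gets `copies` noisy duplicates; rows with label 1 get
    `extra_positive_copies` more, to oversample the minority class.
    With the defaults the result is 3x (300%) the original size.
    """
    rng = random.Random(seed)
    augmented = [dict(r) for r in rows]  # keep the originals unchanged
    for r in rows:
        n = copies + (extra_positive_copies if r["label"] == 1 else 0)
        for _ in range(n):
            noisy = dict(r)
            # Perturb the heart-rate feature; the label is kept as-is.
            noisy["mean_hr"] = r["mean_hr"] + rng.gauss(0.0, noise_sd)
            augmented.append(noisy)
    rng.shuffle(augmented)  # row shuffling so copies are not grouped together
    return augmented
```

Shuffling mixes rows from different subjects and days, so any feature that depends on day order (such as a rolling baseline) should be computed before this step.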

What we learned

Preprocessing large amounts of data while taking timestamps into account. Using AI resources such as Azure.

What's next for AI4Medicine_BrAIny

Finish training the model with the augmented dataset.
