
There are 3 batches of the dataset.

I'm analyzing the 2nd batch, dated 20_07_2020, which was the first to be downloaded. This batch of the HM dataset has 5 spreadsheets with a total of 60 columns. It covers 2548 patients across more than 400K rows, so there is quite a lot of patient data to work with.

The spreadsheets all seem to be clean, with uniform values. I will present them to you all at our 1st team meeting.
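
One way to back up the "clean data" impression before the meeting is to count rows, distinct patients, and empty cells per column in each sheet. The snippet below is only a rough sketch: it runs on a tiny made-up CSV sample using the standard `csv` module, and the column names (`patient_id`, `age`, `lab_value`) are placeholders, not the real HM columns.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for one exported spreadsheet (CSV).
# The column names are assumptions, not the real dataset schema.
SAMPLE_CSV = """patient_id,age,lab_value
P001,64,1.2
P001,64,
P002,71,0.9
P003,,1.5
"""

def summarize_sheet(csv_text):
    """Return row count, distinct patient count, and empty cells per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = Counter()
    patients = set()
    n_rows = 0
    for row in reader:
        n_rows += 1
        patients.add(row["patient_id"])
        for column, value in row.items():
            # Treat absent or whitespace-only cells as missing.
            if value is None or value.strip() == "":
                missing[column] += 1
    return {"rows": n_rows, "patients": len(patients), "missing": dict(missing)}

summary = summarize_sheet(SAMPLE_CSV)
print(summary)
```

Running the same function over each of the 5 spreadsheets would give a small table we can show at the meeting.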

Next steps:

  1. Decide what we want to measure. Treatment efficacy? Personalized treatment groups by demographics? Lab result predictors for adding/removing/changing medication? Etc. (@everyone @channel)
  2. Combine all spreadsheets into one, correctly identifying all patients and their info. (@Mia Le @tiagosampaio)
  3. Check how the values in the dataset are distributed and normalize them before feeding them into AnalysisMode AutoML. (@teppohudsson @tiagosampaio)
  4. Remove unnecessary columns which don't have a biological meaning. Timestamps? (@Eric Coles @Mia Le)
  5. Develop an explanation of our prediction methodology. (@Eric Coles @Tan @Milda Dapkeviciute)
  6. Define success metrics and how to calculate them. (@teppohudsson)
  7. Develop a UI to present as a report - output of our predictions. (@teppohudsson @Mia Le)
  8. Execute the simulation on AutoML and calibrate the parameters for >80% accuracy and 50% precision. (@tiagosampaio)
  9. Review AutoML parameters to increase precision above 50%. (@tiagosampaio)
  10. Create presentation material. (@Milda Dapkeviciute @Eric Coles)
  11. Write the 1st report/newsletter to the hackathon organizers and publish it on Devpost. (@Milda Dapkeviciute @Mia Le @tiagosampaio)
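
To make steps 2, 3, 6 and 8 a bit more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the field names (`patient_id`, `age`, `lab_value`) are invented, the merge assumes one row per patient per sheet (our real sheets have many rows per patient, so they would need aggregating first), and z-score scaling is just one possible reading of "normalize". It says nothing about how AnalysisMode AutoML itself expects its input.

```python
from collections import defaultdict
from statistics import mean, pstdev

def merge_by_patient(*sheets, key="patient_id"):
    """Step 2: combine rows from several sheets into one record per patient.

    Assumes one row per patient per sheet; later rows overwrite earlier ones.
    """
    merged = defaultdict(dict)
    for sheet in sheets:
        for row in sheet:
            merged[row[key]].update(row)
    return dict(merged)

def zscore(values):
    """Step 3: rescale values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def accuracy_and_precision(y_true, y_pred, positive=1):
    """Steps 6 and 8: accuracy = correct / total, precision = TP / (TP + FP)."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision

# Made-up example data, not taken from the HM dataset.
demographics = [{"patient_id": "P001", "age": 64}, {"patient_id": "P002", "age": 71}]
labs = [{"patient_id": "P001", "lab_value": 1.2}, {"patient_id": "P002", "lab_value": 0.8}]

patients = merge_by_patient(demographics, labs)
ages = zscore([p["age"] for p in patients.values()])
acc, prec = accuracy_and_precision([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(patients, ages, acc, prec)
```

Writing the metrics out like this also pins down what ">80% accuracy" and "50% precision" mean before we start tuning: accuracy counts every correct prediction, while precision only looks at the cases we predicted as positive.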
