TRACK: PREDICTIVE MEDICINE - Challenge 1

Conclusion

During the hackathon, we built a model that addresses both predictive medicine and hospital resource allocation.

We can predict the efficacy of a treatment for a given patient with 94.99% accuracy and 80% precision.
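
For readers less familiar with these two metrics, both come from comparing predicted labels against the true outcomes. The sketch below is for illustration only: the helper name `accuracy_and_precision` and the toy labels are assumptions, not our evaluation code or the hackathon data.

```python
def accuracy_and_precision(y_true, y_pred, positive=1):
    """Return (accuracy, precision) for binary labels.

    Hypothetical helper for illustration; not the project's evaluation code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("no samples to score")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # Precision is undefined when nothing is predicted positive; report 0.0 here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


# Toy example (made-up labels): 1 = treatment effective, 0 = not effective.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
acc, prec = accuracy_and_precision(y_true, y_pred)
```

Accuracy counts every correct prediction, while precision only asks how many of the "effective" predictions were right, which is why the two numbers can differ considerably on imbalanced data.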

Our model was trained on the patient demographics, the drugs given to each patient, and the lab results recorded during treatment, all provided by HM Hospitals and Sanitas.

The resulting model can be reused for three use cases.

Proposed web interface for healthcare professionals to use our tool.

Proposed Solution

We are creating a predictor for hospitals to estimate the following items: a) the treatment options for a given patient demographic; b) the likelihood of survival of a patient on the current treatment; c) transparency on which variables (symptoms, lab results) affect treatment efficacy for each patient demographic (age group/gender).

Our technology stack consists of a GPU-powered neuroevolution architecture, which is able to find the best neural network model out of 20 billion combinations while openly explaining which parameters were chosen and how the prediction is made - avoiding the ML black box.

Our Story Background

Making humanity evolve faster than viruses. Helping scientists and health professionals to use data intelligence to deliver more efficient treatments and discover novel drugs.

Our initial idea for AI in healthcare was born after we won 5 Hack The Crisis hackathons during 2020. Our solution was a citizen science game that lets people help scientists find vaccine formulas by playing - helping to label datasets for epitope protein-based vaccine prediction. https://www.analysismode.com/aminocrush/

Updates

posted an update

This is what we have learned about the HM Hospitals dataset when using MOTIVO_ALTA (reason for discharge) as the target prediction column.

a) Which of these values can be considered a healed patient?

Alta Voluntaria (voluntary discharge) -> assume healed

Domicilio (discharged home) -> healed

Fallecimiento (death) -> dead

Traslado a un Centro Sociosanitario (transfer to a social-health care centre) -> these patients should be progressing well (critical patients are normally not transferred to other facilities), but we can't be 100% certain because we lose track of them. You can assume healed or exclude them from the analyses.

Traslado al Hospital (transfer to hospital) -> these patients should be progressing well (critical patients are normally not transferred to other facilities), but we can't be 100% certain because we lose track of them. You can assume healed or exclude them from the analyses.

b) Is it possible for a patient who was in 'Traslado' to end up in 'Fallecimiento'? Yes.

c) Can we assume death/healing of the patient based on a Traslado?

If anything, you can assume healed, based on the explanation in question a). Also consider excluding those patients from the analyses.
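
The answers above can be turned into a label mapping for the target column. This is a minimal sketch, assuming each patient record is a dict with a `MOTIVO_ALTA` key and that the values are spelled exactly as listed above; the function name `label_outcomes`, the `healed` field, and the `patient_id` key in the example are hypothetical.

```python
# Label mapping for the MOTIVO_ALTA column, following the answers above.
# None marks transfers, whose final outcome is unknown.
MOTIVO_ALTA_LABELS = {
    "Alta Voluntaria": 1,  # assumed healed
    "Domicilio": 1,        # healed
    "Fallecimiento": 0,    # dead
    "Traslado a un Centro Sociosanitario": None,
    "Traslado al Hospital": None,
}


def label_outcomes(rows, exclude_transfers=True):
    """Attach a binary 'healed' label to each row (hypothetical helper)."""
    labelled = []
    for row in rows:
        reason = row["MOTIVO_ALTA"].strip()
        if reason not in MOTIVO_ALTA_LABELS:
            raise ValueError(f"unexpected MOTIVO_ALTA value: {reason!r}")
        label = MOTIVO_ALTA_LABELS[reason]
        if label is None:
            if exclude_transfers:
                continue  # drop transferred patients from the analysis
            label = 1  # alternative from the answers: assume healed
        labelled.append({**row, "healed": label})
    return labelled


# Made-up example rows; patient_id is an assumed column name.
rows = [
    {"patient_id": 1, "MOTIVO_ALTA": "Domicilio"},
    {"patient_id": 2, "MOTIVO_ALTA": "Fallecimiento"},
    {"patient_id": 3, "MOTIVO_ALTA": "Traslado al Hospital"},
]
strict = label_outcomes(rows)                            # transfers excluded
lenient = label_outcomes(rows, exclude_transfers=False)  # transfers = healed
```

Keeping both options behind one flag makes it easy to check how sensitive the results are to the uncertain transfer cases.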

posted an update

These days we've been mostly analysing the HM Hospitals dataset, and these questions came up.

a) Which of these values can be considered a healed patient? b) Is it possible for a patient who was in 'Traslado' to end up in 'Fallecimiento'? c) Can we assume death/healing of the patient based on a Traslado?

Our plan is to use the MOTIVO_ALTA column as the target of our prediction. We will predict each patient's chance of death or healing by analysing the demographics together with the drugs/treatments given.

posted an update

Happy Vappu (International Workers' Day), the largest celebration in all of Finland! Unfortunately, like the previous year, outdoor gatherings are still limited due to the ongoing pandemic. However, the Vappu spirit is still strong, and people are having small indoor gatherings with close friends and family, filled with fresh donuts, party ribbons, and sparkling wine! The continued work done by the scientific community and here in this hackathon brings us closer each day to ending this pandemic and fully reopening society.

posted an update

There are 3 batches of the dataset.

I'm analyzing the 2nd batch, dated 20_07_2020, which was the first one we downloaded. This batch of the HM dataset has 5 spreadsheets with a total of 60 columns and 2548 patients, but more than 400K rows in total, which amounts to quite a lot of patient data.

The data all seems to be clean, with uniform values. I will present it to you all at our 1st team meeting.

Next steps:

  1. Decide what we want to measure. Treatment efficacy? Personalized treatment groups by demographics? Lab result predictors for adding/removing/changing medication? Etc. (@everyone @channel)
  2. Combine all spreadsheets into one, correctly identifying all patients and their info. (@Mia Le @tiagosampaio)
  3. Check the distribution of the dataset and normalize it before feeding it into AnalysisMode AutoML. (@teppohudsson @tiagosampaio)
  4. Remove unnecessary columns which don't have a biological meaning. Timestamps? (@Eric Coles @Mia Le)
  5. Develop an explanation of our prediction methodology. (@Eric Coles @Tan @Milda Dapkeviciute)
  6. Define success metrics and how to calculate them. (@teppohudsson)
  7. Develop a UI to present as a report - output of our predictions. (@teppohudsson @Mia Le)
  8. Execute the simulation on AutoML and calibrate the parameters for >80% accuracy and 50% precision. (@tiagosampaio)
  9. Review AutoML parameters to increase precision above 50%. (@tiagosampaio)
  10. Create presentation material. (@Milda Dapkeviciute @Eric Coles)
  11. Write the 1st report/newsletter to hackathon organizers, publish on devpost (@Milda Dapkeviciute @Mia Le @tiagosampaio)
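
As a rough idea of what steps 2 and 3 could look like, here is a minimal standard-library sketch. It assumes each spreadsheet has already been loaded as a list of dicts with one row per patient (lab results with many rows per patient would need to be aggregated first); the helper names and the columns `patient_id`, `age`, and `lab_value` are hypothetical, not the real HM column names.

```python
from statistics import mean, pstdev


def merge_by_patient(*tables, key="patient_id"):
    """Combine one-row-per-patient tables into one record per patient ID."""
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault(row[key], {key: row[key]}).update(row)
    return [merged[pid] for pid in sorted(merged)]


def zscore(records, column):
    """Return copies of records with `column` rescaled to mean 0, std 1."""
    values = [r[column] for r in records if r.get(column) is not None]
    if not values:
        return [dict(r) for r in records]
    mu, sigma = mean(values), pstdev(values)
    out = []
    for r in records:
        value = r.get(column)
        if value is None or sigma == 0:
            out.append(dict(r))  # leave missing or constant values unchanged
        else:
            out.append({**r, column: (value - mu) / sigma})
    return out


# Made-up example data; not taken from the HM Hospitals dataset.
demographics = [
    {"patient_id": 1, "age": 40},
    {"patient_id": 2, "age": 60},
    {"patient_id": 3, "age": 80},
]
labs = [
    {"patient_id": 1, "lab_value": 1.0},
    {"patient_id": 3, "lab_value": 3.0},
]
patients = merge_by_patient(demographics, labs)
normalized = zscore(patients, "age")
```

Patients missing from a spreadsheet simply lack that column after the merge, so missing values stay visible instead of being silently filled in.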
