We were inspired by the recent emergence of major new COVID-19 variants in the UK, South Africa, and Brazil, and we wanted to find a way to predict which variant an Ontarian was most likely to catch based on a handful of factors that we isolated.
What it does
Our model is simple: it is a set of classification trees trained and tested on COVID-19 case data sourced from the Ontario Public Health Authority. Each record details a case's age, gender, most recent possible avenue of exposure to the virus, and city of residence. To predict which variant someone is most likely to catch, one simply gathers that person's information and follows a tree's branches in accordance with it until reaching a leaf node.
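To illustrate what following the branches to a leaf looks like, here is a minimal sketch using scikit-learn. It is not our submitted model: the six records, the column names, and the variant labels are all invented placeholders, and predict_variant is a hypothetical helper written only for this example.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented toy records. Column names and variant labels are placeholders,
# not values taken from the real case data.
toy_cases = pd.DataFrame({
    "age_group": ["20s", "60s", "40s", "20s", "60s", "40s"],
    "gender": ["F", "M", "F", "M", "F", "M"],
    "exposure": ["travel", "close_contact", "outbreak",
                 "travel", "outbreak", "close_contact"],
    "city": ["Toronto", "Ottawa", "Toronto", "Hamilton", "Ottawa", "Hamilton"],
    "variant": ["UK", "South Africa", "Brazil", "UK", "Brazil", "South Africa"],
})

# One-hot encode the categorical columns so the tree can split on them.
features = pd.get_dummies(toy_cases.drop(columns="variant"))
labels = toy_cases["variant"]

tree = DecisionTreeClassifier(random_state=0).fit(features, labels)

# Print the tree as nested if/else rules; reading from the top down to a
# "class:" line is the same as following the branches to a leaf by hand.
print(export_text(tree, feature_names=list(features.columns)))


def predict_variant(person: dict) -> str:
    """Hypothetical helper: encode one person's details and walk the tree."""
    row = pd.get_dummies(pd.DataFrame([person]))
    # Align with the training columns; categories this person lacks become 0.
    row = row.reindex(columns=features.columns, fill_value=0)
    return str(tree.predict(row)[0])


prediction = predict_variant(
    {"age_group": "20s", "gender": "F", "exposure": "travel", "city": "Toronto"}
)
print(prediction)
```

The printed rules make the model easy to use without a computer, since each line is a yes/no question about one of the factors above.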
How we built it
Using several Python libraries, including random, NumPy, pandas, and scikit-learn (the main ingredient in creating our classification trees), among others, we transformed our original, raw dataset into the model that we are submitting here.
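The general shape of such a workflow, from raw records to a trained and tested tree, can be sketched as follows. This is an assumed outline rather than our actual code: the data is synthetic, every column name and value is invented, and the variant label is a placeholder generated purely so the example runs.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(seed=42)
n_rows = 200

# Synthetic stand-in for a raw export; all names and values are invented.
raw = pd.DataFrame({
    "age_group": rng.choice(["<20", "20s", "40s", "60s"], size=n_rows),
    "gender": rng.choice(["F", "M"], size=n_rows),
    "exposure": rng.choice(["travel", "close_contact", "outbreak"], size=n_rows),
    "city": rng.choice([" toronto", "Ottawa ", "HAMILTON"], size=n_rows),
})
# Placeholder label derived from exposure so the tree has something to learn.
raw["variant"] = raw["exposure"].map(
    {"travel": "UK", "close_contact": "South Africa", "outbreak": "Brazil"}
)
# Simulate the gaps a real export would contain.
raw.loc[rng.random(n_rows) < 0.1, "age_group"] = np.nan

# Clean: drop incomplete rows and normalise inconsistent city spellings.
clean = raw.dropna().copy()
clean["city"] = clean["city"].str.strip().str.title()

# Encode the categorical columns, then hold out a quarter of rows for testing.
X = pd.get_dummies(clean.drop(columns="variant"))
y = clean["variant"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {accuracy:.2f}")
```

Holding out a test split is what lets the tree be "tested" as well as trained: the accuracy is measured only on rows the tree never saw while fitting.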
Challenges we ran into
The biggest challenge we faced was that the dataset we extracted and cleaned from the Ontario Public Health Authority did not contain a variable indicating whether an individual case had contracted a variant of the virus, so we were forced to think outside the box to make our classification trees work.
Accomplishments that we're proud of
We are proud of how we worked together as a team that had never collaborated before and was assembled at essentially the last minute after others pulled out of the competition. Also, although we could not settle on what we wanted to explore or create for almost the first 12 available hours, we are proud that we pulled ourselves together and agreed on this as our final product.
What we learned
One of our biggest takeaways from this experience was discovering many Python libraries and methods for data analysis that simplified our work considerably, and that we will no doubt continue to use in future projects.
What's next for A Model Predicting COVID-19 Variant Susceptibility
Originally, we had wanted to use clinical trial data from Moderna, Pfizer, AstraZeneca, and Johnson & Johnson to build a model that would predict which vaccine a person should take in order to maximize efficacy and build immunity. However, as this data is not yet available to the general public, we settled on building our model around the three major COVID-19 variants that had recently emerged. If the pharmaceutical companies' data is one day released publicly, we could revisit that original idea and build the same kind of model for COVID-19 vaccines instead of variants.