A growing number of apps and websites estimate COVID risk from a few parameters, such as symptoms of illness (the popular UK app), the number of proximity contacts (e.g., WeTrace and PEPP-PT), or environmental exposure levels (risk map dashboards). However, most of them look at only a subset of the important parameters and may lack the scientific rigor required to evaluate COVID risk. WeTrace, for example, may cause widespread panic if it gives many false alerts. Similarly, existing risk maps are all static and cannot combine all three important factors into a live risk map, which makes most of them of little practical use.
What it does
In this international project, we are developing a robust aggregated risk evaluation model based on all of these factors, together with an API that lets existing services make better decisions about whether someone is likely to be infected and should receive an alert or contact health care authorities. Apps and websites can query our API with even a subset of the parameters and get better risk values than existing solutions provide. This also motivates those services to collect more data and improve. However, we have faced multiple challenges, mainly the lack of datasets, even for a single factor, to train models.
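As a rough illustration of how a query with only a subset of the parameters could still yield a risk value, here is a minimal sketch. It assumes each factor has already been reduced to a sub-score between 0 and 1 and combines whichever sub-scores are present using a weighted average renormalized over the provided factors. The function name `aggregate_risk`, the weights, and the 0-to-1 scale are all hypothetical choices made for this example and are not the project's actual model.

```python
from typing import Optional

# Hypothetical weights for symptoms (S), contacts (C), and environment (E).
# These numbers are illustrative only and are not taken from the project.
WEIGHTS = {"S": 0.5, "C": 0.3, "E": 0.2}


def aggregate_risk(
    s: Optional[float] = None,
    c: Optional[float] = None,
    e: Optional[float] = None,
) -> float:
    """Combine the available sub-scores (each in [0, 1]) into one risk value.

    Missing factors are skipped, and the weights of the remaining factors
    are renormalized so the result stays in [0, 1].
    """
    candidates = {"S": s, "C": c, "E": e}
    provided = {name: value for name, value in candidates.items() if value is not None}
    if not provided:
        raise ValueError("at least one of s, c, e must be provided")
    for name, value in provided.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"sub-score {name} must be between 0 and 1")
    total_weight = sum(WEIGHTS[name] for name in provided)
    weighted_sum = sum(WEIGHTS[name] * value for name, value in provided.items())
    return weighted_sum / total_weight
```

A symptom-only service would call `aggregate_risk(s=0.8)`, while a service that also tracks contacts would call `aggregate_risk(s=0.8, c=0.4)`. Renormalizing the weights is just one simple way to handle missing factors; a trained model could treat them very differently.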
How we built it
We are a group of epidemiologists, mathematicians, data scientists, and developers from EPFL, MIT, and India. We are compiling a comprehensive list of important parameters for COVID risk evaluation by searching the existing scientific literature and risk evaluation models. We have developed a standard data format that helps various websites collect the same information and share their data, and we have built basic models for S (symptoms), C (contacts), and E (environment). However, the lack of data has hindered further extension and validation of the models. We are excited to use this data format to collect data and develop the models.
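To make the idea of a shared format more concrete, the sketch below shows what a single record covering S, C, and E might look like when serialized to JSON. The class name `RiskRecord` and every field name are hypothetical and chosen only for illustration; the project's real format is not described here and may be organized differently.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class RiskRecord:
    """One hypothetical observation in a shared S/C/E format."""

    # S: self-reported symptoms, e.g. ["fever", "cough"]
    symptoms: List[str] = field(default_factory=list)
    # C: number of proximity contacts, if the service tracks them
    proximity_contacts: Optional[int] = None
    # E: environmental exposure level, if the service tracks it
    environment_exposure: Optional[float] = None

    def to_json(self) -> str:
        """Serialize the record; untracked factors are written as null."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "RiskRecord":
        """Rebuild a record from the JSON produced by to_json."""
        return cls(**json.loads(text))
```

Keeping untracked factors as explicit nulls, rather than dropping them, lets a service that only collects symptoms share data in the same shape as one that collects all three factors.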
Challenges we ran into
Lack of data: most of the literature focuses on a subset of parameters, and it was hard to find (even via our link to WHO) datasets that include all of them.
Accomplishments that we are proud of
Teamwork while spread across several continents! We are also happy with the scientific approach we took.
What we learned
Data-driven modeling without data is very hard. You may need to rely on many assumptions that you cannot verify, in a reality where the data are noisy and biased.
What's next for CARE (COVID Aggregate Risk Evaluation) PROPOSAL
We will use the data format and API to collect data and to develop and validate our early models.