Spread Modelling: Forecasting with a unified database

Gallery images: Topic; All GitHub Repositories; Overall Concept; GitHub Repository Data Unification; Visualization: Map close-up; Visualization: How it is spreading 1; Visualization: How it is spreading 2

Inspiration

Until a few weeks ago, hardly any useful infographics, spread models, or curated datasets about Covid-19 were available. This made it hard to communicate the gravity of the situation to friends and decision makers, and it made it difficult for researchers to simply get started analysing data and developing machine learning models. Richard Leibrandt therefore started this project at #CodeVsCovid19. His aim was to build unified, curated datasets for various countries that combine Covid-19 case counts with population details and political actions, and to use them to forecast the future spread so that both decision makers and individuals can make their own adjustments.

What it does so far

The entire project has three parts:

Data preparation software that downloads publicly available datasets and harmonizes them so that they can be merged easily.
Models that forecast the spread of the virus.
A webpage that displays the course of the spread at country level, regional level (smaller than a country, e.g. "Kanton"), and county level. The webpage will show both the historical course and the future course, the latter using the predictive models.
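The harmonization step in the first part can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the source column names, the date formats, the target schema (`date`, `country`, `region`, `cases`), the helper names `harmonize` and `merge`, and the case numbers, which are invented. The project itself used pandas; this sketch sticks to the standard library so that it runs without dependencies.

```python
from datetime import datetime

# Hypothetical raw rows from two sources with different conventions.
# Column names, date formats, and case numbers are invented for illustration.
RAW_CH = [
    {"Datum": "26.03.2020", "Kanton": "ZH", "Faelle": "120"},
    {"Datum": "25.03.2020", "Kanton": "ZH", "Faelle": "100"},
]
RAW_IT = [
    {"data": "2020-03-25", "regione": "Lombardia", "casi": 300},
]


def harmonize(rows, country, date_field, date_format, region_field, cases_field):
    """Map source-specific rows onto one shared schema."""
    return [
        {
            "date": datetime.strptime(row[date_field], date_format).date(),
            "country": country,
            "region": row[region_field],
            "cases": int(row[cases_field]),
        }
        for row in rows
    ]


def merge(*tables):
    """Concatenate harmonized tables and sort by country, region, and date."""
    combined = [row for table in tables for row in table]
    return sorted(combined, key=lambda r: (r["country"], r["region"], r["date"]))


unified = merge(
    harmonize(RAW_CH, "CH", "Datum", "%d.%m.%Y", "Kanton", "Faelle"),
    harmonize(RAW_IT, "IT", "data", "%Y-%m-%d", "regione", "casi"),
)

for row in unified:
    print(row["date"].isoformat(), row["country"], row["region"], row["cases"])
```

Once every source is mapped onto the same columns and types, merging is reduced to concatenation and sorting, which is the point of agreeing on a standard first.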

What we managed to do in the Hackathon

We worked on all three parts to provide an end-to-end prototype that showcases how such projects can be done and motivates further work on the project:

For some countries the data preparation is finished (Switzerland, USA, UK, Italy); others are close to finished.
Different models were discussed theoretically; one was implemented.
The webpage will be up and running (but for now, possibly only for historical data).
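This write-up does not say which model was implemented, so the following is not the team's model. It is a minimal baseline sketch, assuming early-phase exponential growth, that fits log(cases) against the day index by ordinary least squares. The function names `fit_exponential` and `forecast` are hypothetical, and the case numbers are invented.

```python
import math


def fit_exponential(cases):
    """Fit cases ~ a * exp(b * day) by least squares on log(cases).

    Assumes all case counts are positive and at least two days are given.
    """
    days = list(range(len(cases)))
    logs = [math.log(c) for c in cases]
    n = len(cases)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    b = sxy / sxx                       # growth rate per day
    a = math.exp(mean_y - b * mean_x)   # fitted value at day 0
    return a, b


def forecast(a, b, day):
    """Extrapolate the fitted curve to a given day index."""
    return a * math.exp(b * day)


# Invented cumulative case counts for four consecutive days.
observed = [100, 200, 400, 800]
a, b = fit_exponential(observed)
doubling_time = math.log(2) / b

print(f"growth rate per day: {b:.3f}")
print(f"doubling time in days: {doubling_time:.2f}")
print(f"forecast for day 4: {forecast(a, b, 4):.0f}")
```

Pure exponential growth only describes the early phase of an outbreak and ignores saturation and interventions, so a sketch like this is useful mainly as a sanity check against which richer models can be compared.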

How we built it

We used the Python data science stack: Python, SciPy, NumPy, and pandas. The webpage runs on the Oracle cloud platform, thanks to the help of a very capable Oracle employee who was a member of our team.

Challenges we ran into

Finding data for the countries was not simple, and cleaning it and defining standards took quite a bit of time. With 16 people we were a fairly large and heterogeneous team, which made coordination challenging. Communicating purely virtually added to this: without face-to-face contact it is harder to coordinate, to overcome the timidity to speak up in calls, and to keep an overview of who is doing what.

What we are proud of and what we learned

Even though we were so many people and so heterogeneous, we were still able to function as a group. We are proud that nobody threw in the towel, that everyone stayed motivated, and that we worked together harmoniously. We learned how best to cooperate as a virtual team spread around the world: hold regular e-meetings, and clarify and assign tasks to each group member. The task is enormous, and since we had varying levels of expertise we needed to help each other. Even though nobody knew anyone else before the hackathon, we managed to do this smoothly, which is something to be proud of. Some people learned about software engineering, some about data science, others about how to manage, and we all had a lot of fun.

What's next for Spread Modelling

We want to continue the project. Whether or not that happens at another hackathon, we definitely want to keep it going as an ongoing project.

Built With

Submitted to

#CodeVsCovid19

Created by

I was the initiator and project lead. Besides leading and managing the project, I wrote the specifications for the harmonized datasets and the software architecture, contributed ideas for the modelling, and helped out where I could.

Richard Leibrandt
I gave some input on the modelling.

Lars Müller
I helped with the data download and preparation. I also looked into the visualization with R.

Carla Özen
I worked on the data sourcing and data preparation, and gave some input on the modelling.

Carmen Moreno
I wrote some of the data preparation code.

Alexandre DeZotti
I worked on the data download and preparation.

Jenny Epple
Teambuilding/mentoring for supersized international team of 15+ individuals, providing focus + structure, creating safe space to collaborate, pairing experts with novices to ensure valuable learning experience for all.

Pia Eggert
Raghavendra Vijayanagaram
Deep Learning Engineer
Benedict Bileam Scheuvens
Thomas Jenny
Marius Bild
Dominik Gehl
Sandra Fritzke