Data is the basis for transparency and for automating formerly manual processes. It lets us create insights and supports decision making and understanding. At a time when only facts should be spread, we want to provide a platform where professionals can share and distribute curated, high-quality data sets. Through a user-friendly interface, data can be accessed and shared easily with anyone who wants to put their knowledge and work into fighting the COVID-19 crisis.
What it does
Lake COVID makes it possible to collect and store all relevant data about the current crisis. We can host and provide any type of data, at any scale, on low-cost infrastructure. We implemented Lake COVID with a security layer to prevent unauthorized deletion of valuable data sources. All data sources are cataloged and can easily be searched, so relevant data can be found quickly in the central repository. With Lake COVID, you will be able to perform insightful analysis quickly and easily so that others can benefit from your findings as soon as possible.
How we have built it
We are leveraging existing AWS solutions with a focus on best practices for reliability, security, and cost. The solution deploys a console that users can access to search, browse, and download available data sets for COVID analysis.
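To illustrate the idea of searching a catalog of data sets, here is a minimal, self-contained Python sketch. It is not the Lake COVID implementation, which relies on AWS services. The `DatasetEntry` class, the `search_catalog` function, and the sample entries are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """One catalog record. All field names here are hypothetical."""
    name: str
    description: str
    tags: tuple = ()


def search_catalog(catalog, query):
    """Return entries whose name, description, or tags contain every query term."""
    terms = query.lower().split()
    results = []
    for entry in catalog:
        # Combine all searchable fields into one lowercase string.
        haystack = " ".join((entry.name, entry.description, *entry.tags)).lower()
        if all(term in haystack for term in terms):
            results.append(entry)
    return results


# Made-up example entries, not real Lake COVID data sets.
catalog = [
    DatasetEntry("case-counts", "Daily confirmed cases by country", ("cases", "daily")),
    DatasetEntry("hospital-beds", "Hospital bed capacity by region", ("capacity",)),
]

matches = search_catalog(catalog, "daily cases")
print([entry.name for entry in matches])
```

A plain case-insensitive substring match keeps the sketch short. A production catalog would more likely delegate search to a managed metadata or search service.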
Challenges we ran into and what we learned from them
Team organization and management are key to the success of a project. By splitting the challenge into smaller subtasks, we created small teams that each worked on specific features of the data lake. Team leads were selected to coordinate meetings and the exchange of information with the other groups. From a technical perspective, we were able to dive deep into some of the data-driven AWS services, where we needed to debug and adjust configurations to fit our needs.
Accomplishments that we are proud of
We proudly present a first minimum viable version of our data lake solution. It allows registered users to search, share, and update data sets in one consolidated data lake. Internal crawling procedures update the data sets to their latest versions so that the sources stay current.
Login to the data lake
Search for data
What's next for Lake COVID
Our main goal is to spread the word so that data scientists, analysts, and researchers of any kind become aware of Lake COVID and can benefit from our work. These days, collaboration is more important than ever. That is why we want to extend the project with further features, such as real-time data streaming and advanced data quality checks, to provide useful data sources to our users.