We are a group of machine learning engineers and a medical doctor eager to contribute ideas and tools that streamline information acquisition, sharing, and exploitation during and after the COVID-19 pandemic.

We propose MeDEP, a platform for intuitive data sharing and research. Our idea is to aggregate all relevant data sources within a healthcare organization or ecosystem so that researchers and data scientists can work on readily available, up-to-date data. The platform supports exploratory, "handmade" research with scripts and low-complexity code that nonetheless has high research value. If a certain analysis becomes periodic, it can be redesigned into a robust and reliable data processing pipeline that is triggered on a schedule. Among other data processing necessities, the platform also enables streamlined integration of any machine learning method, which can be deployed and used within a single hospital (on a MeDEP-local stack) or across multiple hospitals (on a MeDEP-federated stack).

In addition to the complex software architecture, which is the main driver for all other functionality, our platform also offers regularly updated annotations of existing COVID-19-related literature, produced by unsupervised keyphrase detection on more than 30,000 medical documents. The annotated documents are accessible through a custom-designed search engine that ranks how relevant each document is to a given query, making it quick to prioritize which documents to consider.
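To make the idea of query-to-document ranking concrete, here is a minimal sketch that scores documents against a query with TF-IDF weights and cosine similarity. This is an illustrative assumption, not a description of MeDEP's actual search engine; the function names `tokenize` and `rank_documents` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def rank_documents(query, documents):
    """Return (doc_index, score) pairs sorted by descending relevance.

    Relevance is the cosine similarity between TF-IDF vectors. This is an
    assumed scoring scheme chosen for illustration only.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    def idf(term):
        # Smoothed inverse document frequency; always positive.
        return math.log((1 + n_docs) / (1 + doc_freq.get(term, 0))) + 1.0

    def vectorize(tokens):
        counts = Counter(tokens)
        return {term: count * idf(term) for term, count in counts.items()}

    def cosine(a, b):
        dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)

    query_vec = vectorize(tokenize(query))
    scores = [
        (index, cosine(query_vec, vectorize(tokens)))
        for index, tokens in enumerate(tokenized)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample documents, used only to exercise the function.
documents = [
    "CD147 receptor binding partners in SARS-CoV-2 infection",
    "Media sentiment during the COVID-19 epidemic",
    "Topic modeling of biomedical literature",
]
ranking = rank_documents("CD147 binding", documents)
```

Here `ranking` lists the first document on top because it is the only one that shares terms with the query; documents with no overlap receive a score of zero.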

We showcase the platform's functionality on three case studies in mining the existing biomedical literature on COVID-19. First, we show how topic modeling can split a desired set of documents into smaller, manageable topics that facilitate search. Second, MeDEP offers an interactive visualization of all documents projected into a 2D semantic space, where proximity between two documents implies that they are semantically similar, offering a novel way to explore the existing literature. Third, MeDEP can be used for a more detailed study of potential drug targets, which we demonstrate by considering binding partners of CD147, a promising receptor that is currently not well explored. We also show a case study in which the capabilities of predictive machine learning models are studied in a multi-hospital setting.
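For the multi-hospital setting, one common way to combine models trained at separate sites is federated averaging: each hospital trains locally and shares only its model parameters, which are then averaged with weights proportional to local sample counts. The text does not state which aggregation scheme MeDEP uses, so the sketch below is purely an assumption; `federated_average` and the example numbers are hypothetical.

```python
def federated_average(local_weights, sample_counts):
    """Average per-hospital parameter vectors, weighted by local sample counts.

    Assumed aggregation rule for illustration; each inner list holds the
    parameters of one hospital's locally trained model.
    """
    if not local_weights or len(local_weights) != len(sample_counts):
        raise ValueError("need one sample count per weight vector")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    n_params = len(local_weights[0])
    if any(len(weights) != n_params for weights in local_weights):
        raise ValueError("all weight vectors must have the same length")
    # Weighted mean of each parameter across hospitals.
    return [
        sum(weights[j] * count for weights, count in zip(local_weights, sample_counts)) / total
        for j in range(n_params)
    ]


# Two hypothetical hospitals with 100 and 300 local training samples.
hospital_weights = [[1.0, 2.0], [3.0, 4.0]]
hospital_samples = [100, 300]
global_weights = federated_average(hospital_weights, hospital_samples)
```

The hospital with more samples pulls the aggregated parameters toward its own values, while no patient-level data leaves either site.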

MeDEP also offers a dashboard that displays media sentiment related to COVID-19, along with patient incidence data, protocols relevant to medical professionals, legislation changes during the epidemic, and related news. This functionality is valuable for anyone who needs the most current information on COVID-19, and it facilitates and speeds up decision making.

One of the main challenges, however, was designing infrastructure that is directly applicable in a multi-hospital setting. We plan to build our infrastructure from Docker containers managed in a Docker Swarm cluster, a horizontally scalable and resilient solution. Since we will be dealing with a lot of data, we need a storage layer; we chose MinIO, an S3-compatible object store that offers fault tolerance and high availability and leaves room to scale out if the need arises. The current implementation also uses Swarmpit for monitoring the swarm and controlling resource usage. For data processing workflows, we will use Apache Airflow.

As part of MeDEP, we also envisioned the following business plan. In the third quarter of 2020 (Q3), we plan to arrange meetings with healthcare stakeholders to align on directions for integrating the platform. In the first quarter of 2021 (Q1), the platform shall be fully developed and the alpha version of MeDEP produced. In the second quarter of 2021 (Q2), MeDEP will be installed in the first hospitals (possibly in Ljubljana, our home city). In the third quarter of 2021 (Q3), the first connectors, i.e., the endpoints for integration with other hospitals, will be established. In the first quarter of 2022 (Q1), the first expansions to other hospitals are planned.

We believe the platform offers a solid step towards a scalable and simple-to-use solution for facilitating research and data-driven decision making in the time of COVID-19.
