Our project, Knowledge4COVID-19, aims to showcase the power of integrating disparate sources of knowledge to discover missing links. In particular, we start by building a knowledge graph of what is known about COVID-19 and related viruses, and use it to predict interactions and potential adverse effects of drugs suggested for the treatment of COVID-19. Such estimations are essential for making safety decisions related to new clinical trials. Furthermore, we are working to address new tasks, such as the identification of potential targets and their association with candidate drugs. Our recent work on similar tasks in the context of the European project iASiS, together with the collaboration of research teams that cover different aspects of the problem, provides a sound "springboard" for our project. Knowledge engineering technologies are combined with natural language processing and machine learning methods to create a knowledge graph of scientific publications and drugs related to COVID-19. The result is the Knowledge4COVID-19 knowledge graph, along with several services to explore scientific literature and potential drugs. More importantly, we provide APIs to machine learning methods (both supervised and unsupervised) that allow for the prediction and discovery of the interactions and adverse events of the drugs suggested to treat COVID-19.
The Knowledge4COVID-19 team is composed of the following members:
The Scientific Data Management group at TIB (SDM): Anery Patel, Ariam Rivas, Ahmad Sakor, Vitalis Wiens, and Maria-Esther Vidal (technical team), and Gabriela Ydler (dissemination).
The Software and Knowledge Engineering Laboratory at NCSR (SKEL): Kostantinos Bougiatiotis, Fotis Aisopos, Anastasia Krithara, and George Paliouras.
The Knowledge4COVID-19 Architecture
The SDM group developed a data-driven pipeline that extracts information about drugs and diseases from the scientific literature in the COVID-19 dataset, as well as relevant information for the extracted drugs (e.g., drug-drug interactions, side effects, and indications) from DrugBank. The extracted information is integrated into the Knowledge4COVID-19 knowledge graph. The COVID-19 dataset comprises 52K publications from PubMed, bioRxiv, medRxiv, and PubMed Central (PMC). The natural language processing tool MetaMap was used to recognize drugs and diseases in the titles and abstracts of the integrated articles; the Unified Medical Language System (UMLS) was used to describe the extracted medical entities with a controlled vocabulary of medical terms. In total, 4,162 drugs and 2,012 medical conditions were extracted from the integrated publications. Coronavirus-related drugs are frequently mentioned in these publications; Chloroquine, Zinc, Human interferon, Methylprednisolone, Ritonavir, Hydroxychloroquine, and Lopinavir are the most frequently mentioned drugs. Similarly, the most common medical conditions described in these publications include essential thrombocythemia, cardiac arrest, pneumonia, multiple chronic conditions, and asthma. These distributions reveal interesting properties of the publications that compose the COVID-19 dataset.
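The frequency distributions mentioned above can be illustrated with a minimal sketch. It assumes that the annotation step has already produced, for each publication, the set of recognized drugs and conditions; the record layout, the function name mention_counts, and all entity names (DrugA, ConditionX, and so on) are placeholders invented for illustration and are not output of the actual pipeline.

```python
from collections import Counter

# Hypothetical annotation output: one record per publication, holding the
# sets of recognized entities. All names are placeholders, not real data.
annotations = [
    {"drugs": {"DrugA", "DrugB"}, "conditions": {"ConditionX"}},
    {"drugs": {"DrugA"}, "conditions": {"ConditionX", "ConditionY"}},
    {"drugs": {"DrugA", "DrugC"}, "conditions": {"ConditionY"}},
    {"drugs": {"DrugB"}, "conditions": {"ConditionX"}},
]


def mention_counts(records, field):
    """Count in how many publications each entity of the given field appears."""
    counts = Counter()
    for record in records:
        # Each record holds a set, so an entity counts once per publication.
        counts.update(record[field])
    return counts


drug_counts = mention_counts(annotations, "drugs")
condition_counts = mention_counts(annotations, "conditions")

# The two most frequently mentioned drugs in the toy data.
print(drug_counts.most_common(2))
```

Counting per publication rather than per occurrence is a design choice of this sketch; counting every mention in the text would only require passing lists instead of sets.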
The Knowledge4COVID-19 Knowledge Graph
The Knowledge4COVID-19 knowledge graph includes information about adverse effects that may occur when two or more drugs are taken together. Interactions between drugs that may cause adverse side effects have been extracted from the scientific database DrugBank. FALCON, an entity linking tool developed by members of the SDM team, was used to extract this information from the textual descriptions in DrugBank. As a result, 2,205,099 drug-drug interactions and 5,965 drug toxicities are part of the Knowledge4COVID-19 knowledge graph. Moreover, the SKEL team contributed machine learning methods that draw on contextual information from the scientific literature stored in the Knowledge4COVID-19 knowledge graph to predict drug-drug interactions. In total, 22,346 potential novel drug-drug interactions are part of the Knowledge4COVID-19 knowledge graph. Knowledge4COVID-19 is linked to existing knowledge graphs to include encyclopaedic and factual knowledge about drugs and conditions represented in DBpedia, Bio2RDF, and DrugBank (14,524).
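To make the graph structure concrete, here is a minimal sketch of how drug-drug interactions could be stored as subject-predicate-object triples and looked up for a set of drugs given together. The predicate names (interactsWith, predictedInteractsWith, hasToxicity), the function interactions_in_treatment, and the drug names are assumptions made for this example; they are not the vocabulary of the Knowledge4COVID-19 knowledge graph.

```python
from itertools import combinations

# Hypothetical triples (subject, predicate, object). Predicates and drug
# names are placeholders, not the actual Knowledge4COVID-19 vocabulary.
triples = {
    ("DrugA", "interactsWith", "DrugB"),           # e.g. extracted from a database
    ("DrugB", "interactsWith", "DrugC"),
    ("DrugA", "predictedInteractsWith", "DrugD"),  # e.g. predicted by a model
    ("DrugA", "hasToxicity", "ToxicityT"),
}

INTERACTION_PREDICATES = {"interactsWith", "predictedInteractsWith"}


def interactions_in_treatment(graph, drugs):
    """Return the interaction triples that connect any two drugs of a treatment."""
    found = []
    for a, b in combinations(sorted(drugs), 2):
        for predicate in sorted(INTERACTION_PREDICATES):
            # Interactions are treated as symmetric, so check both directions.
            if (a, predicate, b) in graph or (b, predicate, a) in graph:
                found.append((a, predicate, b))
    return found


result = interactions_in_treatment(triples, {"DrugA", "DrugB", "DrugD"})
print(result)
```

Keeping extracted and predicted interactions under separate predicates lets a consumer decide whether to trust only curated knowledge or to include model predictions as well.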
Patterns Among Drugs and Drug-Drug Interactions
The representation of information in Knowledge4COVID-19 allows for uncovering patterns that facilitate the explanation of treatment results. Moreover, these interactions, in combination with the information about coronavirus-related drugs and medical conditions extracted from scientific articles, enable the understanding of the potential adverse medical conditions that may occur in the presence of conditions such as high blood pressure, asthma, or diabetes.
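One way such a pattern could be derived is sketched below, under simplifying assumptions: the literature yields a mapping from each medical condition to the drugs mentioned with it, and each interaction carries a label for the adverse effect it may cause. The data structures, the function risks_for_patient, and every drug, condition, and effect name are hypothetical placeholders, not content of the knowledge graph.

```python
# Hypothetical literature-derived associations: condition -> drugs mentioned
# together with it. All names are placeholders.
drugs_for_condition = {
    "ConditionX": {"DrugB", "DrugC"},
    "ConditionY": {"DrugD"},
}

# Hypothetical interactions: unordered drug pair -> described adverse effect.
interaction_effect = {
    frozenset({"DrugA", "DrugB"}): "EffectE1",
    frozenset({"DrugA", "DrugD"}): "EffectE2",
}


def risks_for_patient(candidate_drug, conditions):
    """Map each condition to the (drug, effect) pairs that involve the candidate drug."""
    risks = {}
    for condition in conditions:
        hits = []
        for drug in sorted(drugs_for_condition.get(condition, ())):
            effect = interaction_effect.get(frozenset({candidate_drug, drug}))
            if effect is not None:
                hits.append((drug, effect))
        if hits:
            risks[condition] = hits
    return risks


# Which adverse effects might arise if DrugA is given to a patient who has
# both conditions and takes the drugs associated with them?
risks = risks_for_patient("DrugA", ["ConditionX", "ConditionY"])
print(risks)
```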
In the future, we plan to integrate clinical COVID-19-related data to detect patterns that can explain the correlation between survival and both drug interactions and toxicities. We also plan to connect the Knowledge4COVID-19 knowledge graph with the Open Research Knowledge Graph (ORKG). In addition, other types of interactions, e.g., drug-target, protein-protein, and drug-side effect, will be extracted from the literature and from scientific databases.