Disease Maps and text mining for drug prediction

[Figure: COVID-19 Disease Map exploration]
[Figure: The architecture of the workflow]

Inspiration

Due to the ongoing COVID-19 pandemic, we urgently need to understand the nature of the disease and its mechanisms to develop new therapies. The underlying mechanisms of COVID-19 are complex and span multiple pathways and cell types. Thus, to understand the disease from a holistic perspective and to accelerate drug development, the available but often scattered knowledge needs to be combined. Moreover, this knowledge must be of high quality to serve as a basis for high-fidelity treatments.

However, traditional drug development approaches that rely on human comprehension and intuition to interpret complex, dynamically interacting systems have a high failure rate. Therefore, we combined comprehensive information gathering with computational models for predictive interpretation.

Impact

The potential impact is high. At the time of writing, there are more than 200 COVID-19 clinical trials. A recent study indicates that network-based bioinformatics approaches can drastically reduce drug discovery costs. In particular, these tools can help identify new drugs and prioritize existing ones for exploration against COVID-19. Indeed, drug repositioning is a steadily growing, multi-billion-dollar market in which our work can play a significant role.

Expertise

For over a month, we have been constructing a knowledge repository for COVID-19. This effort leverages our experience and expertise in constructing comprehensive maps of other human diseases, as well as in literature mining-based approaches to rapid epidemic response.

We are developing novel bioinformatics workflows for the precise formulation of computational models and accurate data interpretation, with the potential to suggest drug repositioning (repurposing). These workflows integrate expert knowledge of the molecular mechanisms of SARS-CoV-2 infection and the host cell response, databases and data, and computational modeling.

By improving our understanding of the mechanisms of COVID-19 and identifying opportunities for the rational re-use of existing drugs, we will provide the community with new options to cope with the ongoing pandemic. In the longer run, finding new drug candidates will boost the drug development field with testable suggestions for producing new treatments for this type of disease and for preventing new waves of COVID-19 or similar pandemics.

What it does

The interoperable pipeline we developed allowed us to generate a set of enriched maps and executable Boolean models within the tight time frame of the hackathon. We tested these models on a first set of hypotheses and now make them openly available, so that domain experts can develop or test their own hypotheses quickly and without requiring experimental resources.

Our pipeline allows us to update the maps and models semi-automatically and therefore keep up with the rapidly developing literature and data on COVID-19.

Our work brings together a repository of high-quality computational maps and executable models depicting the mechanisms behind COVID-19, two text mining solutions, two modeling platforms, a map-to-model conversion pipeline (CaSQ), and three databases of drug-target interactions, one of which is a proprietary algorithm and database (e-NIOS). We leverage expert knowledge, enriched with machine-assisted literature mining, and analyze the models to identify elements critical for the disease. We then identify drugs that have been shown to modulate these key targets.

The highlights of our solution

An expansive body of scientific literature is handled by text mining;
A large number of data repositories are interconnected through standardized formats, annotations, and interfaces;
Fast prediction of testable drug targets by efficient computational pipelines;
Open, interactive, and easy access via the web;
Easy to adapt to new challenges due to standardized formats;
Can support combinatorial therapies, as multiple drugs and targets are suggested; and
Repurposing of existing drugs.

How we built it

Our workflow

Our team of 15 people developed the workflow in parallel. We coordinated our work as illustrated in the architecture figure and combined the building blocks described above.

The starting point is the COVID-19 Disease Map, whose contents were uploaded to the MINERVA online platform for visual exploration and analytics.

The contents of this repository are enriched by applying the COVID19 Miner and Biomax AILANI text mining solutions to COVID-19-related literature. This way, we can find information that experts might have missed, or identify new drug interactions. The contents of the enriched repository are translated into executable models using CaSQ, in formats compatible with Cell Collective and Hipathia, which are then analyzed to predict molecules that influence COVID-19-related cellular phenotypes.
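To give a feel for the map-to-model step, the sketch below turns a list of signed influences into Boolean update rules using one common default logic: a node is active if at least one activator is active and no inhibitor is active. This is a minimal illustration under that assumption, not CaSQ's actual implementation, which works on the full diagram semantics and writes SBML-qual. The edges and the function names `edges_to_rules` and `rule_to_string` are hypothetical.

```python
from collections import defaultdict

# Illustrative signed influences (source, target, sign): +1 = activation, -1 = inhibition.
# These edges are hypothetical and are NOT taken from the COVID-19 Disease Map.
EDGES = [
    ("DDX58", "MAVS", +1),
    ("IFIH1", "MAVS", +1),
    ("Nsp15", "IFIH1", -1),
    ("MAVS", "IRF3", +1),
]


def edges_to_rules(edges):
    """Group the incoming edges of every target into activator and inhibitor sets."""
    rules = defaultdict(lambda: {"act": set(), "inh": set()})
    for source, target, sign in edges:
        rules[target]["act" if sign > 0 else "inh"].add(source)
    return dict(rules)


def rule_to_string(target, rule):
    """Render a rule as 'target = (A | B) & !(C)'.

    Assumed default logic: ON if any activator is ON and no inhibitor is ON.
    A node with only inhibitors is ON whenever all of its inhibitors are OFF.
    """
    parts = []
    if rule["act"]:
        parts.append("(" + " | ".join(sorted(rule["act"])) + ")")
    if rule["inh"]:
        parts.append("!(" + " | ".join(sorted(rule["inh"])) + ")")
    return f"{target} = " + " & ".join(parts)


if __name__ == "__main__":
    for target, rule in sorted(edges_to_rules(EDGES).items()):
        print(rule_to_string(target, rule))
```

Run as a script, this prints `IFIH1 = !(Nsp15)`, `IRF3 = (MAVS)` and `MAVS = (DDX58 | IFIH1)`. Real maps also contain complexes, transport and state changes, which is why a dedicated converter is needed.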

Finally, the identified key molecules are mapped to the drug-target databases DrugBank, ChEMBL, and e-NIOS, and to text mining-based results, to identify plausible drugs that may modify the molecular response to COVID-19.
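This last step can be pictured as a join between the key molecules found by the models and a list of drug-target pairs. The minimal sketch below assumes the pairs have already been exported from a source such as DrugBank or ChEMBL into simple (drug, target) tuples and ranks drugs by how many key molecules they hit. The drug names and pairs are placeholders, and counting hits is only one possible prioritization criterion, not necessarily the one used in the workflow.

```python
from collections import defaultdict

# Placeholder drug-target pairs; real ones would come from DrugBank, ChEMBL or e-NIOS exports.
DRUG_TARGETS = [
    ("drug_A", "IFIH1"),
    ("drug_A", "DDX58"),
    ("drug_B", "OAS3"),
    ("drug_C", "TP53"),
]


def rank_drugs(key_molecules, drug_targets):
    """Rank drugs by the number of key molecules they target (ties broken by drug name)."""
    hits = defaultdict(set)
    for drug, target in drug_targets:
        if target in key_molecules:
            hits[drug].add(target)
    ranked = [(drug, sorted(targets)) for drug, targets in hits.items()]
    return sorted(ranked, key=lambda item: (-len(item[1]), item[0]))


if __name__ == "__main__":
    key_molecules = {"OAS3", "DDX58", "IFIH1"}  # e.g. molecules highlighted by the models
    for drug, targets in rank_drugs(key_molecules, DRUG_TARGETS):
        print(drug, targets)
```

With these placeholders, `drug_A` (two key targets) ranks above `drug_B` (one), and `drug_C` is dropped because it hits none of the key molecules. Drugs hitting several key molecules are natural candidates when considering combinatorial therapies.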

Important interfaces and approaches:

MINERVA API for programmatic access to the expert knowledge and annotations of the COVID-19 Disease Map, as well as to drugs targeting the molecules of the map;
Biomax AILANI API for retrieving text mining results for the COVID-19 and influenza literature, as well as drug targets;
CaSQ for automated conversion of the diagrams of the COVID-19 Disease Map into executable Boolean models;
Standardized processing of models in CellCollective and Hipathia, allowing visualization of important molecules in the COVID-19 Disease Map; and
Stable state analysis using GINSim to generate testable hypotheses.
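Stable states are the states that the Boolean update functions map onto themselves, and they are the same under synchronous and asynchronous updating. GINSim computes them efficiently for realistic models; for a handful of nodes they can simply be enumerated. The sketch below does this for a hypothetical four-node toy network (not one of our models) and shows one way to mimic blocking a node in silico: clamping it to OFF and recomputing the stable states.

```python
from itertools import product

# Hypothetical toy network, not one of the project's models.
# Each node maps to a Boolean update function of the current state.
RULES = {
    "Virus": lambda s: s["Virus"],                     # input node keeps its value
    "Nsp15": lambda s: s["Virus"],                     # viral protein present during infection
    "IFIH1": lambda s: s["Virus"] and not s["Nsp15"],  # sensing, dampened by Nsp15
    "IFNB1": lambda s: s["IFIH1"],                     # interferon response
}


def stable_states(rules, fixed=None):
    """Enumerate all states x with f(x) == x; nodes in `fixed` are clamped to given values."""
    fixed = fixed or {}
    nodes = sorted(rules)
    found = []
    for values in product([False, True], repeat=len(nodes)):
        state = dict(zip(nodes, values))
        if any(state[node] != value for node, value in fixed.items()):
            continue  # skip states that contradict the clamped values
        successor = {
            node: fixed[node] if node in fixed else bool(rules[node](state))
            for node in nodes
        }
        if successor == state:
            found.append(state)
    return found


if __name__ == "__main__":
    infected = stable_states(RULES, fixed={"Virus": True})
    blocked = stable_states(RULES, fixed={"Virus": True, "Nsp15": False})
    print("infected:", infected)
    print("Nsp15 blocked:", blocked)
```

In this toy network, IFNB1 is OFF in the single infected stable state and ON once Nsp15 is clamped OFF. This only illustrates the mechanics of the analysis; exhaustive enumeration grows as 2^n and is not a substitute for GINSim on real models.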

Using these APIs and the resulting datasets, we built conversion scripts that combine the knowledge graphs of the COVID-19 Disease Map with text mining results to enrich them. Another part of the workflow converts the enriched maps into executable models available in CellCollective and Hipathia. The next part maps the modeling results back to the identifiers in the enriched COVID-19 Disease Map. Finally, this information is combined with drug targets to prioritize them.
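The enrichment step compares what the map already contains with what text mining finds. A minimal sketch, assuming both sides have been reduced to (source, target) pairs with comparable gene symbols and that each mined pair carries an article identifier: pairs absent from the map and supported by at least `min_evidence` distinct articles become candidates for expert review. The function name, threshold, and data are hypothetical, and upper-casing is a deliberately simple stand-in for proper identifier mapping.

```python
from collections import defaultdict


def candidate_enrichments(map_edges, mined_edges, min_evidence=2):
    """Return mined (source, target) pairs missing from the map with enough article support.

    map_edges:   iterable of (source, target) pairs already in the map
    mined_edges: iterable of (source, target, article_id) triples from text mining
    """
    known = {(s.upper(), t.upper()) for s, t in map_edges}
    support = defaultdict(set)
    for source, target, article_id in mined_edges:
        support[(source.upper(), target.upper())].add(article_id)
    return sorted(
        pair for pair, articles in support.items()
        if pair not in known and len(articles) >= min_evidence
    )


if __name__ == "__main__":
    map_edges = [("DDX58", "MAVS")]
    mined_edges = [
        ("DDX58", "MAVS", "article_1"),   # already in the map
        ("IFIH1", "MAVS", "article_2"),
        ("IFIH1", "MAVS", "article_3"),
        ("OAS3", "RNASEL", "article_4"),  # only one article, below threshold
    ]
    print(candidate_enrichments(map_edges, mined_edges))
```

Here only `("IFIH1", "MAVS")` is proposed: the DDX58 pair is already known, and the OAS3 pair lacks a second supporting article.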

To empower domain experts to re-use our results and thereby maximize impact, we follow the RDA COVID-19 guidelines:

1. FAIR and timely availability

As suggested by the RDA, we currently prioritize the timely availability of our results while continuing to work on their FAIRness. Our tools have been published previously and are indexed in PubMed and searchable online. All tools are open access, and MINERVA, AILANI, and CellCollective provide APIs. The developed maps and models are already available in a GitLab repository and use standard formats such as GML, SIF, or SBML. We release all maps and models under a CC-BY license. The next steps will focus on unique, stable identifiers, metadata, and versioning of data and tools to further improve FAIRness.
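Because the maps and models are shared in standard formats, re-using them can be as simple as reading a SIF file. In the simple interaction format used by Cytoscape, each line holds a source node, an interaction type and one or more target nodes; if the file contains any tab, tabs are the delimiters, otherwise whitespace is. The sketch below parses that layout into edge triples. It is a minimal reader written for illustration, not a complete implementation of the format, and the example content is made up.

```python
def parse_sif(text):
    """Parse SIF text into (source, interaction, target) triples.

    Lines with a single node name declare isolated nodes and are skipped here.
    """
    use_tabs = "\t" in text  # if any tab is present, tabs are the field delimiters
    edges = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if use_tabs:
            fields = [f.strip() for f in line.split("\t") if f.strip()]
        else:
            fields = line.split()
        if len(fields) < 3:
            continue
        source, interaction, *targets = fields
        edges.extend((source, interaction, target) for target in targets)
    return edges


if __name__ == "__main__":
    sample = "A\tactivates\tB\tC\nD\n"
    print(parse_sif(sample))
```

A line listing several targets expands into one triple per target, so the sample yields `A activates B` and `A activates C`, while the lone `D` line is ignored.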

2. Metadata and documentation for discovery

The initial documentation of our approach, workflow, and data sources is provided in this document and will be extended into a full manuscript. In addition, appropriate metadata will be associated with each map and model, which will be registered and indexed on FAIRDOMHub to enable discovery.

3. Public repositories

Our maps and models, although still working versions, are already available in a public GitLab repository. As they become stable, we will release them to public disciplinary repositories such as BioModels.

4. Documentation, manuals, support

Most of the tools used to produce our maps and models are mature, publicly available solutions with rich documentation, online help, and some video tutorials; the others are newly developed. We provide support for tool usage and will be happy to collaborate on the application, re-use, and extension of the produced maps and models.

Challenges we ran into

High-quality mechanistic knowledge is limited in scope, which results in limited models and reduces the chance of finding good drug targets. Also, re-using generic pathways from general pathway databases can often yield a list of targets that are not very specific to the disease and play a role in too many biological processes, raising the risk of side effects.

A great technical challenge was cross-linking of a range of different APIs, as well as handling of datasets coming from semi-automated processing.

Finally, the text mining solutions required fine-tuning due to the lack of a standardized vocabulary for naming SARS-CoV-2 proteins. Similarly, manual curation from the literature risks introducing non-standard, ambiguous human gene names, which require manual effort to resolve to standard identifiers.
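One way to tame the naming problem is a curated synonym table consulted before matching, with unresolved names routed to a curator rather than guessed. The sketch below assumes such a table; the few entries shown (for example NendoU for Nsp15, RIG-I for DDX58, MDA5 for IFIH1) are illustrative and far from complete, and `normalize_name` is a hypothetical helper.

```python
import re

# Illustrative synonym table, keyed by a normalized spelling; far from complete.
SYNONYMS = {
    "nsp15": "Nsp15",
    "nendou": "Nsp15",
    "spike": "S",
    "sprotein": "S",
    "rig-i": "DDX58",
    "ddx58": "DDX58",
    "mda5": "IFIH1",
    "mda-5": "IFIH1",
    "ifih1": "IFIH1",
}


def normalize_name(name, synonyms=SYNONYMS):
    """Map a free-text protein or gene name to a standard symbol, or None if unknown.

    Normalization: drop whitespace, lower-case, and treat underscores as hyphens.
    """
    key = re.sub(r"\s+", "", name).lower().replace("_", "-")
    return synonyms.get(key)  # None flags the name for manual curation


if __name__ == "__main__":
    for raw in ["NendoU", "nsp 15", "RIG-I", "MDA5", "Spike", "unknown protein"]:
        print(raw, "->", normalize_name(raw))
```

Returning `None` instead of a best guess keeps ambiguous names visible, which matches the manual resolution effort described above.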

Accomplishments that we're proud of

During the hackathon, we came together and built a seamless workflow based on well-established tools and platforms, producing a powerful computational hybrid to rationalize drug choice. The workflow will enable domain experts, such as clinicians, virologists, and immunologists, to collaborate with data scientists and computational biologists.

We are proud of our interdisciplinary work. The group capitalized on complementary skills, ranging from code development, bioinformatics, and computational modeling to the life sciences.

First prediction results

Despite the tight timeline, we obtained first insights: one of our models indicates that blocking Nsp15 would increase the innate immune response, in agreement with literature evidence. Moreover, going back to the drug targets in our map, we found that OAS3, DDX58, and IFIH1 are also viable drug targets, whose effects can be further examined using the model in CellCollective.

What we learned

The decision to combine three independent text mining approaches was correct. It helped us filter out noisy results, so that we can rely on robust ones.
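Filtering by agreement between text mining approaches can be expressed as a simple vote: keep an interaction only if at least `min_tools` of the approaches report it. The sketch below assumes each approach's output has been reduced to (source, target) pairs with harmonized names; the tool labels and pairs are placeholders, and the threshold of two is an assumption rather than the value used in the project.

```python
from collections import Counter


def consensus(tool_results, min_tools=2):
    """Keep interactions reported by at least `min_tools` different tools."""
    votes = Counter()
    for interactions in tool_results.values():
        votes.update(set(interactions))  # each tool votes at most once per interaction
    return sorted(pair for pair, count in votes.items() if count >= min_tools)


if __name__ == "__main__":
    results = {
        "tool_1": [("A", "B"), ("A", "C")],
        "tool_2": [("A", "B"), ("A", "B")],  # duplicate report, counted once
        "tool_3": [("A", "B"), ("C", "D"), ("A", "C")],
    }
    print(consensus(results))
```

With these placeholders, `("A", "B")` (three votes) and `("A", "C")` (two votes) survive, while `("C", "D")`, reported by a single tool, is filtered out. Raising `min_tools` trades recall for precision.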

We need to move into multicellular modeling, expanding the COVID-19 map resource with cell-type-specific molecular maps, especially for the various immune cell types.

The workflow has to be further improved and streamlined to make it accessible to life scientists and clinical researchers, reducing the technology barrier.

What's next for Disease Maps and text mining for drug prediction

What we plan to help with COVID-19

We will complete and expand the COVID-19 Disease Map with knowledge from newly published papers to create cell-type-specific maps, scaling up our modeling to the systemic level. Improved input from text mining techniques will help keep the content of the COVID-19 Disease Map repository up to date.

By comparing mechanisms and drug targets, we will look into the comorbidities of the disease: for example, how pre-existing conditions such as allergy, cancer, or diabetes affect predisposition to COVID-19, and which molecular mechanisms are responsible for this.

We will use the resource for data analysis, in particular of the new datasets coming from COVID-19 cohorts around the world, such as single-cell studies and comprehensive multi-omics studies. As data becomes available, it will allow us to add further modeling paradigms besides Boolean approaches, such as kinetic and agent-based modeling, and to parameterize them.

We will expand our collaborative network with domain experts to ensure that model predictions are verified by experimental work, and to guide experimental and clinical work.

What we plan beyond the pandemic

Our tools and workflows are reusable beyond the current pandemic and can be applied in general pathobiology and comorbidity research. To this end, we will need to improve their FAIRness. Findability will be improved by offering stable identifiers for resources and metadata for workflows. Accessibility will require adaptation to high-performance computing infrastructures such as the European Grid Infrastructure. Interoperability will be achieved through closer collaboration with ELIXIR and similar initiatives. Finally, for reusability, we plan to rely on stable APIs and to Dockerize our workflows.

What we need

This promising project will require fundraising and recruitment of specialists in various fields from biocuration to mathematical modeling and beyond. Apart from the availability of resources and funding, this task critically depends on the open, timely availability of newly generated data.

Built With

ailani
casq
cellcollective
celldesigner
cytoscape
ginsim
hipathia
minerva
python
r

Submitted to

The European Commission's EUvsVirus Hackathon

Created by

I worked on scripting workflows in R to pull down drug targets for COVID-19 mechanisms, and started using the new APIs of the modeling (CellCollective) and text mining (Biomax) platforms.

Marek Ostaszewski
A Data Scientist and a Scientific Project Manager at LCSB, University of Luxembourg
Vidisha Singh
PhD student in systems biology for complex disease
Bhanwar Lal Puniya
Robert Moore
Sara Aghamiri
AI and medical modeling engineer, Institut national de la santé et de la recherche médicale (INSERM)
Private user
Rupert Overall
Dieter Maier
Anna Niarakis
Associate Professor of Computational Systems Biology, Univ Evry - University of Paris-Saclay
Inna Kuperstein
Luis Cristóbal Monraz Gómez
Tomas Helikar
Marina Esteban
PhD student at Clinical Bioinformatics Area (Fundación Progreso y Salud)
Matti Hoch
PhD Student at the Department of Systems Biology and Bioinformatics, University of Rostock, Germany