COVID19 Insights Platform
Team: Jack Hampson, Edward Brown, Dr Brett Drury, Dr Marcia Oliveira, Eduardo Piairo, Matthew Harker, Hew Leith
Mentor: Dr Hugo Ferreira
Support from: The BMJ, Amazon
Thanks to:
● Dr Andrew Jones - Head of Clinical Innovation, AWS
● Dr Sophie Harris - Consultant at King's College London
● Dr Maxime Baroz - ICU Anesthesiologist, HUG, Geneva
● Dr Lydia Wuarin - Vascular Surgeon, HUG, Geneva
● Dr Rueben - MedReg, King's College London
1. The problem your project solves
The COVID19 Insights Platform supports the quick discovery of the latest facts about COVID-19 from trusted sources. It is aimed at new doctors, retirees, key workers and clinicians coming out of their specialism, helping them gain a general understanding of the virus and then remain up to date.
Team lead Jack Hampson was encouraged to form the team of technology and healthcare data professionals after speaking with his sister, a Consultant at King's College London. Dr Sophie Harris, a Consultant Endocrinologist, was overwhelmed by the vast amounts of disparate, unvalidated and potentially risky information around COVID-19 when her ward was taken over two weeks ago. She spent over three days trying to gain as much knowledge in this new field of medicine as possible, not only for general care but also in relation to her specialism, diabetes.
Further research and interviews with doctors showed that Dr Harris was not alone: ICU doctors already working with COVID-19 patients, and other clinicians across Europe, had the same problem. We learnt that most information today is shared via private WhatsApp groups among doctor friends on wards, and that there is no trusted resource clinicians can use to take on the information they need as quickly and effectively as possible.
We also know that the research space around COVID-19 is changing rapidly, so new information on best practices for care or treatments is hard to discover. Given the limited time available to clinicians and key workers on the front line, we felt a combined solution would solve the two biggest problems:
A. How to onboard clinicians or key workers with an overall knowledge of COVID-19 in relation to patient care.
B. How to keep clinicians and key workers up to date with emerging research in relation to their fields of interest or specialism.
2. The solution you bring to the table (including technical details, architecture, tools used)
The COVID19 Insights Platform is a clinical diagnostics and patient care research tool, available as an open resource that all clinicians and key workers can access via desktop or mobile browsers. The platform incorporates a Knowledge Graph built from over 50,000 validated research papers, and uses Graph navigation to let users search this vast amount of data easily.
The users’ navigation is aided through Artificial Intelligence, by automatically linking key concepts such as symptoms and treatments (i.e. drugs). This way a clinician could search for symptoms their patient is showing, and identify potential treatments present in the research data.
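One simple way to picture this kind of linking is sentence-level co-occurrence: count how often a treatment entity appears in the same sentence as a given symptom. The sketch below is only an illustration under that assumption; the source does not state that the platform links concepts this way, and the entity names (`symptom_a`, `drug_x`, ...) and the `treatments_for` helper are hypothetical placeholders rather than real data or platform code.

```python
from collections import Counter

# Hypothetical pre-tagged sentences: each sentence is a list of
# (mention, kind) pairs as an NER step might produce them.
# The names are placeholders, not real clinical findings.
sentences = [
    [("symptom_a", "symptom"), ("drug_x", "treatment")],
    [("symptom_a", "symptom"), ("symptom_b", "symptom"), ("drug_x", "treatment")],
    [("symptom_b", "symptom"), ("drug_z", "treatment")],
    [("symptom_a", "symptom"), ("drug_y", "treatment")],
]


def treatments_for(symptom, tagged_sentences):
    """Rank treatments by how many sentences mention them alongside `symptom`."""
    counts = Counter()
    for entities in tagged_sentences:
        symptoms = {m for m, kind in entities if kind == "symptom"}
        if symptom in symptoms:
            # Count each treatment at most once per sentence.
            counts.update({m for m, kind in entities if kind == "treatment"})
    return counts.most_common()


ranking = treatments_for("symptom_a", sentences)
```

A co-occurrence count only says that two concepts are discussed together, not that one treats the other, so a ranking like this would be a starting point for reading the linked papers rather than a recommendation.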
This research is then paired with regional or international best practice guidelines, such as NICE in the UK, to further aid the clinicians’ decision path for patient care (roadmap).
Finally, an email alert system will update users with new and emerging research in their fields of interest, such as sub-specialisms in relation to COVID-19.
Platform Features
● Free access via any desktop or mobile browser
● Automatic discovery of emerging trends
● Over 50,000 validated research papers and growing
● Email alerts for new research related to clinicians' fields of interest
● Automatically links concepts within the data, i.e. symptoms and treatments
● Intuitive graph-based navigation of this Big Data set
● Link to original paper
Technical Overview
Our solution, the COVID19 Insights Platform, uses CORD-19, a dataset of over 50,000 research papers provided by a consortium led by the White House and the Allen Institute. From this dataset we generate a Knowledge Graph that encodes biomedical concepts and the relations between them. On top of the Knowledge Graph we generate Topic Graphs that visualise these relationships, using Graph navigation to aid discovery and learning.
This fully-developed tool will be for open use by anyone via https://covid.deeperinsights.com.
Document Processing
The Graph was created in two phases: first, document processing with a custom spaCy/SciSpacy pipeline; and second, a bulk insert into our Amazon Neptune instance.
SciSpacy and UMLS
Initial parsing of the CORD-19 source data was done using SciSpacy (a Python package containing spaCy models for processing biomedical, scientific or clinical text). We also used the SciSpacy Unified Medical Language System (UMLS) Named Entity Linker to link the entities found by its NER to a Knowledge Base consisting of UMLS Concepts. Besides SciSpacy, we customised our pipeline to enrich documents with some additional information.
Custom COVID-19 Entity Recognition
Since COVID-19 is a new disease and lacks an entry in the UMLS, it isn't linked by the SciSpacy Linker. Hence, we created an additional NER dedicated to recognising mentions of COVID-19. This was added to the pipeline before the SciSpacy NER, to prevent the latter from incorrectly identifying COVID-19 as an existing UMLS concept. We then linked these mentions to a Concept ID from the Supplementary Concept Record for Coronavirus Disease 2019 newly issued by MeSH, allowing them to be queried much like any other UMLS Concept.
The result of the Document Processing phase is a large file in Neptune CSV format, ready for bulk insert into Amazon Neptune.
Amazon Neptune and Gremlin
Once loaded, the dataset is initially represented in the Graph as entities-within-sentences-within-sections, roughly as follows:
● Document
  ○ Title
    ■ Sentence1
      ■ Entity1
      ■ ...
      ■ Entityn
    ■ ...
    ■ Sentencen
  ○ Abstract
    ■ (As Title)
  ○ Full Text
    ■ (As Title)
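To make the bulk-insert step more concrete, here is a minimal sketch of how one parsed document could be flattened into vertex and edge rows in the Neptune Gremlin CSV layout (system columns `~id`, `~label`, `~from`, `~to`, plus typed property columns such as `name:String`). The input dictionary, the vertex labels, the `section` and `sentence` edge labels, the ID scheme and the `cui` values are all assumptions made for illustration; only the `entity` edge label is taken from the queries in this write-up.

```python
import csv
import io

# Hypothetical output of the NLP phase for one document.
# The "CUI_*" values are placeholders, not real UMLS identifiers.
doc = {
    "id": "doc1",
    "sections": {
        "Title": [
            {"text": "Fever in adults with COVID-19",
             "entities": [("fever", "CUI_FEVER"), ("adults", "CUI_ADULT")]},
        ],
        "Abstract": [
            {"text": "Fever was the most common symptom.",
             "entities": [("fever", "CUI_FEVER")]},
        ],
    },
}

VERTEX_HEADER = ["~id", "~label", "name:String", "cui:String"]
EDGE_HEADER = ["~id", "~from", "~to", "~label"]


def build_rows(document):
    """Flatten Document > Section > Sentence > Entity into vertex and edge rows."""
    vertices = [(document["id"], "Document", "", "")]
    edges = []

    def link(src, dst, label):
        # Edge IDs only need to be unique within the load.
        edges.append((f"e{len(edges) + 1}", src, dst, label))

    for section, sentences in document["sections"].items():
        sec_id = f"{document['id']}_{section.lower().replace(' ', '_')}"
        vertices.append((sec_id, "Section", section, ""))
        link(document["id"], sec_id, "section")
        for i, sentence in enumerate(sentences, start=1):
            sent_id = f"{sec_id}_s{i}"
            vertices.append((sent_id, "Sentence", sentence["text"], ""))
            link(sec_id, sent_id, "sentence")
            for j, (mention, cui) in enumerate(sentence["entities"], start=1):
                ent_id = f"{sent_id}_e{j}"
                vertices.append((ent_id, "Entity", mention, cui))
                link(sent_id, ent_id, "entity")
    return vertices, edges


def to_csv(header, rows):
    """Serialise rows to CSV text; the csv module handles quoting."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


vertices, edges = build_rows(doc)
vertex_csv = to_csv(VERTEX_HEADER, vertices)
edge_csv = to_csv(EDGE_HEADER, edges)
```

Keeping vertices and edges in separate files, each with its own header row, is the shape the Neptune bulk loader expects for Gremlin data; in practice the two strings would be written to files in S3 before the load is started.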
Since our Graph instance runs on Neptune, it uses the Gremlin query language. Hence our code examples below are written in Gremlin.
The UMLS Semantic Network
Once hosted, we further enriched our Graph with the structure (in addition to the Concepts, that is) behind the UMLS. Each Entity found during the NLP phase is linked, via an instance_of edge, to a) one EntityType and b) one or more SemanticType Vertices, meaning every Entity is given its place within the structure of the UMLS Taxonomy. From there, we linked each SemanticType Vertex with any corresponding relationships mentioned in the UMLS Semantic Network. This allows us to query our documents according to that network too. For example, to find Entities in our dataset that are the types of things (Bodily Organs, for example) that produce a Body Substance, we can issue this query:
    g.V().out('entity').where(out('instance_of').hasLabel('SemanticType').out('produces').has('name','Body Substance'))
To find ones (e.g. "child", "Adults") that represent Age Groups:
    g.V().out('entity').where(out('instance_of').hasLabel('SemanticType').in('isa').has('name','Age Group'))
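For readers unfamiliar with Gremlin, the following sketch walks through what the 'produces' query does, step by step, on a tiny in-memory graph. It is a plain-Python illustration and not how Neptune executes the traversal; the `TinyGraph` class, the vertex IDs and the toy data are hypothetical, while the edge labels (`entity`, `instance_of`, `produces`) and the `SemanticType` label come from the query itself.

```python
from collections import defaultdict


class TinyGraph:
    """Minimal labelled property graph: just enough to mimic out() steps."""

    def __init__(self):
        self.props = {}                     # vertex id -> {"label", "name"}
        self.out_edges = defaultdict(list)  # (vertex id, edge label) -> targets

    def add_vertex(self, vid, label, name):
        self.props[vid] = {"label": label, "name": name}

    def add_edge(self, src, label, dst):
        self.out_edges[(src, label)].append(dst)

    def out(self, vid, label):
        # .get() avoids creating empty entries in the defaultdict.
        return self.out_edges.get((vid, label), [])


def entities_producing(graph, target_name):
    """Mirror of: g.V().out('entity').where(out('instance_of')
    .hasLabel('SemanticType').out('produces').has('name', target_name))"""
    found = []
    for vid in list(graph.props):                      # g.V()
        for ent in graph.out(vid, "entity"):           # .out('entity')
            for sem in graph.out(ent, "instance_of"):  # where(out('instance_of')
                if graph.props[sem]["label"] != "SemanticType":
                    continue                           # .hasLabel('SemanticType')
                produced = graph.out(sem, "produces")  # .out('produces')
                if any(graph.props[t]["name"] == target_name for t in produced):
                    found.append(graph.props[ent]["name"])
                    break                              # where() keeps the entity once
    return found


# Toy data: one sentence mentioning an organ and an age group.
g = TinyGraph()
g.add_vertex("s1", "Sentence", "sentence 1")
g.add_vertex("e1", "Entity", "liver")
g.add_vertex("e2", "Entity", "adults")
g.add_vertex("et1", "EntityType", "entity type 1")
g.add_vertex("st_organ", "SemanticType", "Body Part, Organ, or Organ Component")
g.add_vertex("st_subst", "SemanticType", "Body Substance")
g.add_vertex("st_age", "SemanticType", "Age Group")
g.add_edge("s1", "entity", "e1")
g.add_edge("s1", "entity", "e2")
g.add_edge("e1", "instance_of", "et1")
g.add_edge("e1", "instance_of", "st_organ")
g.add_edge("e2", "instance_of", "st_age")
g.add_edge("st_organ", "produces", "st_subst")

organs = entities_producing(g, "Body Substance")
```

The Age Group query follows the same pattern, except that its last hop uses Gremlin's in() step, which follows 'isa' edges against their direction instead of following 'produces' edges forwards.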
Pros and Cons
Pros
● Our solution is extremely powerful, allowing queries to exploit the rich structure of the UMLS network.
● Queries can output arbitrary numbers of columns and normalised data types, and can return raw UMLS codes instead of text. Hence our approach can turn free text into structured data directly.
Cons
● The power mentioned in the Pros section has a downside: users must familiarise themselves with the Gremlin query language and the structure of the UMLS Semantic Network. This makes query development much slower and more iterative than with more user-friendly search-box solutions, which is why we decided to develop the Graph navigation on the front-end.
Beyond the EUvsVirus hackathon, and with the aid of grants or private funding, we aim to further develop the platform to include:
- Focused Crawler - Ethical crawling to identify and extract relevant information to support the Knowledge Graph, such as best practices, guidelines, or internal documents such as electronic health records and new research documents.
- Document Quality Score - Using a Machine Learning based approach, we will create a Validity Of Assertion score for each document that can be used to rank search results.
- Commentary For Newsletters - Include professional commentary around any trends in emerging research to aid the knowledge development of clinicians.
- Contact Tracing - Using demographic data and discoveries in the dataset for particular underlying health problems, we can build location information to aid key workers and clinicians in outreach prevention work.
3. What you have done during the weekend
We have used the fantastic support from skilled mentors and professionals during the EUvsVirus hackathon to further validate our Minimum Viable Product (MVP). Various calls have taken place, and the following hackathon contributors have given feedback:
● Marijndn - Medical Student, provided resources to use for guidelines and best practices across Europe, including UpToDate.
● Fotis Psompoulos - Mentor, has reviewed the platform for commercial use cases relating to the Pharmaceutical industry.
● Antonella Bongiovani - Skilled Mentor, has provided ideas around new datasets from PubMed and ClinicalTrials.gov.
● Hermann Mucke - Skilled Mentor, has provided feedback on the dataset, and suggested partnership opportunities and further use cases.
● Krzysztof Witkowski - Mentor, has provided commercial plan and go-to-market ideas.
● Rohit Ail - Skilled Mentor, gave feedback on the business model and use case for Clinical Decision Support.
● Dr Hugo Ferreira - Team Mentor, has provided feedback on the platform as well as commercial ideas.
Others have been in contact and provided feedback over Slack. This resource has been exceptionally valuable to our development, and in fact highlighted a number of commercial applications for a similar product that the business Deeper Insights could pursue.
A large proportion of the Knowledge Graph and Platform development was undertaken four weeks prior to the hackathon, in line with a Kaggle competition submission deadline for CORD-19 and following earlier discussions with doctors, the BMJ and Amazon, who have provided credits for development.
However, during this weekend - on the product specifically - we have worked on completing queries to the Knowledge Graph, that can be used to navigate the Graph user interface (UI) by the following topics:
- Background of COVID-19
- Prevention (Treatments, Drugs)
- Assessments & Diagnosis
- Special Situations
We have also worked on completing the user experience (UX), including the build and deployment of our email newsletter service.
Finally, we’ve tested on mobile, and improved the Graph navigation.
4. The solution’s impact to the crisis
The COVID19 Insights Platform will have an immediate and profound effect on all clinicians working to fight the COVID-19 pandemic. We expect that fewer lives will be lost when medics follow the latest and most reliable evidence. We also expect that lives will be saved and hospital admissions reduced when medics give preemptive advice based upon the latest research. Within the next two weeks we aim to provide a publicly available research platform that clinicians working with COVID-19 can access online for free, distributed to c.80k doctors via Hospify and Pando's apps within the UK. This will, very soon, give clinicians much-needed clinical decision support.
It is important to also consider the negative impacts of this project. We have considered the societal impact on job security of increasing reliance on machines to perform human tasks, although this particular task is near-impossible for a human to do with great accuracy. We have also considered environmental impacts from the use of more energy-hungry servers. However, this has been mitigated through the design and our selection of server location and provider.
5. The necessities in order to continue the project
The platform is very close to completion, and we expect that, under our own steam, we can launch within the next 1 to 2 weeks. Following the MVP launch, we hope to improve features and usability in line with our roadmap (see Q3).
What we would like from the community following this hackathon is as follows:
- Growth funding (c.£350k) to support further development of the solution, both for COVID-19 and commercially for other medical research fields such as Cancer or Diabetes. This would grow our engineering resource as well as commercial sales & marketing headcount.
- Professional advice in further development of the platform to meet medical guidelines for official use within hospitals.
- Distribution to clinicians via hospitals globally.
- More data!
6. The value of your solution(s) after the crisis
During the Hackathon we’ve sought to validate commercial viability for such a platform. The feedback from professionals in this community has been very encouraging, as were conversations prior to the hackathon with the likes of Breast Cancer Now and Prostate Cancer UK.
The intention for the COVID19 Insights Platform is that we use grant funding (InnovateUK, H2020, NIHR and others) to improve the UX, the data and the technology, all the while helping to support staff on the front line fighting the virus. Following the successful launch and uptake of this free platform, we aim to commercialise it, targeting the clinical diagnostics market (globally $50Bn) and clinical research market ($69.8Bn) via one of three business models:
Freemium (Spotify model): Access to the platform for search and discovery of valuable research papers. Users who follow a link to read a research paper in full are met with a paywall/subscription. A revenue share or commission is agreed with the research institute/copyright owner every time a document is read.
Platform License (Healthcare):
a. Targeting publishers or aggregators, such as BMJ and Elsevier, to host their data within such a platform to aid their customers' search of their valuable datasets.
b. Research institutes, such as Prostate Cancer UK (in early discussion) or Breast Cancer Now (client of Deeper Insights for a different product), could use the platform with their own data to unearth valuable insights to aid their research communities.
Platform License (Non-Healthcare): Other industries with large datasets could similarly use the platform to serve as a knowledge discovery tool for insights.
The URL to the prototype: https://covid.deeperinsights.com
The URL to the pitch video (Required)