Inspiration

Recently, the Allen Institute for AI released a data set of important research papers related to the current coronavirus (COVID-19) crisis:

https://pages.semanticscholar.org/coronavirus-research

This rich data set, known as CORD-19, inspired the Covid-19 Explorer app: it surfaced the need for interactive tools that help people analyze and understand the unstructured text in its articles.

What it does

Covid Explorer is an Alexa Skill that uses the Amazon SageMaker environment and several associated services to create an interactive research and exploration tool for data scientists, medical personnel, and other researchers. One thousand documents from the CORD-19 data set were run through the Amazon Comprehend Medical service and the 7Park Drug Name Entity Recognizer service to create the richly connected data that underpins the app.

Amazon Comprehend Medical is Amazon's powerful text analysis service, with extensive support for recognizing salient, medically related terms and phrases. Because drug names are critically important to data scientists and researchers, the 7Park Drug Name Entity Recognizer service was added to round out the text analysis suite and ensure nothing important was missed.

Every term Covid-19 Explorer extracts is tagged with the paragraph it was found in. This matters: Covid-19 Explorer deep-links at the paragraph level rather than at the document level like most applications, which greatly increases the chance that you land on information directly related to your current focus.
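To make paragraph-level tagging concrete, here is a minimal Python sketch (not the project's actual code) of an index from each extracted term to the paragraphs it appears in. It assumes paragraphs are separated by blank lines, and it uses a hypothetical keyword-based `keyword_detect` as a stand-in for the entity services; its entity dicts only mimic the `Text` and `Category` fields of Comprehend Medical output. With boto3, the `detect` argument could instead wrap `detect_entities_v2(Text=...)["Entities"]` on a `comprehendmedical` client.

```python
import re
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Entity = Dict[str, Any]
Location = Tuple[str, int]  # (document id, paragraph index)

# Hypothetical stand-in for Amazon Comprehend Medical / the 7Park recognizer.
# A real detector could wrap a boto3 "comprehendmedical" client:
#   lambda p: client.detect_entities_v2(Text=p)["Entities"]
VOCAB = {"fever": "MEDICAL_CONDITION", "remdesivir": "MEDICATION", "lung": "ANATOMY"}

def keyword_detect(paragraph: str) -> List[Entity]:
    """Return Comprehend-Medical-shaped entities for known keywords."""
    words = re.findall(r"[a-z0-9]+", paragraph.lower())
    return [{"Text": w, "Category": VOCAB[w]} for w in words if w in VOCAB]

def index_by_paragraph(doc_id: str, text: str,
                       detect: Callable[[str], List[Entity]]) -> Dict[str, List[Location]]:
    """Map each extracted term to the (document, paragraph) locations it occurs in."""
    index: Dict[str, List[Location]] = defaultdict(list)
    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    for i, para in enumerate(paragraphs):
        for entity in detect(para):
            term = str(entity["Text"]).lower()
            location = (doc_id, i)
            if location not in index[term]:  # record each paragraph once per term
                index[term].append(location)
    return dict(index)

doc = ("Patients presented with fever.\n\n"
       "Remdesivir was administered; the fever subsided.\n\n"
       "Lung imaging was normal.")
index = index_by_paragraph("doc-1", doc, keyword_detect)
print(index["fever"])  # [('doc-1', 0), ('doc-1', 1)]
```

Keeping the detector as a plain function argument lets the same indexing loop run against a local stub during development and against the real services in SageMaker.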

The extracted terms are processed into a rich, connected graph that places concept clusters with a high degree of semantic similarity next to each other. This makes exploring the graph an intuitive and productive experience for coronavirus researchers. Finally, a unique, easy-to-use interactive visual interface turns the dry task of analyzing scientific documents into an interesting and creative journey through a topically related landscape of medical and non-medical terms.
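The write-up does not say how semantic similarity is measured. One simple stand-in, sketched below in Python as an assumption rather than the app's method, treats two terms as similar when they occur in many of the same paragraphs (the Jaccard similarity of their paragraph sets), links pairs at or above a hypothetical `threshold`, and reads clusters off as connected components. A layout step such as force-directed placement would then keep strongly linked clusters near each other.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Set

def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Size of the intersection over size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_graph(index: Dict[str, Set[Hashable]],
                threshold: float = 0.3) -> Dict[str, Dict[str, float]]:
    """Undirected weighted graph: an edge wherever paragraph overlap >= threshold."""
    graph: Dict[str, Dict[str, float]] = {term: {} for term in index}
    for t1, t2 in combinations(sorted(index), 2):
        weight = jaccard(index[t1], index[t2])
        if weight >= threshold:
            graph[t1][t2] = weight
            graph[t2][t1] = weight
    return graph

def clusters(graph: Dict[str, Dict[str, float]]) -> List[List[str]]:
    """Connected components found with an iterative depth-first search."""
    seen: Set[str] = set()
    components: List[List[str]] = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components

# Hypothetical term -> set of (document, paragraph) locations.
index = {
    "fever": {("d1", 0), ("d1", 1), ("d2", 3)},
    "cough": {("d1", 0), ("d2", 3)},
    "remdesivir": {("d1", 1)},
    "lung": {("d3", 0)},
}
print(clusters(build_graph(index)))  # [['cough', 'fever', 'remdesivir'], ['lung']]
```

Here "remdesivir" joins the fever/cough cluster only through its shared paragraph with "fever" (similarity 1/3), which shows how a single co-occurrence can bridge concepts; an embedding-based similarity would be a natural swap-in for richer semantics.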

Infrastructure and Ancillary Services

  • Amazon SageMaker was the primary engine for running the Python code that processed the text with the Amazon Comprehend Medical and 7Park Drug Name Entity Recognizer services. It also ran the Python code that built the concept cluster graph.

  • DynamoDB holds all of the project's data and was integral to the data creation phase.

  • AWS Lambda runs the Node.js back end that serves Covid-19 Explorer. As an Alexa Skill built on the Alexa Web API for Games, the app requires a flexible and scalable back end.

  • S3 held the CORD-19 article set and other files critical to the data creation phase. It remains just as important now, serving the many files that the Alexa skill delivers to users on request.
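As an illustration of how paragraph-level tags might be stored, here is a hypothetical DynamoDB item layout in Python: one item per term and location, with the term as partition key and a zero-padded `doc#paragraph` sort key, so a single query on a term returns its paragraphs in order. The attribute names and table name are assumptions; the `{"S": ...}` / `{"N": ...}` wrappers are DynamoDB's low-level attribute-value format, which boto3's client API expects.

```python
from typing import Dict

AttributeValue = Dict[str, str]

def term_location_item(term: str, doc_id: str, paragraph: int,
                       category: str) -> Dict[str, AttributeValue]:
    """Build a low-level DynamoDB item for one (term, paragraph) pair.

    Hypothetical schema: "term" is the partition key and "location" the
    sort key. Numbers are sent as strings under the "N" type descriptor.
    """
    return {
        "term": {"S": term.lower()},
        # Zero-padding keeps a document's paragraphs in numeric order
        # when the sort key is compared as a string.
        "location": {"S": f"{doc_id}#{paragraph:05d}"},
        "paragraph": {"N": str(paragraph)},
        "category": {"S": category},
    }

item = term_location_item("Fever", "doc-1", 7, "MEDICAL_CONDITION")
print(item["location"])  # {'S': 'doc-1#00007'}

# Writing it with boto3 (table name is hypothetical):
#   boto3.client("dynamodb").put_item(TableName="CovidExplorerTerms", Item=item)
```

Without the padding, paragraph 100 would sort before paragraph 12, scrambling the order in which deep links are returned.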

Challenges I ran into

The usual. Never enough time, or sleep, or coffee.

Accomplishments that I'm proud of

I believe strongly that the Alexa Skill platform is the best platform available for delivering a synergistic user experience that combines voice, display, and WebGL 3D elements. I am most proud that I was able to seamlessly integrate the machine learning marketplace aspects of the application with my favorite development platform.

What's next for Covid Explorer

The next step is to seek funding so we can extend the app to the entire CORD-19 article set. We also look forward to adding new text parsing and analytic methods to enhance the app's semantic power and unearth even more vital and helpful semantic connections.
