Korono: AI-based question-answering platform for COVID-19 papers
We are overwhelmed by the number of documents related to COVID-19. One of the largest such datasets is CORD-19, the COVID-19 Open Research Dataset. It comprises more than 47,000 scholarly articles, 36,000 of which include full text.
Working with such a large dataset and extracting insights from it is an open challenge. Korono is an online tool that attempts to answer natural-language questions about COVID-19. The app has a friendly user interface and is simple to use. It has been designed for physicians, virologists, toxicologists, and COVID-19 researchers; no computer science knowledge is required.
What it does
A minimal live version is available here: Korono.
How we built it
Korono is composed of two parts: the search engine and the question-answering model. First, given a query q, the search engine returns a list of all relevant papers for that query. Subsequently, the question-answering model extracts an answer from each paper. The result is therefore not a single answer but a small collection of document excerpts that may be relevant.
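The two-stage flow described above can be sketched as follows. This is only an illustrative sketch, not Korono's actual code: the function names (`search`, `extract_answer`, `answer`), the TF-IDF-style scoring, the stopword list, and the toy corpus are all assumptions made for the example. In particular, `extract_answer` is a simple sentence-overlap stand-in for the real question-answering model.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for this sketch.
STOPWORDS = {"what", "is", "the", "of", "a", "an", "in", "and", "are", "to"}

def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def search(query, papers, top_k=3):
    """Stage 1 (stand-in search engine): rank papers by a TF-IDF-style score."""
    docs = {pid: Counter(tokenize(text)) for pid, text in papers.items()}
    n_docs = len(docs)
    scores = {}
    for pid, counts in docs.items():
        score = 0.0
        for term in set(tokenize(query)):
            doc_freq = sum(1 for c in docs.values() if term in c)
            if doc_freq:
                score += counts[term] * math.log(1 + n_docs / doc_freq)
        if score > 0:  # papers sharing no query term are dropped
            scores[pid] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

def extract_answer(query, text):
    """Stage 2 (stand-in reader): pick the sentence sharing most terms with the query."""
    q_terms = set(tokenize(query))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return max(sentences, key=lambda s: len(q_terms & set(tokenize(s))), default="")

def answer(query, papers, top_k=3):
    """Run both stages and return one (paper id, excerpt) pair per relevant paper."""
    return [(pid, extract_answer(query, papers[pid])) for pid in search(query, papers, top_k)]

# Made-up toy corpus, for illustration only.
papers = {
    "paper_a": "Coronaviruses are enveloped RNA viruses. "
               "The incubation period of COVID-19 is typically around five days.",
    "paper_b": "Masks reduce droplet transmission. Hand washing is also recommended.",
}
for pid, excerpt in answer("What is the incubation period of COVID-19?", papers):
    print(pid, "->", excerpt)
```

Keeping the reader behind a single function makes it straightforward to swap the toy sentence matcher for a neural model later without touching the search stage.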
On Saturday and Sunday, we built the search engine and the question-answering model. We developed the code in a Kaggle notebook because Kaggle offers free GPU time. A GPU was necessary because the underlying question-answering model, a BERT model fine-tuned on the SQuAD dataset, requires a powerful processing unit.
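For readers unfamiliar with SQuAD-style models: such a model typically assigns each token of the passage a score for being the start of the answer and a score for being its end, and the answer is the span with the highest combined score. The sketch below illustrates only that final selection step. The scores are invented by hand rather than produced by a model, and `best_span` and `max_answer_len` are hypothetical names, not part of Korono or of any library.

```python
def best_span(start_scores, end_scores, max_answer_len=15):
    """Return (start, end, score) maximising start_scores[i] + end_scores[j]
    subject to i <= j < i + max_answer_len."""
    best = None
    for i, start in enumerate(start_scores):
        last = min(i + max_answer_len, len(end_scores))
        for j in range(i, last):
            score = start + end_scores[j]
            if best is None or score > best[2]:
                best = (i, j, score)
    return best

# Hand-made scores standing in for a model's per-token outputs (assumption).
tokens = ["the", "incubation", "period", "is", "five", "days", "."]
start_scores = [0.1, 0.2, 0.0, 0.3, 2.5, 0.4, -1.0]
end_scores = [0.0, 0.1, 0.3, 0.2, 0.9, 2.8, -0.5]

start, end, _ = best_span(start_scores, end_scores)
answer_text = " ".join(tokens[start:end + 1])
print(answer_text)
```

The exhaustive double loop is fine for short passages; the model's forward pass that produces the scores is the expensive part, which is where a GPU helps.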
Challenges we ran into
- Motivation: it is hard to work from home
What we learned
- Working under pressure
- Teamwork from home
- Live online version
- Improved frontend
- More precise search engine with BioBERT
- Jonathan Besomi: backend development
- Yann Bolliger: frontend development