We saw that many of the news stories and media we consumed were sensationalized and often outright false. Especially with issues as sensitive as a global pandemic, we needed a way to restore trust in our sources of information.
What it does
We run a semantic search on a provided website or block of text against a database of thousands of coronavirus-related scientific studies, papers, and articles (the CORD-19 dataset). We find the most relevant studies and present them to you directly, so you can judge whether the media you consume is truly worth reading.
How we built it
We built a Python backend that uses Google Cloud Platform's BigQuery API (BigQuery hosts our SQL-queryable dataset) to pull studies and their data. We then implemented a semantic search algorithm to find the studies most relevant to the source text. We connected it to a website that calls a script that can quickly pull data from any URL. Finally, our backend returns the 3 most relevant scientific articles and studies (see video above).
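The ranking step can be illustrated with a minimal sketch. It is not our actual semantic search algorithm: it swaps in plain TF-IDF weighting with cosine similarity, uses only the Python standard library, and assumes the studies have already been pulled into a list of (title, abstract) pairs. The function names and the `k` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (term -> weight dict) per document."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed inverse document frequency, so no weight is zero.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_studies(source_text, studies, k=3):
    """Return the k (title, score) pairs most similar to source_text.

    `studies` is a list of (title, abstract) tuples (an assumed shape).
    """
    docs = [tokenize(f"{title} {abstract}") for title, abstract in studies]
    vectors = tfidf_vectors(docs + [tokenize(source_text)])
    query = vectors.pop()  # the last vector belongs to the source text
    scored = [
        (cosine(query, vec), title)
        for vec, (title, _abstract) in zip(vectors, studies)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(title, score) for score, title in scored[:k]]
```

Calling `top_studies(article_text, studies)` would return up to three (title, score) pairs, highest score first. Word-overlap scoring like this misses paraphrases, which is one reason a real semantic search would use something richer.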
Challenges we ran into
We initially tried to build a Chrome extension, but had a hard time creating an interface with our Python backend. Additionally, jQuery didn't support a lot of the features we were looking for, but we found that out too late. We created a temporary solution using AutoHotkey, and later replaced it.
Accomplishments that we are proud of
We managed to host a frontend that we had a hard time integrating with the rest of our platform. Additionally, we reduced the running time of our semantic search from ~90 seconds to around 15 seconds (which we plan to improve on later).
What we learned
It was our first time using the Google Cloud API, and we learned a lot about it and how easily it can be integrated. However, we discovered this too late, so we couldn't use its full potential.
What's next for COVInfo
Next, we want to host our site on domain.com and have a cloud-based backend fully integrated with Google Cloud. This will help us improve our query speeds. We also want to implement indexing to improve efficiency. Our team also sees this tool being used as a Chrome extension, which we will continue to look into. The algorithms and tools we used are very versatile, so we could continue finding databases of studies (possibly working with the major journals themselves) and apply these solutions to create a more trustworthy post-COVID world.
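One possible reading of the indexing idea is an inverted index that narrows the set of studies before any expensive similarity scoring runs. The sketch below is an assumption about how that might look, not something we have built. The names `build_index` and `candidates`, the `min_shared_terms` threshold, and the dict-of-abstracts input shape are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Return the set of lowercase alphanumeric terms in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(abstracts):
    """Map each term to the set of study ids whose abstract contains it.

    `abstracts` is a dict of study id -> abstract text (an assumed shape).
    """
    index = defaultdict(set)
    for study_id, abstract in abstracts.items():
        for term in tokenize(abstract):
            index[term].add(study_id)
    return index


def candidates(index, source_text, min_shared_terms=2):
    """Return ids of studies sharing at least min_shared_terms terms with the source."""
    hits = defaultdict(int)
    for term in tokenize(source_text):
        for study_id in index.get(term, ()):
            hits[study_id] += 1
    return {sid for sid, count in hits.items() if count >= min_shared_terms}
```

The index would be built once, and each incoming article would then be scored only against the candidate set rather than the whole dataset.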