Inspiration

With so much information spread across so many sources, people need a quick, straightforward way to make sense of what is happening during the pandemic. Most people follow the news on social media and are exposed to only a handful of news sources. At the same time, most people don’t have the knowledge or skills to identify falsehoods when these are presented in isolation. You can watch the pitch video here.

The idea behind this project is that when readers are presented with multiple sources covering the same matter, they can use that diversity to analyze the subject critically.

What it does

COVID Info Watch scrapes thousands of news articles published daily and clusters them by common theme through statistical analysis, allowing users to discover relevant information more easily, explore it, and relate it to what is being mentioned in the media.

How I built it

The backend is powered by an existing aggregator that I built and maintain (https://thoro.news) and is composed of: 1) a scraping service that gathers articles from 270 news sources; 2) a data-crunching service that performs statistical analysis of bi-grams in articles’ headlines and body content. It runs through up to 5000 articles and groups them according to their bi-gram similarities. It is built using JavaScript (VueJS on the frontend, NodeJS on the backend), with MongoDB for the database.
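To give a rough idea of what grouping articles by bi-gram similarity can look like, here is a minimal sketch in JavaScript. It is not the code that runs on the portal: the function names, the Jaccard similarity measure, the greedy single-pass grouping, and the 0.2 threshold are all assumptions made for illustration.

```javascript
// Illustrative sketch only. Names, similarity measure, and threshold are assumptions.

// Return the set of adjacent word pairs (bi-grams) found in a text.
// The tokenizer keeps only ASCII letters and digits, a simplification.
function bigrams(text) {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  const result = new Set();
  for (let i = 0; i < words.length - 1; i++) {
    result.add(words[i] + " " + words[i + 1]);
  }
  return result;
}

// Jaccard similarity: shared bi-grams divided by all distinct bi-grams.
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  for (const item of a) {
    if (b.has(item)) shared++;
  }
  return shared / (a.size + b.size - shared);
}

// Greedy single-pass grouping: an article joins the first cluster whose
// first (seed) article is similar enough, otherwise it starts a new cluster.
function clusterArticles(articles, threshold = 0.2) {
  const clusters = [];
  for (const article of articles) {
    // Headline and body are processed separately so that no bi-gram
    // spans the boundary between them.
    const grams = new Set([
      ...bigrams(article.headline),
      ...bigrams(article.body),
    ]);
    const match = clusters.find((c) => jaccard(c.seedGrams, grams) >= threshold);
    if (match) {
      match.articles.push(article);
    } else {
      clusters.push({ seedGrams: grams, articles: [article] });
    }
  }
  return clusters.map((c) => c.articles);
}
```

Calling `clusterArticles` with an array of `{ headline, body }` objects returns an array of groups, each group being an array of articles. Comparing only against the seed article keeps the sketch short; a real system would likely weight bi-grams by how rare they are and compare against the whole cluster.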

Challenges I ran into

The biggest challenge was conceptualizing the whole interface so that it makes use of the information I already have and focuses it on the pandemic: how to explore clusters, how to find similar articles around a theme, and how to change the backend to accommodate these features.

Accomplishments that I'm proud of

During the weekend I focused mainly on developing the web portal so that it would be fully usable by anyone by the end of the challenge, and I believe I ended up with a very usable product that serves its initial purpose. I received positive feedback from mentors during the hackathon.

What's next for Covid Info Watch

There are many ways the project can evolve. More sources must be added to further diversify the clusters. More computing resources will be needed, in terms of storage space as well as CPU and memory, as new sources are added. It would also be ideal to have a data scientist help refine the algorithm to make its aggregation more efficient. For now I hope to publicize the portal around the web, gather user feedback, and improve it so that it can truly be useful to society.
