MisinformationMap

California's COVID-19 cases compared to the amount of misinformation regarding the virus

Inspiration

Many people turn to crazy conspiracy theories as a way of coping with events they can't control or don't understand. With this in mind, we wanted to find a way to represent both the spread of COVID-19 and the spread of fake news and misinformation surrounding the virus.

What it does

This Python project takes in a massive amount of data from different coronavirus tracking projects and displays a heat map of the US over time. We paired two line graphs with this heat map. The left line graph shows the number of occurrences of misinformation in a given state on a given day. The right line graph shows the number of confirmed COVID-19 cases in the selected state.
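As a rough illustration of how a heat map over time can be put together, here is a minimal sketch. It assumes each daily record carries `date`, `state`, and `positive` fields; the helper names `to_columns` and `build_heat_map` are made up for this example, the sample values are invented, and this is not necessarily how the project itself does it. The figure step uses Plotly Express and needs plotly and pandas installed.

```python
from collections import defaultdict


def to_columns(records):
    """Turn per-state daily records into column lists sorted by date."""
    rows = sorted(
        # A missing or null case count is treated as 0 (an assumption).
        (str(r["date"]), r["state"], r.get("positive") or 0)
        for r in records
    )
    columns = defaultdict(list)
    for date, state, positive in rows:
        columns["date"].append(date)
        columns["state"].append(state)
        columns["positive"].append(positive)
    return dict(columns)


def build_heat_map(columns):
    """Build an animated US choropleth with one frame per date."""
    # Imported here so the data step above runs without plotly installed.
    import plotly.express as px

    return px.choropleth(
        columns,
        locations="state",           # two-letter state codes
        locationmode="USA-states",
        color="positive",
        animation_frame="date",
        scope="usa",
    )


# Invented sample values, for illustration only.
sample = [
    {"date": 20200402, "state": "WA", "positive": 30},
    {"date": 20200401, "state": "CA", "positive": 10},
    {"date": 20200401, "state": "WA", "positive": None},
]
columns = to_columns(sample)
```

Calling `build_heat_map(columns).show()` would then open the map with a date slider. Sorting by date first keeps the animation frames in chronological order.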

How I built it

This was all built using a Python library called Plotly, which we used to generate the maps and align the data. Our data was sourced mainly from https://covidtracking.com/api/states/daily, a great project that has stored outbreak data on a state-by-state basis since the beginning of January. For the false information statistics, we used a large dataset of tweets related to COVID-19. We ran this set of around 5 million tweets against a filter that checks for fake news keywords and common conspiracy theory wordings. After filtering, we matched each misinformation-correlated tweet with the location of the user who posted it to get a rough estimate of the amount of misinformation in a state at any given time.
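A keyword filter and location lookup along these lines might look like the sketch below. The keyword list, the state table, and the function names `is_flagged` and `guess_state` are all hypothetical; the write-up does not list the actual keywords or say how locations were resolved.

```python
import re

# Hypothetical keywords; the project's real list is not given here.
KEYWORDS = ["hoax", "plandemic", "5g"]
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b",
    re.IGNORECASE,
)

# A few states for illustration; a real table would cover all of them.
STATE_CODES = {"california": "CA", "washington": "WA", "new york": "NY"}


def is_flagged(text):
    """Return True if the text contains any keyword as a whole word."""
    return PATTERN.search(text) is not None


def guess_state(location):
    """Map a free-text profile location to a state code, or None."""
    if not location:
        return None
    lowered = location.lower()
    for name, code in STATE_CODES.items():
        if name in lowered:
            return code
    # Fall back to a trailing two-letter code such as "Seattle, WA".
    match = re.search(r",\s*([A-Za-z]{2})\s*$", location)
    if match and match.group(1).upper() in STATE_CODES.values():
        return match.group(1).upper()
    return None
```

Matching on free-text locations is only approximate (for example, "Washington, DC" would be counted as WA by this sketch), which fits the rough-estimate goal but would need a better lookup for anything more precise.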

Challenges I ran into

The hardest part of this challenge was finding a way to parse the massive amount of data returned by the covidtracking API. Once I found an efficient way to parse the data by day, I could feed it into a heat map. I had no experience with the Plotly Python library, and until I found it I was completely lost on how to make a full map of the US. Another challenging part was managing the large dataset of tweets. It is hard to work around memory limitations when working with a large dataset.
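One common way to stay within memory limits is to stream the tweets one record at a time and keep only running counts. The sketch below assumes a JSON Lines input with hypothetical `state`, `date`, and `text` fields, and takes the filter as a parameter; it illustrates the general technique and is not a record of what the project actually did.

```python
import io
import json
from collections import Counter


def count_by_state_and_day(lines, is_flagged):
    """Stream JSON Lines records, keeping only running counts in memory."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        if tweet.get("state") and is_flagged(tweet.get("text", "")):
            counts[(tweet["state"], tweet["date"])] += 1
    return counts


# Invented sample records standing in for a large file on disk.
sample_file = io.StringIO(
    '{"state": "CA", "date": "2020-04-01", "text": "it is all a hoax"}\n'
    '{"state": "CA", "date": "2020-04-01", "text": "HOAX, wake up"}\n'
    '{"state": "WA", "date": "2020-04-01", "text": "stay home, stay safe"}\n'
    '{"state": null, "date": "2020-04-02", "text": "total hoax"}\n'
)
counts = count_by_state_and_day(
    sample_file, lambda text: "hoax" in text.lower()
)
```

Because a file object yields one line at a time, the same function works on `open("tweets.jsonl")` (a hypothetical filename) without loading the whole dataset.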

Accomplishments that I'm proud of

We're happy with our ability to stay flexible. We really had to go with the flow on this project because we didn't have a full plan when we started. Not knowing what language/framework was going to work best for us really made it difficult to plan our time. But we rolled with the punches and all the setbacks and produced something that we think displays important statistics that most maps gloss over.

What I learned

From a technical standpoint, we learned about several different methods for parsing and handling datasets that run to millions of entries. From a COVID-19 standpoint, we learned about the ways in which these conspiracy theory coping mechanisms work. In states such as California and Washington, there is a clear connection between the spread of the disease in that area and the amount of fake news/misinformation about it.

What's next for MisinformationMap

We plan to port this to HTML and upload it to our personal website for everyone to check out and understand.

Built With

python

Submitted to

hack:now

Created by

I worked on creating the heat map and handling the COVID-19 data.
It was a lot of fun to create such an interesting visual representation of the virus.

David Wolfe
I worked on the scraping of tweets and our misinformation filter. I scraped a database of tweets about COVID-19, then ran a filter to see any that might trigger misinformation and sorted them according to the location of the author and plotted those distributions by state.

Jeremy Ferguson

Updates

David Wolfe started this project — Apr 26, 2020 11:38 AM EDT

