Many people turn to crazy conspiracy theories as a method of coping with events they can't control or don't understand. With this in mind, we wanted to find a way to represent both the spread of COVID-19 as well as the spread of fake news and misinformation surrounding the virus.
What it does
This Python project ingests a massive amount of data from different coronavirus tracking projects to display a heat map of the US over time. We then paired two line graphs with this heat map: the left line graph shows the number of occurrences of misinformation in the selected state on a given day, and the right line graph shows the number of confirmed COVID-19 cases in that same state.
How I built it
This was all built using a Python library called Plotly, which we used to generate the maps and align the data. Our data was sourced mainly from https://covidtracking.com/api/states/daily, a great project that has stored outbreak data on a state-by-state basis since the beginning of January. For the false information statistics, we used a large dataset of tweets all related to COVID-19. We ran this set of around 5 million tweets through a filter that checks for fake news keywords and common conspiracy theory wordings. After filtering, we matched each misinformation-flagged tweet with the location of the user who posted it to get a rough estimate of the amount of misinformation in a state at any given time.
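The filter-and-tally step might look like the sketch below. The keyword list and tweet fields here are illustrative assumptions, not the project's actual keyword set or schema.

```python
from collections import Counter

# Placeholder keyword list; the real filter used many more terms
KEYWORDS = {"5g", "bioweapon", "hoax", "plandemic"}

def is_misinformation(text):
    """Return True if the tweet text contains any flagged keyword."""
    words = text.lower().split()
    return any(kw in words for kw in KEYWORDS)

def tally_by_state_day(tweets):
    """Count flagged tweets per (state, date) pair."""
    counts = Counter()
    for tweet in tweets:
        if is_misinformation(tweet["text"]):
            counts[(tweet["state"], tweet["date"])] += 1
    return counts

# Tiny illustrative sample (fields assumed: text, state, date)
tweets = [
    {"text": "COVID-19 is a hoax", "state": "CA", "date": "2020-03-01"},
    {"text": "Stay safe, wash your hands", "state": "CA", "date": "2020-03-01"},
    {"text": "5g towers spread the virus", "state": "WA", "date": "2020-03-02"},
]
print(tally_by_state_day(tweets))
# Counter({('CA', '2020-03-01'): 1, ('WA', '2020-03-02'): 1})
```

The (state, date) counts then feed directly into the left line graph for whichever state the user selects.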
Challenges I ran into
The hardest part of this challenge was parsing the massive amount of data returned by the covidtracking API and finding an efficient way to group it by day before feeding it into the heat map. I had no experience with the Plotly Python library, and until I found it I was completely lost on how to make a full map of the US. Another challenging part was managing the large dataset of tweets; it is hard to work around memory limitations with a dataset that size.
Accomplishments that I'm proud of
We're happy with our ability to stay flexible. We really had to go with the flow on this project because we didn't have a full plan when we started. Not knowing what language/framework was going to work best for us really made it difficult to plan our time. But we rolled with the punches and all the setbacks and produced something that we think displays important statistics that most maps gloss over.
What I learned
From a technical standpoint, we learned about several different methods for parsing and handling datasets with millions of entries. From a COVID-19 standpoint, we learned about the ways in which these conspiracy theory coping mechanisms work. In states such as California and Washington, there is a clear connection between the spread of the disease in that area and the amount of fake news/misinformation about it.
What's next for MisinformationMap
We plan to port this to HTML and upload it to our personal website for everyone to check out and understand.