Inspiration

My original motivation for this project was to find out how many reposts appear on Reddit every day. Then the invasion of Ukraine began. I found myself absorbed by the news surrounding the invasion and ended up seeing lots of videos and posts on Reddit about the war. I am also a sucker for a good map; I always see those cool maps on r/dataisbeautiful and wanted to make my own. So I changed the focus of my project at the last minute and tried to map the cities that are under-represented in the news.

What it does

Graphs how often certain Ukrainian cities are mentioned on Reddit, and their mentions divided by their total population.

How we built it

What areas of Ukraine are mentioned the most on Reddit? To count mentions, I used an API called Pushshift to go through the title of every post within a chosen time window. Run time grows roughly in proportion to the number of days of data pulled. By the time I was running through datasets spanning the last couple of weeks, the program took 5+ minutes to complete. So while developing and testing I only pulled data from the same day, and then scaled it up to make the final graphs.

What areas of Ukraine are under-mentioned on Reddit? I wanted a way to find places that may be under-mentioned on Reddit, since all of the attention can go to a few cities while equally troubling things affect an even larger population somewhere else in the country. The best way I thought of doing this was to plot the number of mentions on Reddit against the population of the given city.

Where are the places in Ukraine most frequently mentioned on Reddit? Basically, I went through every Reddit post and broke the title down word by word. Then I created a dictionary that maps each word seen in a post title to the number of times that word has appeared. Then, for each city in my list of Ukrainian cities, I looked up how many times the city was mentioned and loaded its lat/long coordinates into geopandas to plot on the map of Ukraine.
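The counting and per-person steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `count_words` and `city_stats`, the sample titles, and the round population figures are all hypothetical, and the titles are hard-coded instead of being pulled from Pushshift.

```python
import re
from collections import Counter


def count_words(titles):
    """Map each lowercased word in the titles to how often it appears."""
    counts = Counter()
    for title in titles:
        # Split each title into runs of letters (word by word).
        counts.update(re.findall(r"[^\W\d_]+", title.lower()))
    return counts


def city_stats(titles, populations):
    """Return mentions and mentions per person for every mentioned city."""
    word_counts = count_words(titles)
    stats = {}
    for city, population in populations.items():
        mentions = word_counts[city.lower()]
        if mentions == 0:
            continue  # cities with no mentions never reach the map
        stats[city] = {
            "mentions": mentions,
            "per_person": mentions / population,
        }
    return stats


# Hypothetical sample data; populations are illustrative round numbers.
titles = [
    "Kyiv residents shelter in metro stations",
    "Footage from Kharkiv this morning",
    "Kyiv mayor gives update",
    "Aid convoy reaches Sumy",
]
populations = {
    "Kyiv": 2_900_000,
    "Kharkiv": 1_400_000,
    "Sumy": 260_000,
    "Lviv": 720_000,
}

stats = city_stats(titles, populations)
for city, row in stats.items():
    print(city, row["mentions"], row["per_person"])
```

Because the lookup is word by word, a city whose name contains a space or hyphen would not be matched by this sketch, and alternative spellings of the same city are counted separately.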

Challenges we ran into

The impact of my results is likely very small, but they are potentially a good starting point for digging deeper into what is going on in Ukraine. For example, if you want to know the regions where the most people are getting the most coverage, you could look for green elements on the # mentions per person graph. There are a few pretty big limitations and/or biases in these maps. The first is that Reddit is not the only source of news, and it is definitely not unbiased. The second is that only cities with at least one mention are listed, meaning there may be some farm town being bombed, but because there is absolutely no reference to it, it never even shows up as an element on the map. The third is that I am looking at a restricted time frame (10 days back), and the war is constantly on the move, so the data is changing constantly. Another bias is that towns with fewer than 1,000 people are not considered (this was a limitation of the dataset I was using), so the small towns affected by the war are ignored. I think this is why large, predominantly farming parts of the country have no triangles: the towns there are all too small.

Accomplishments that we're proud of

I think the graphs are pretty, and I have always wanted to pull, parse, and visualize "live" data.

What we learned

I learned that these relatively simple graphs can take a really long time to make, especially when you have to learn an entire API to do it.

What's next for Ukraine Crisis’ Reddit Popularity Visualized

I would love for the graph to be more interactive, so that it is easier to see which cities are being graphed and to get more stats about them.
