Inspiration: Our inspiration is sports analytics, which is helping to increase goal-directed performance and engagement across sports and around the world. Air pollution is the fourth highest risk factor for death globally and by far the leading environmental risk factor for disease. Air pollution regulation is also a key factor in attaining eleven or more of the Sustainable Development Goals. In what is known as a triple jeopardy effect, populations of low socioeconomic status are exposed to higher levels of air pollution because of where they live or work, which increases their susceptibility to poor health and in turn produces further health-related disparities driven by environmental factors. By comparing teams of elected officials in the Chicago, Colorado Springs, Denver, Milwaukee, Oakland, Phoenix, Pittsburgh, Richmond, and Whatcom areas, we wanted to find out whether elected officials are giving the topic of air quality the attention it deserves. If we focus on goals, we can increase performance and make meaningful change. Are our elected officials as focused on important goals as our most-admired athletes are?
What it does: Our project takes large datasets associated with local politics and measures topic frequency across issues, producing a measure of how often elected officials raise the topic of air quality compared with other topics. We then juxtapose this measure of topic frequency for teams of elected officials in Chicago, Colorado Springs, Denver, Milwaukee, Oakland, Phoenix, Pittsburgh, Richmond, and Whatcom with air quality data for those same local areas to see whether what elected officials talk about correlates with what is happening with the air around them.
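The comparison described above can be sketched as a Pearson correlation between each area's air quality topic share and its average PM2.5 level. This is a minimal sketch, not the project's actual analysis: the function name `pearson`, the variable names, and the sample numbers are all illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Placeholder values for illustration only, NOT measured results.
    # One entry per (hypothetical) area, in the same order in both lists.
    topic_share = [0.02, 0.05, 0.01, 0.04]  # share of discourse on air quality
    mean_pm25 = [9.1, 7.4, 12.3, 8.0]       # average PM2.5 in micrograms per cubic metre
    print(f"correlation: {pearson(topic_share, mean_pm25):.2f}")
```

With only nine areas, a coefficient computed this way is a rough indicator at best, so it should be read alongside the plots rather than on its own.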
How we built it: We automated an HTML scraping process using Python scripts with Selenium, and used it to collect council meeting data for Chicago, Colorado Springs, Denver, Milwaukee, Oakland, Phoenix, Pittsburgh, Richmond, and Whatcom. In that data, we searched for common air quality terms in the discourse of elected officials and quantified the results. We then removed the air quality search terms and ran the most recent 5,000 general meetings through a topic classification model so that we could normalize air quality discourse against the other topics that elected officials generally discussed. Next, we used the OpenAQ API to get a proxy for each area’s air quality using fine particulate matter (PM2.5) measurements. PM2.5 is a commonly measured air pollutant that is a concern for health. It is not the only damaging pollutant in bad air, but it is one of the major ones. Once we had all this data, we created a Python notebook that performs both graphical and numerical analysis and presents the results in a clear, easy-to-understand way.
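The term search step can be sketched as follows. This is a simplified stand-in under stated assumptions: the term list is hypothetical (the project's actual search terms are not listed here), and the share is computed per meeting rather than against the output of a topic classification model. The names `AIR_QUALITY_TERMS`, `count_term_mentions`, and `air_quality_share` are illustrative.

```python
import re
from collections import Counter

# Hypothetical search terms; the project's real term list is not given.
AIR_QUALITY_TERMS = [
    "air quality",
    "air pollution",
    "smog",
    "pm2.5",
    "particulate matter",
]


def count_term_mentions(text, terms=AIR_QUALITY_TERMS):
    """Count case-insensitive, whole-phrase occurrences of each term in one transcript."""
    lowered = text.lower()
    counts = Counter()
    for term in terms:
        # Lookarounds stop "smog" from matching inside a longer word such as "smoggy".
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        counts[term] = len(re.findall(pattern, lowered))
    return counts


def air_quality_share(meetings, terms=AIR_QUALITY_TERMS):
    """Fraction of meeting transcripts that mention at least one search term."""
    if not meetings:
        return 0.0
    hits = sum(
        1 for text in meetings
        if sum(count_term_mentions(text, terms).values()) > 0
    )
    return hits / len(meetings)
```

Passing a different `terms` list is enough to point the same counting logic at another topic.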
Challenges we ran into: Not all the areas we originally wanted to include used similar platforms for recording their meetings, so we had to remove some areas to increase efficiency. Furthermore, the span of data wasn’t the same for each area: some areas had data going back 15+ years, while others had only the last 5 years. We narrowed the set down to the areas with the closest date ranges and did some further data cleaning, which gave us comparable results across a smaller set of areas than we initially wanted to consider. There may be jurisdictional challenges as well: some areas may not be able to control some key local factors in air quality. As a result, local councils may feel reluctant to discuss measures that could influence air quality.
Accomplishments that we're proud of: The scraper we created is generic enough that it can be applied to similar platforms with minimal changes. We can also change the topic we are interested in, which lets us apply the same process to other topics with very few changes. We were also able to create easy-to-understand graphs that allow the user to clearly see the results and analyze them.
What we learned: From the data we were able to collect, we learned that there is not a strong relationship between the attention that teams of elected officials in Chicago, Colorado Springs, Denver, Milwaukee, Oakland, Phoenix, Pittsburgh, Richmond, and Whatcom give to air quality and the actual air quality of those locales. Local politicians discussed other topics that have a lesser impact on human health much more often. This may be because of jurisdictional issues or some other cause. To dig deeper, we would need more time, more areas, more data, and a bigger analytics budget. Analytics is like that: it took many years and many minds to develop the structured quantitative analytics that is commonly used in sports today to increase performance as well as fan engagement.
What's next for Goverlytics: We are clearly onto something and have garnered some international recognition for it. Next, we need to get investment to push the concept and productize it. We’re looking for investors who understand the Environmental, Social, and Governance ("ESG") movement and want to contribute to making global change happen now by applying positive lessons from sports analytics to politics.