http://www.airqualitydatascience.squarespace.com seeks to help increase the use of alternative modes of transportation for work. Our website displays the impact of various modes of transportation on air quality, specifically ozone levels. Using the data provided by the CDC Tracking Network API, we identified the modes of transportation that carried the most explanatory weight in our model and the impact each of them had on ozone. We also show how strongly each mode of transportation correlates with changes in ozone levels. These findings can help public and semi-public organizations determine how to shape public policy and demand to influence commuter behavior.



Air quality is a major issue under Healthy People 2020 objective EH-1, and adopting alternative modes of transportation is the focus of EH-2 (https://www.healthypeople.gov/2020/topics-objectives/topic/environmental-health/objectives). We decided to create a website that would educate users and public policy makers about the impact various modes of transportation have on air quality, specifically ozone.

What it does

Using the data provided by the CDC Tracking Network API, airqualitydatascience.squarespace.com aims to educate users about which attributes most strongly affect ozone levels and to help them learn about adopting alternative modes of transportation for work that help lower ozone levels. We discovered that Carpool, Drive Alone, Mass Transit, and Bus were the strongest factors affecting ozone in the United States. We built a dynamic map visualization of our ozone data that illustrates how ozone conditions changed at the county level from 1996 to 2014. We illustrate the non-linear relationship among the features using decision trees and a sunburst visualization.


  1. Start by exploring the ozone map. This map provides insight into where ozone levels are highest and how frequently a county has ozone levels above regulatory standards. We also included a per-capita map that displays all features by county. It defaults to the carpool variable.
  2. Understand how we identified the most explanatory features of our model to determine which modes of transportation correlate with ozone trends.
  3. Observe the impact of the significant modes of transportation and understand their relationship with ozone levels.
  4. Explore the interactive charts that display commonalities and differences between the various modes of transportation.
  5. Dive into the detailed notebook to see our approach to analyzing the air quality and transportation data provided by the CDC Tracking Network API.

How we built it

We collected the transportation, population, and ozone data from the CDC Tracking Network API. We cleaned the data, added features such as per-capita values, and built the model in Jupyter using Python and pandas. We used CartoDB to visualize our ozone data, and BigML to display the logic by which RandomForestRegressor selected the best features. We also provide visual data exploration tools that let the user dive into the dataset: interactive charts made with Highcharts (a JavaScript library) and a per-capita map (made with CartoDB) that shows all the features on a per-capita, county-by-county basis. Lastly, we built the site on Squarespace and added some custom CSS and code blocks.
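The per-capita step and the feature ranking described above can be sketched roughly as follows. This is a minimal illustration, not our actual notebook: the data is synthetic, and every column name (`carpool`, `drive_alone`, `mass_transit`, `ozone_days`) and every model setting is an assumption made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the county-level table. All names and values are
# hypothetical and exist only to make the sketch runnable.
rng = np.random.default_rng(0)
n = 200
population = rng.integers(5_000, 1_000_000, size=n)
counties = pd.DataFrame({
    "population": population,
    "carpool": (rng.uniform(0.02, 0.20, size=n) * population).round(),
    "drive_alone": (rng.uniform(0.30, 0.50, size=n) * population).round(),
    "mass_transit": (rng.uniform(0.00, 0.10, size=n) * population).round(),
})

# Fake target so the model has something to fit; not real ozone data.
counties["ozone_days"] = (
    40 * counties["carpool"] / counties["population"] + rng.normal(0, 1, size=n)
)

# Per-capita features: commuter counts divided by county population.
modes = ["carpool", "drive_alone", "mass_transit"]
for mode in modes:
    counties[f"{mode}_per_capita"] = counties[mode] / counties["population"]

features = [f"{mode}_per_capita" for mode in modes]
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(counties[features], counties["ozone_days"])

# Impurity-based importances: one value per feature, summing to 1.
importances = pd.Series(model.feature_importances_, index=features)
print(importances.sort_values(ascending=False))
```

Sorting the importances gives a ranking of which per-capita features the forest relied on most, which is the kind of ranking we used to choose the features shown on the site.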


Challenges we ran into

Some of the transportation data was missing for several years. The population data was mostly complete, with the exception of smaller counties. The ozone data was also incomplete for some counties. We filled some of the null values with the average of each feature by county in order to have consistent values that protect the integrity of our data. There were also overlapping years in our transportation data. We split each four-year average across its individual years, and for the years where the data overlapped, we took the mean of all overlapping values by county.
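A minimal pandas sketch of these two cleaning steps might look like the following. The tables, column names, and values are hypothetical, and it assumes that "splitting" a multi-year average means assigning the window's value to every year it covers.

```python
import pandas as pd

# Hypothetical yearly ozone table with gaps (None becomes NaN).
ozone = pd.DataFrame({
    "county": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "ozone_days": [10.0, None, 14.0, 3.0, 5.0, None],
})

# Fill each gap with that county's own mean over the years it did report.
ozone["ozone_days"] = ozone.groupby("county")["ozone_days"].transform(
    lambda s: s.fillna(s.mean())
)

# Hypothetical multi-year transportation averages with overlapping windows.
periods = pd.DataFrame({
    "county": ["A", "A"],
    "start_year": [2005, 2007],
    "end_year": [2008, 2010],
    "carpool_share": [0.10, 0.14],
})

# Repeat each window's average for every year the window covers.
periods["year"] = [
    list(range(start, end + 1))
    for start, end in zip(periods["start_year"], periods["end_year"])
]
yearly = periods.explode("year")
yearly["year"] = yearly["year"].astype(int)

# Where two windows cover the same county and year, average their values.
yearly = yearly.groupby(["county", "year"], as_index=False)["carpool_share"].mean()
print(yearly)
```

In this toy data, 2007 and 2008 fall inside both windows, so those years receive the mean of the two window averages, while the other years keep the value of the single window that covers them.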

Accomplishments that we're proud of

We created a beautiful ozone map that lays out how large cities contribute to higher levels of ozone, especially those with a higher concentration of drivers (e.g. in California). We also built an explanatory view of our Random Forest Regressor that shows how we selected the top features to display in our model.


What we learned

We learned that carpooling was highly correlated with ozone levels and showed positive directionality, meaning that more carpooling was associated with higher ozone levels. We also learned that taking public transportation, riding the bus, and working from home were highly correlated with ozone levels but showed negative directionality, meaning that these modes were associated with lower ground-level ozone. Lastly, we learned that the relationship among the features is non-linear: they have to be considered together to understand ozone levels on a county basis.
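As a small illustration of what "directionality" means here, the sign of a Pearson correlation can be read off with pandas. The numbers and column names below are invented for the example and are not our results.

```python
import pandas as pd

# Hypothetical per-capita commuting shares and ozone readings for five counties.
df = pd.DataFrame({
    "carpool_per_capita": [0.05, 0.08, 0.10, 0.12, 0.15],
    "mass_transit_per_capita": [0.20, 0.15, 0.12, 0.06, 0.02],
    "ozone_days": [4, 7, 9, 12, 15],
})

# Pearson correlation of every feature with the ozone column.
corr = df.corr()["ozone_days"].drop("ozone_days")

# The sign of the coefficient is the directionality of the relationship.
direction = corr.apply(lambda r: "positive" if r > 0 else "negative")
print(direction)
```

A linear coefficient like this only captures one feature at a time, which is why we also relied on the tree-based model to look at the features together.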

What's next for Air Quality Data Science

It was surprising to us that carpooling had such a high impact on our environment. Our next step would be to dive into the characteristics of drivers that made carpooling so strongly associated with higher ozone. Are more people driving gas-powered vehicles? Do larger vehicles such as trucks and SUVs contribute more ozone than smaller vehicles? Can we learn more about the relationship between gas prices and the number of drivers? How many electric vehicles are on the road? How can we encourage more drivers to switch to electric? These are all questions we would love to explore in our next iterations.


Sri Kanajan, Data Scientist Email: kanajan.sri (at) gmail.com

Derek Ku, Acquisition Associate Manager Email: derektku (at) gmail.com

+ 3 more