It’s Not Easy Being Wheezy!

Inspiration

Air pollution is a well-documented health risk, with especially strong impacts on the 28 million Americans living with asthma. Many people rely on the Air Quality Index (AQI) to understand daily risk, but standard AQI only reflects the worst pollutant and misses combined multi-pollutant exposure effects. Understanding which pollutants have the strongest impact on emergency room visits and whether this impact varies by age requires rigorous data analysis. Here, we move beyond correlation and build predictive models that quantify these relationships. Using our interactive GUI, California users can better understand county-level asthma risk, enabling more informed public health decisions.

What it does

We analyzed EPA air quality data alongside California's asthma emergency department (ED) visit data from 2015 to 2022. While AQI summarizes air quality using a single worst-pollutant index, our approach models individual pollutants simultaneously to capture multi-exposure effects. We created predictive models that identify which pollutants are most strongly associated with asthma ED visit rates and used multivariate models to evaluate how these effects vary between children and adults across California.
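To illustrate why a worst-pollutant index can hide combined exposure, here is a minimal sketch. The sub-index values for the two days are hypothetical and chosen only for illustration; the one fact it relies on is that the reported AQI is the highest of the individual pollutant sub-indices.

```python
# Hypothetical per-pollutant sub-index values (AQI scale) for two days.
day_a = {"ozone": 120, "pm25": 40, "no2": 30}
day_b = {"ozone": 120, "pm25": 115, "no2": 110}


def overall_aqi(sub_indices):
    """Return the reported AQI: the highest individual pollutant sub-index."""
    return max(sub_indices.values())


def pollutants_above(sub_indices, threshold=100):
    """List pollutants whose sub-index exceeds the threshold (illustrative cutoff)."""
    return sorted(name for name, value in sub_indices.items() if value > threshold)


# Both days report the same AQI even though day_b has three elevated pollutants.
print(overall_aqi(day_a), overall_aqi(day_b))  # 120 120
print(pollutants_above(day_a), pollutants_above(day_b))
```

Modeling each pollutant as its own predictor keeps the information that the single index discards.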

How we built it

Exploratory Data Analysis: Started with 16 air quality parameters but found that some had too few observations. Focused our analysis on pollutants with sufficient sample sizes.

Single Pollutant Models: Ran a univariate linear regression for each pollutant, retaining those with a p-value < 0.05 and adequate sample sizes.
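A screening step of this kind could be sketched as follows. The data are synthetic stand-ins, and the names `screen_pollutants` and `min_n` (with its cutoff of 30 observations) are assumptions for illustration, not the values used in the project. The sketch uses `scipy.stats.linregress` for the per-pollutant fit.

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in data: one array per pollutant, NaN where unobserved.
sparse = np.full(n, np.nan)
sparse[:10] = rng.normal(size=10)  # only 10 observations
pollutants = {
    "ozone": rng.normal(size=n),
    "no2": rng.normal(size=n),
    "sparse_pollutant": sparse,
}
ed_rate = 50 + 5 * pollutants["ozone"] + 3 * pollutants["no2"] + rng.normal(size=n)


def screen_pollutants(pollutants, outcome, alpha=0.05, min_n=30):
    """Keep pollutants with enough observations and a significant univariate slope."""
    kept = {}
    for name, x in pollutants.items():
        mask = ~np.isnan(x) & ~np.isnan(outcome)
        if mask.sum() < min_n:
            continue  # too few observations to trust the fit
        fit = linregress(x[mask], outcome[mask])
        if fit.pvalue < alpha:
            kept[name] = {"slope": fit.slope, "pvalue": fit.pvalue, "n": int(mask.sum())}
    return kept


kept = screen_pollutants(pollutants, ed_rate)
for name, stats in kept.items():
    print(name, stats)
```

Checking the sample size before the p-value avoids retaining a pollutant whose apparent significance rests on a handful of observations.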

Multivariate Linear Regression: Built a multivariate linear regression model using ozone, nitric oxide (NO), and nitrogen dioxide (NO₂) as primary predictors, with age group as a moderator. Compared raw versus within-county demeaned inputs to distinguish absolute pollution exposure effects from local variations. The model explains 53% of variance in asthma-related ED visits (R² = 0.532).
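The raw versus within-county demeaned comparison can be sketched as below, using a single pollutant for brevity. Everything here is an assumption for illustration: the data are synthetic, the helper names are hypothetical, and age-group moderation is represented as a pollutant-by-age interaction term, which is one common way to encode a moderator and may differ from the exact specification we fit.

```python
import numpy as np

rng = np.random.default_rng(1)
n_counties, n_years = 20, 8

# Synthetic county-year ozone: a county baseline plus year-to-year variation.
cy_county = np.repeat(np.arange(n_counties), n_years)
baseline = rng.normal(40, 10, size=n_counties)
ozone_cy = baseline[cy_county] + rng.normal(0, 3, size=cy_county.size)

# Two rows per county-year: adults (0) and children (1).
county = np.repeat(cy_county, 2)
ozone = np.repeat(ozone_cy, 2)
is_child = np.tile([0, 1], cy_county.size)


def demean_within(values, groups):
    """Subtract each group's mean so only within-group variation remains."""
    group_means = np.bincount(groups, weights=values) / np.bincount(groups)
    return values - group_means[groups]


ozone_dm = demean_within(ozone, county)

# Synthetic outcome driven by within-county variation, with a stronger effect in children.
ed_rate = (30 + 10 * is_child + 0.5 * ozone_dm + 0.4 * ozone_dm * is_child
           + rng.normal(0, 1, size=ozone.size))


def fit_with_age_interaction(x, child, y):
    """OLS with columns: intercept, pollutant, age group, pollutant x age group."""
    X = np.column_stack([np.ones(len(x)), x, child, x * child])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2


coef_raw, r2_raw = fit_with_age_interaction(ozone, is_child, ed_rate)
coef_dm, r2_dm = fit_with_age_interaction(ozone_dm, is_child, ed_rate)
print("raw:      interaction =", round(coef_raw[3], 3), "R2 =", round(r2_raw, 3))
print("demeaned: interaction =", round(coef_dm[3], 3), "R2 =", round(r2_dm, 3))
```

In the fitted coefficients, the pollutant term is the slope for adults and the interaction term is the additional slope for children. Demeaning removes stable differences between counties, so the demeaned fit reflects how a county's ED visit rate moves when its own pollution deviates from its usual level.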

Validation of Multivariate Regression: 5-fold cross-validation indicated that the model generalizes with minimal overfitting (mean CV R² = 0.512 ± 0.053 across folds vs. training R² = 0.532).
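For readers unfamiliar with the procedure, a minimal NumPy sketch of 5-fold cross-validation for a linear model follows. The data are synthetic and the function name `kfold_r2` is hypothetical; it shows the mechanics only and does not reproduce our numbers.

```python
import numpy as np


def kfold_r2(X, y, k=5, seed=0):
    """Return the held-out R^2 of an OLS fit for each of k shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        X_train = np.column_stack([np.ones(len(train)), X[train]])
        X_test = np.column_stack([np.ones(len(test)), X[test]])
        # Fit on the training folds only, then score on the held-out fold.
        coef, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)
        resid = y[test] - X_test @ coef
        ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
        scores.append(1 - (resid @ resid) / ss_tot)
    return np.array(scores)


# Synthetic predictors and outcome standing in for pollutants and ED visit rates.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 1.5, size=300)

scores = kfold_r2(X, y)
print("fold R2:", np.round(scores, 3))
print("mean R2:", round(scores.mean(), 3), "+/-", round(scores.std(), 3))
```

A small gap between the training R² and the mean held-out R² is what suggests the model is not simply memorizing the training data.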

Multivariate Machine Learning (XGBoost): Trained an XGBoost model to capture nonlinear interactions among key pollutants. Performance was evaluated using R² (0.3149), feature importance, and SHAP (SHapley Additive exPlanations). While XGBoost captured nonlinear structure, it did not outperform the multivariate regression model (cross-validated R² = 0.512), which we selected as the final model for its interpretability.

Challenges we ran into

A major challenge was aligning datasets across different sources, timescales, and reporting structures. All of the datasets had substantial missing data, which reduced our statistical power and constrained which analyses we could run.

Accomplishments that we're proud of

We successfully developed a multivariate regression model that captures multi-pollutant exposure and cross-validated our model performance. We also translated our findings into an intuitive dashboard designed to make complex air quality and health impacts accessible to individuals with asthma.

What we learned

We learned about AQI and how different pollutants are measured by the EPA. We also gained experience working with large public health datasets, running different types of regression, and creating a user-friendly visualization tool.

What's next for It’s Not Easy Being Wheezy!

In the short term, we will investigate collinearity between nitrogen oxide species (NO, NO₂, NOy) to improve model stability. Beyond this, we aim to extend our work into a real-time risk prediction tool by integrating with at-home air monitoring systems like PurpleAir.
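One standard way to check collinearity is the variance inflation factor (VIF): regress each predictor on the others and compute 1 / (1 − R²). The sketch below is a starting point under stated assumptions: the pollutant series are synthetic, deliberately constructed so that the nitrogen species are correlated, and the function name `vif` is hypothetical.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (rows are observations)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1 / (1 - r2)
    return out


# Synthetic series: NO2 and NOy are built from NO, ozone is independent.
rng = np.random.default_rng(7)
n = 500
no = rng.normal(size=n)
no2 = 0.9 * no + 0.3 * rng.normal(size=n)
noy = no + no2 + 0.2 * rng.normal(size=n)
ozone = rng.normal(size=n)

names = ["NO", "NO2", "NOy", "ozone"]
vifs = vif(np.column_stack([no, no2, noy, ozone]))
for name, value in zip(names, vifs):
    print(f"{name}: VIF = {value:.1f}")
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign that a predictor is largely explained by the others, which can make its coefficient unstable.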

APA References

California Health and Human Services Agency. (n.d.). Asthma emergency department visit rates [Dataset]. Retrieved from https://data.chhs.ca.gov/dataset/asthma-emergency-department-visit-rates

U.S. Environmental Protection Agency. (n.d.). Air quality system (AQS): Annual AQI quality data [Dataset]. Retrieved from https://aqs.epa.gov/aqsweb/airdata/download_files.html
