Inspiration

During the record-breaking wildfire season of 2023, we noticed a critical gap: official apps often showed “Good” air quality in areas visibly covered in smoke.

The issue is structural. The EPA monitoring network is spatially limited—averaging roughly one PM2.5 sensor per 3,800 square miles.

This raised a key question:
What is happening in the gaps between sensors?

We set out to find what we call “Ghost Air”—dangerous air quality conditions that exist but are not directly measured.


What it does

Ghost Air is a spatial analysis engine that identifies air quality monitoring blind spots.

It works by:

  • Analyzing historical EPA data
  • Comparing neighboring monitoring stations
  • Detecting “mismatches” where nearby stations report drastically different air quality

These inconsistencies flag likely unmonitored high-risk zones and give a more accurate local assessment than relying on the single nearest station.

Output: A ranked list of high-risk zones with associated confidence scores, enabling targeted decision-making.
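
The mismatch check described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the StationPair record, the find_mismatches helper, the sample readings, and the 100-point AQI threshold are all assumptions chosen for the example. Confidence scores are left out because their calculation is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StationPair:
    """Hypothetical record: two neighboring stations and their same-day AQI."""
    station_a: str
    station_b: str
    aqi_a: float
    aqi_b: float

    @property
    def gap(self) -> float:
        # Absolute AQI difference between the two neighbors.
        return abs(self.aqi_a - self.aqi_b)


def find_mismatches(pairs, min_gap=100.0):
    """Return pairs whose AQI gap is at least min_gap, largest gap first.

    min_gap is an assumed threshold, not a value taken from the project.
    """
    flagged = [p for p in pairs if p.gap >= min_gap]
    return sorted(flagged, key=lambda p: p.gap, reverse=True)


# Made-up readings for illustration only.
sample_pairs = [
    StationPair("A", "B", aqi_a=40, aqi_b=190),
    StationPair("B", "C", aqi_a=190, aqi_b=175),
    StationPair("C", "D", aqi_a=175, aqi_b=60),
]

ranked = find_mismatches(sample_pairs)
for pair in ranked:
    print(f"{pair.station_a}-{pair.station_b}: gap {pair.gap:.0f}")
```

Sorting by the size of the gap is only one possible ranking; a production version would likely weight the gap by distance and by how many days the mismatch persisted.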


How we built it

Built entirely on Zerve, our pipeline:

  • Ingests 215,732 EPA records
  • Processes 991 active monitoring stations
  • Performs pairwise spatial analysis across 14,483 station combinations
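
For context on the pair count: 991 stations allow 490,545 unordered pairs, so the 14,483 combinations analyzed are a subset. How that subset was chosen is not stated here; one plausible reading is a distance cutoff between neighbors. The sketch below illustrates that idea under stated assumptions: the neighbor_pairs helper, the 150-mile cutoff, and the sample coordinates are hypothetical, and a plain haversine formula stands in for geopy so the example runs without extra dependencies.

```python
import math
from itertools import combinations

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def neighbor_pairs(stations, max_miles):
    """Yield (id_a, id_b, distance) for every station pair within max_miles."""
    for (id_a, loc_a), (id_b, loc_b) in combinations(stations.items(), 2):
        distance = haversine_miles(*loc_a, *loc_b)
        if distance <= max_miles:
            yield id_a, id_b, distance


# Approximate city coordinates, used only as stand-ins for station locations.
stations = {
    "albany": (42.65, -73.76),
    "burlington": (44.48, -73.21),
    "buffalo": (42.89, -78.88),
}

# Number of unordered pairs among 991 stations before any filtering.
all_pairs = math.comb(991, 2)

# Assumed cutoff: keep only pairs at most 150 miles apart.
close = list(neighbor_pairs(stations, max_miles=150.0))
```

Filtering before comparing readings keeps the expensive step proportional to the number of nearby pairs rather than to every possible pair.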

Core technologies include:

  • pandas for data processing
  • geopy for spatial distance calculations
  • scipy for statistical validation
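
The specific statistical test used for validation is not named here. As one hedged illustration of what checking a mismatch against chance could look like, the sketch below runs a two-sided permutation test on the difference in mean AQI between two stations. It is a stand-in written with the standard library, not the project's scipy code; the permutation_p_value helper and the readings are made up for the example.

```python
import random
from statistics import mean


def permutation_p_value(readings_a, readings_b, n_resamples=5000, seed=0):
    """Two-sided permutation test on the difference in mean AQI.

    Returns the share of random relabelings whose absolute mean
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(readings_a) - mean(readings_b))
    pooled = list(readings_a) + list(readings_b)
    n_a = len(readings_a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Made-up daily AQI values for two neighboring stations.
smoky = [182, 175, 190, 168, 201, 177]
clear = [41, 38, 52, 47, 35, 44]

p_separated = permutation_p_value(smoky, clear)
p_same = permutation_p_value(clear, clear)
```

A small p-value suggests the gap between the two stations is unlikely to come from day-to-day noise alone; identical series give a p-value of 1.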

Using this approach, we identified 24 confirmed “Ghost Air” zones from Summer 2023.

The system is structured as modular Python components, designed for:

  • Reusability
  • Scalability
  • Future real-time integration

The Impact

Our analysis uncovered 24 blind spots affecting thousands of residents.

Albany Case Study

  • A 130-mile monitoring gap left the Adirondack region without warnings
  • Neighboring stations showed a 155-point AQI discrepancy

By quantifying these gaps, Ghost Air provides actionable insight for:

  • School districts
  • Public health agencies
  • Emergency response planners

This moves air quality assessment beyond “nearest sensor” assumptions.


Challenges we ran into

  • Processing large-scale pairwise spatial computations efficiently within the Zerve environment
  • Ensuring scientific defensibility of results

We focused on:

  • Highlighting network limitations, not sensor errors
  • Maintaining rigorous spatial logic and reproducibility

Accomplishments that we're proud of

  • Identified and validated 24 high-risk blind spots using only raw EPA data
  • Built a fully transparent, end-to-end pipeline within Zerve
  • Transformed raw datasets into a statistically validated risk model

What we learned

  • The “nearest station” model breaks down during wildfire events
  • Spatial gaps can significantly distort perceived air quality
  • Zerve enables efficient handling of complex spatial analysis workflows in a unified environment

What's next for Ghost Air

Phase 1 — Live API

Deploy the analysis engine as a FastAPI endpoint for real-time access

Phase 2 — Interactive Frontend

Build a Streamlit-based Risk Map for public exploration

Phase 3 — NOAA Integration

Incorporate wind transport data to:

  • Validate smoke plume movement
  • Improve prediction accuracy
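
As a rough idea of what a wind-based plume check could involve, the sketch below tests whether a target location lies downwind of a smoky station. Everything in it is an assumption made for illustration: the is_downwind helper, the 45-degree tolerance, and the sample coordinates are hypothetical, and no NOAA data format or API is implied. Wind direction follows the meteorological convention of the direction the wind blows from.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing in degrees from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb))
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


def angular_difference(a_deg, b_deg):
    """Smallest absolute angle between two compass directions, 0 to 180."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def is_downwind(source, target, wind_from_deg, tolerance_deg=45.0):
    """True if target lies within tolerance_deg of the wind's travel direction.

    source and target are (lat, lon) tuples; tolerance_deg is an assumed value.
    """
    wind_toward = (wind_from_deg + 180.0) % 360.0
    heading = bearing_deg(*source, *target)
    return angular_difference(heading, wind_toward) <= tolerance_deg


# Hypothetical locations: the target sits due north of the smoky station.
smoky_station = (42.0, -74.0)
unmonitored_area = (43.0, -74.0)

southerly = is_downwind(smoky_station, unmonitored_area, wind_from_deg=180.0)
northerly = is_downwind(smoky_station, unmonitored_area, wind_from_deg=0.0)
```

A check like this could support or weaken a flagged zone: a mismatch is more credible when the unmonitored area sits downwind of the station reporting smoke.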
