Canary — Climate Health Risk Atlas

Inspiration

The name Canary comes from an old mining practice: before modern gas detectors existed, miners carried canaries into coal shafts. If the bird stopped singing, it was time to evacuate. The canary had no choice — it absorbed the harm that others produced, and its suffering was the warning signal.

That image describes something still happening today, in slow motion and in plain sight. Certain communities in the United States are already living in the equivalent of that coal shaft — breathing air thick with PM2.5 particulate, cut off from healthcare, and bearing the compounded weight of poverty — while the systems that caused those conditions remain largely unchanged. These communities did not choose this exposure. They inherited it through decades of exclusionary zoning, discriminatory housing policy, unequal pollution enforcement, and deliberate disinvestment.

The central question behind Canary is not technical; it is a question of justice. Who bears the health consequences of industrialization, and why does that burden fall so consistently on the same communities? We built Canary because we believed that making that pattern visible and measurable, county by county, was a necessary first step toward making it legible to the people with the power to change it.

What It Does

Canary is an interactive county-level climate health risk atlas covering all ~3,100 US counties. It maps three composite disease risk indices — Cancer Risk, Neurological Risk, and AMR Vulnerability — each derived from real epidemiological and environmental data. But beyond visualization, two features define what Canary is actually for:

Environmental Justice Index — each county receives a structural burden score built as a weighted composite:

$$\text{EJ Score} = 0.35 \cdot \text{PM}_{2.5}^{\text{norm}} + 0.40 \cdot \text{Poverty}^{\text{norm}} + 0.25 \cdot (1 - \text{Access}^{\text{norm}})$$

This captures cumulative impact — the bioethics concept that no single stressor fully explains a community's health burden; it is the accumulation of pollution, poverty, and access deprivation together that creates the patterns we see. Counties are classified into Severe, High, Moderate, and Low burden tiers. A Double Burden view highlights communities where structural inequity and disease risk are simultaneously elevated — the counties with the most urgent ethical claim on intervention.
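As a minimal sketch of the weighted composite above, the function below combines three inputs that are assumed to be already normalized to $[0, 1]$. The function and parameter names are illustrative, not taken from the Canary codebase; only the weights come from the formula.

```python
def ej_score(pm25_norm: float, poverty_norm: float, access_norm: float) -> float:
    """Weighted EJ composite of three inputs already normalized to [0, 1].

    Healthcare access is inverted (1 - access) so that *less* access
    contributes *more* burden, matching the formula in the text.
    """
    inputs = {
        "pm25_norm": pm25_norm,
        "poverty_norm": poverty_norm,
        "access_norm": access_norm,
    }
    for name, value in inputs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return 0.35 * pm25_norm + 0.40 * poverty_norm + 0.25 * (1.0 - access_norm)


# Hypothetical county: high pollution, high poverty, low access.
print(round(ej_score(pm25_norm=0.9, poverty_norm=0.8, access_norm=0.2), 3))
```

Because the three weights sum to 1, the score stays in $[0, 1]$: a county at the worst end of all three inputs scores 1, and one at the best end scores 0.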

Policy Intervention Simulator — users can apply evidence-based policy presets (Clean Air Act expansion, Medicaid expansion, anti-poverty investment, or a combined EJ package) and instantly see projected risk reductions across all counties, including the top beneficiary communities by name. The live scenario scoring model is:

$$\text{Score} = \text{base} + w_{\text{PM}_{2.5}} \cdot \Delta\text{PM}_{2.5} + w_{\text{poverty}} \cdot \Delta\text{poverty} + w_{\text{access}} \cdot \Delta\text{access}$$

This shifts Canary from a tool that documents injustice to one that points toward remedy — which we believe is where the ethical responsibility of a tool like this actually lies.
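The scenario scoring formula above is a linear adjustment to a county's baseline score, which can be sketched as follows. The `Sensitivity` class, the function name, and the weight values are hypothetical placeholders chosen for illustration; the real weights are fitted, and the negative sign on the access weight is an assumption that better access lowers risk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sensitivity:
    """Per-driver sensitivity weights (hypothetical values, not Canary's)."""
    pm25: float
    poverty: float
    access: float


def scenario_score(
    base: float,
    weights: Sensitivity,
    d_pm25: float = 0.0,
    d_poverty: float = 0.0,
    d_access: float = 0.0,
) -> float:
    """Score = base + w_pm25 * dPM2.5 + w_poverty * dPoverty + w_access * dAccess."""
    return (
        base
        + weights.pm25 * d_pm25
        + weights.poverty * d_poverty
        + weights.access * d_access
    )


# Assumed weights for illustration only.
weights = Sensitivity(pm25=0.4, poverty=0.3, access=-0.2)

# A hypothetical clean-air preset: normalized PM2.5 falls by 0.25.
print(scenario_score(base=0.6, weights=weights, d_pm25=-0.25))
```

Since the model is linear in the deltas, a combined preset is equivalent to summing the adjustments of its individual presets, which keeps the client-side recomputation cheap.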

How We Built It

We pulled from three public sources: CDC PLACES (2022–2023 county-level crude prevalence for smoking, obesity, stroke, diabetes, COPD, depression, and uninsured rates), EPA AQS annual mean PM2.5 readings aggregated by county FIPS, and ACS poverty and healthcare access estimates. Raw values are normalized to $[0, 1]$ using 5th–95th percentile clipping:

$$x_{\text{norm}} = \frac{\text{clip}(x,\, p_{0.05},\, p_{0.95}) - p_{0.05}}{p_{0.95} - p_{0.05}}$$
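A minimal sketch of this percentile-clipped normalization, using only the standard library. The helper names are illustrative, and the percentile uses linear interpolation between order statistics, which is an assumption: the writeup does not say which percentile method was used.

```python
from typing import List, Sequence


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Percentile of a pre-sorted sequence, q in [0, 1], linear interpolation."""
    if not sorted_values:
        raise ValueError("cannot take a percentile of an empty sequence")
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)  # floor, since pos is non-negative
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def normalize(values: Sequence[float]) -> List[float]:
    """Clip each value to [p05, p95], then rescale linearly to [0, 1]."""
    ordered = sorted(values)
    p05 = percentile(ordered, 0.05)
    p95 = percentile(ordered, 0.95)
    span = p95 - p05
    if span == 0:
        # Degenerate case: no spread between the 5th and 95th percentiles.
        return [0.0 for _ in values]
    return [(min(max(x, p05), p95) - p05) / span for x in values]
```

Clipping before rescaling means a handful of extreme counties cannot compress everyone else into a narrow band; every value at or below the 5th percentile maps to 0 and every value at or above the 95th maps to 1.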

Sensitivity weights in the scoring model are derived from OLS regression on PLACES prevalence data. A FastAPI backend serves enriched county GeoJSON, and the frontend is built in React + MapLibre GL — a two-level map (state choropleth overview → county drill-down on click) with live scenario sliders that repaint the choropleth client-side without a backend round-trip.
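To illustrate the OLS step mentioned above, here is a small standard-library sketch that fits an intercept and slopes via the normal equations. It is not Canary's fitting code: the choice of regressors and outcome is not specified in this writeup, and the data in the usage example is synthetic.

```python
from typing import List, Sequence


def ols_fit(rows: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Return [intercept, b1, ..., bk] minimizing the sum of squared errors.

    Solves (X^T X) b = X^T y by Gauss-Jordan elimination with partial pivoting.
    """
    design = [[1.0, *row] for row in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X^T X | X^T y]

    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("regressors are collinear; no unique OLS solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


# Synthetic data generated from y = 1 + 2*x1 - 3*x2 (no noise).
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
targets = [1, 3, -2, 0, 2]
print([round(b, 6) for b in ols_fit(rows, targets)])
```

In practice a library routine such as a least-squares solver would be the more robust choice; the hand-rolled version is only meant to show what the fitted coefficients represent.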

Challenges We Ran Into

The language problem was the hardest challenge we faced — and it wasn't technical.

A map that shows "high-risk counties" can easily be misread as a statement about the people who live there — their choices, their behaviors, their inherent vulnerability. That reading is not only factually wrong; it is dangerous. It inverts the actual causation and places moral responsibility on communities that are victims of structural failures, not authors of them. We rewrote every label, tooltip, callout, and disclosure multiple times to hold a consistent framing: risk scores reflect the condition of systems, not the character of communities. The persistent ethics banner, the full data disclosure modal, and the "Structural Note" in every county detail panel are all deliberate design responses to this challenge. Bioethically, this is not cosmetic — it is the difference between a tool that supports advocacy and one that enables stigma.

PM2.5 coverage gaps presented both a technical and an ethical problem. EPA monitoring stations are not evenly distributed — rural counties, often those with the highest poverty burden, frequently have no monitoring data at all. We filled missing values with the national median, which likely understates pollution burden in agricultural and industrial rural areas. This means Canary probably underestimates harm in some of the most disadvantaged communities, and we document this explicitly in our ethics disclosure.
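The median fill described above can be sketched as follows. Carrying an `imputed` flag alongside each value is an assumption about how the gap might be surfaced downstream, not a description of Canary's schema, and the county keys and readings are made up.

```python
from statistics import median
from typing import Dict, Optional, Tuple


def impute_pm25(
    readings: Dict[str, Optional[float]],
) -> Dict[str, Tuple[float, bool]]:
    """Fill missing PM2.5 values with the median of observed counties.

    Returns {county_key: (value, imputed)} so filled values stay
    distinguishable from measured ones.
    """
    observed = [v for v in readings.values() if v is not None]
    if not observed:
        raise ValueError("no observed PM2.5 readings to take a median from")
    fallback = median(observed)
    return {
        key: (fallback, True) if value is None else (value, False)
        for key, value in readings.items()
    }


# Placeholder keys and values for illustration only.
filled = impute_pm25({"a": 8.0, "b": None, "c": 10.0, "d": 12.0})
print(filled["b"])  # (10.0, True)
```

Keeping the flag makes it possible to mark imputed counties in the UI or exclude them from analyses where a median stand-in would be misleading.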

FIPS alignment across three datasets that each encode county identifiers differently — CDC's LocationID, EPA's separate state and county columns, and the Census GEO_ID format (0500000US06087) — required a five-format normalizer and fallback chain before every county resolved correctly.
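A rough sketch of what such a normalizer and fallback chain might look like. The function names and the specific variants handled (a Census `GEO_ID` string, separate state and county codes, and a numeric or string ID that may have lost its leading zero) are assumptions for illustration; the writeup does not enumerate the five formats.

```python
import re
from typing import Optional, Union

GEO_ID_PATTERN = re.compile(r"0500000US(\d{5})")
Code = Union[str, int, float]


def fips_from_geo_id(geo_id: str) -> Optional[str]:
    """Census GEO_ID such as '0500000US06087' -> '06087'."""
    match = GEO_ID_PATTERN.fullmatch(geo_id.strip())
    return match.group(1) if match else None


def fips_from_parts(state: Code, county: Code) -> Optional[str]:
    """Separate state and county codes (strings or numbers) -> 5-digit FIPS."""
    try:
        s, c = int(state), int(county)
    except (TypeError, ValueError):
        return None
    if not (0 < s <= 99 and 0 < c <= 999):
        return None
    return f"{s:02d}{c:03d}"


def fips_from_location_id(location_id: Code) -> Optional[str]:
    """Single ID that may be an int, a float-coerced string, or zero-stripped."""
    text = str(location_id).strip()
    if text.endswith(".0"):  # e.g. '6087.0' after a float round-trip
        text = text[:-2]
    if not re.fullmatch(r"\d{4,5}", text):
        return None
    return text.zfill(5)


def normalize_fips(value: Code, county: Optional[Code] = None) -> Optional[str]:
    """Try each parser in turn; return None if nothing resolves."""
    if county is not None:
        return fips_from_parts(value, county)
    if isinstance(value, str):
        fips = fips_from_geo_id(value)
        if fips is not None:
            return fips
    return fips_from_location_id(value)


print(normalize_fips("0500000US06087"))  # 06087
print(normalize_fips(6087))              # 06087
print(normalize_fips("6", "87"))         # 06087
```

Returning `None` rather than raising lets the caller count and log unresolved rows, which is useful when checking that every county in all three datasets lines up.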

Accomplishments That We're Proud Of

We're most proud of the ethics disclosure — not because it was technically difficult, but because it represents a genuine commitment to building responsibly. It covers data provenance, model limitations, the structural vs. individual framing distinction, privacy, and an explicit prohibition against using risk scores to deny care or resources to high-burden communities. That last point matters deeply to us: a risk atlas should direct resources toward high-burden communities, never away from them. The ethical obligation runs in exactly one direction.

We're also proud of the Double Burden view — the counties where structural inequity and disease risk are simultaneously elevated. These are communities facing compounding, intersecting harms with the fewest resources to cope. Naming them feels like the most honest thing Canary does.

We're proud that the Policy Simulator creates agency, not just alarm. Showing that a Medicaid expansion is projected to reduce average risk across hundreds of counties, and naming those counties, gives advocates and policymakers something to act on, not just something to observe.

What We Learned

Inequity is not random — it is geographic and historical. When you look at the double-burden map, the clusters are not surprising if you know US history. They trace redlined neighborhoods, industrial corridors, and the precise outlines of states that declined Medicaid expansion. The data does not create this pattern; it reveals one that was already encoded into the landscape over decades of policy choices.

Transparency is not a footnote — it is a design requirement. A tool that hides its assumptions, its data gaps, and its potential for misuse can cause harm regardless of intent. Building the ethics disclosure into the persistent UI — not buried in a settings page — was one of the most important decisions we made.

Bioethics is upstream of the code. The most consequential choices in Canary were not about algorithms or architecture. They were about whose burden gets counted, how results get framed, and who the tool is ultimately accountable to. Those decisions shaped everything else.

What's Next for Canary

The most important next step is not a model improvement. It is community validation. Risk scores derived from administrative data cannot substitute for the lived experience of the communities they describe. We want to partner with environmental justice organizations to ground-truth our indices, surface local context that federal datasets miss, and ensure Canary is built with affected communities rather than simply about them. That is not a feature — it is an ethical obligation.

On the technical side, we want to replace OLS weights with cross-validated regression on health outcome data (mortality rates, years of life lost), add a temporal trend view showing whether a county's burden is improving or worsening year over year, and link flagged counties to existing EJScreen and state-level environmental justice reports. A model no one can explain is a model no one can challenge — and a model that cannot be challenged cannot be trusted.

Built at BioHacks 2025 · Data: CDC PLACES, EPA AQS, US Census ACS · Stack: FastAPI, React, MapLibre GL, Python/Pandas