Inspiration

San Diego County averages over 266 sunny days per year, yet solar adoption is deeply uneven across its 736 census tracts. The neighborhoods with the greatest need — highest heat exposure, lowest income, worst air quality — are often the same ones with the least solar deployment. This is not just an equity problem. It is an untapped market opportunity hiding in plain sight.

The project started with a simple question: where is solar most needed and least present? The answer required combining four independent datasets across different geographies, time horizons, and measurement frameworks into a single analytically rigorous model.

What it does

The dashboard quantifies the solar deployment gap across all 736 SD County census tracts by computing three scores for each tract:

Need Score — a weighted percentile rank across three vulnerability dimensions:

$$\text{score_need} = w_{\text{heat}} \cdot \text{pct}(\bar{x}_{\text{heat}}) + w_{\text{svi}} \cdot \text{pct}(\text{SVI}) + w_{\text{ces}} \cdot \text{pct}(\text{CES})$$

Deployment Score — a weighted percentile rank of observed solar activity:

$$\text{score_deployment} = 0.70 \cdot \text{pct}(\text{permits}) + 0.30 \cdot \text{pct}(\text{installer count})$$

Gap Score — the core finding:

$$\text{score_gap} = \text{score_need} - \text{score_deployment}$$

Tracts with $\text{score_gap} \geq 0.40$ are classified Priority. The primary Balanced scheme weights Heat 40% / SVI 35% / CES 25%. Across the five weight schemes tested, 69 tracts are classified Priority in every one: a robust, scheme-independent recommendation cohort representing over 180,000 residents with no meaningful solar footprint from any of the five major installers in the dataset.
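The three formulas above can be sketched in a few lines of plain Python. This is a minimal illustration, not the dashboard's actual code: the field names (`heat`, `svi`, `ces`, `permits`, `installers`) are hypothetical, and the percentile rank is assumed to be "share of tracts with a value at or below this one", since the tie-handling rule is not stated here.

```python
from bisect import bisect_right

# Balanced scheme from the text: Heat 40% / SVI 35% / CES 25%.
BALANCED = {"heat": 0.40, "svi": 0.35, "ces": 0.25}


def pct_rank(values):
    """Within-county percentile rank in (0, 1].

    Assumption: rank = share of tracts with a value <= this one.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]


def gap_scores(tracts, weights=BALANCED, threshold=0.40):
    """Compute need, deployment and gap scores for a list of tract dicts."""
    keys = ("heat", "svi", "ces", "permits", "installers")
    pct = {k: pct_rank([t[k] for t in tracts]) for k in keys}
    results = []
    for i, t in enumerate(tracts):
        need = sum(weights[k] * pct[k][i] for k in ("heat", "svi", "ces"))
        # Deployment weights (0.70 / 0.30) are taken from the formula above.
        deployment = 0.70 * pct["permits"][i] + 0.30 * pct["installers"][i]
        gap = need - deployment
        results.append({
            "tract": t["tract"],
            "score_need": need,
            "score_deployment": deployment,
            "score_gap": gap,
            "priority": gap >= threshold,
        })
    return results


# Toy data only; these are not real tract values.
toy_tracts = [
    {"tract": "A", "heat": 40, "svi": 0.9, "ces": 80, "permits": 0, "installers": 0},
    {"tract": "B", "heat": 30, "svi": 0.7, "ces": 60, "permits": 5, "installers": 1},
    {"tract": "C", "heat": 20, "svi": 0.4, "ces": 30, "permits": 20, "installers": 3},
    {"tract": "D", "heat": 10, "svi": 0.1, "ces": 10, "permits": 60, "installers": 5},
]
toy_scores = gap_scores(toy_tracts)
```

In the toy data, tract A ranks highest on every need input and lowest on both deployment inputs, so it is the only one that clears the 0.40 threshold.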

The dashboard surfaces this through eight pages: a priority gap map, need landscape, deployment analysis, tract-level drilldown, sensitivity analysis across five weight schemes, hazard layers (seismic PGA, earthquake history, PSPS fire shutoff context), and full methodology documentation.

How we built it

Data pipeline. Five installer permit datasets contained only street addresses — no census tract identifiers. These were geocoded using the Census Bureau Batch Geocoder, pre-filtered to San Diego ZIP prefixes, and cached locally. Match rates exceeded 93% across all installers, yielding 12,615 permits across 518 tracts.
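A rough sketch of the pre-filter, chunking, and caching steps follows. It is an illustration under stated assumptions rather than the pipeline as built: the ZIP prefixes `919`, `920`, and `921` are an assumed definition of "San Diego ZIP prefixes", the chunk size is arbitrary, and `geocode_batch` is a hypothetical callable standing in for whatever function submits one chunk to the Census Batch Geocoder and returns an address-to-tract mapping.

```python
import json
from pathlib import Path

# Assumption: ZIP prefixes treated as San Diego County for pre-filtering.
SD_ZIP_PREFIXES = ("919", "920", "921")


def in_san_diego(zip_code):
    """Cheap pre-filter so out-of-county addresses are never geocoded."""
    return str(zip_code).strip().startswith(SD_ZIP_PREFIXES)


def chunked(rows, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def geocode_with_cache(addresses, geocode_batch, cache_path, chunk_size=1000):
    """Geocode only addresses missing from the local JSON cache.

    `geocode_batch` is a hypothetical callable: it takes a list of address
    strings and returns {address: tract GEOID or None}.
    """
    cache_file = Path(cache_path)
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    # dict.fromkeys de-duplicates while preserving input order.
    pending = [a for a in dict.fromkeys(addresses) if a not in cache]
    for chunk in chunked(pending, chunk_size):
        cache.update(geocode_batch(chunk))
        # Persist after every chunk so a failed run can resume.
        cache_file.write_text(json.dumps(cache))
    return {a: cache.get(a) for a in addresses}
```

Writing the cache after each chunk, rather than once at the end, means an API failure midway through loses at most one chunk of work.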

Four independent sources were joined on 2020 census tract identifiers:

| Dataset | Source | Coverage |
| --- | --- | --- |
| Installer permits | Sunrun, Sullivan, Titan, Freedom Forever, SolarCity | 518 SD tracts |
| Extreme heat days | Cal-Adapt (2006–2100) | 627 tracts (2010 boundaries) |
| Social Vulnerability Index | CDC/ATSDR SVI 2022 | 736 tracts |
| CalEnviroScreen 4.0 | OEHHA | 736 tracts |

Scoring. All inputs are converted to within-county percentile ranks before weighting, ensuring no single metric dominates due to scale differences. Five weight schemes spanning the defensible range are computed simultaneously. The priority_agreement column (0–5) counts how many schemes classify each tract as Priority — 69 tracts score 5/5.
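The agreement count can be illustrated as below. Only the Balanced weights come from this writeup; the other four scheme names and weights are hypothetical placeholders chosen to show the mechanics, and the inputs are assumed to be percentile ranks already computed per tract.

```python
# Only "balanced" is taken from the text; the other four schemes are
# hypothetical placeholders for illustration.
SCHEMES = {
    "balanced": {"heat": 0.40, "svi": 0.35, "ces": 0.25},
    "heat_heavy": {"heat": 0.60, "svi": 0.25, "ces": 0.15},
    "svi_heavy": {"heat": 0.25, "svi": 0.60, "ces": 0.15},
    "ces_heavy": {"heat": 0.25, "svi": 0.25, "ces": 0.50},
    "equal": {"heat": 1 / 3, "svi": 1 / 3, "ces": 1 / 3},
}


def priority_agreement(pct, score_deployment, schemes=SCHEMES, threshold=0.40):
    """Count how many weight schemes classify one tract as Priority.

    `pct` maps "heat", "svi", "ces" to that tract's percentile ranks.
    """
    count = 0
    for weights in schemes.values():
        need = sum(weights[k] * pct[k] for k in weights)
        if need - score_deployment >= threshold:
            count += 1
    return count
```

A tract that ranks high on all three need inputs and low on deployment is Priority under every scheme, while a tract that is high on heat alone only clears the threshold when heat carries most of the weight. That difference is what the 0–5 count is meant to expose.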

Hazard layers. USGS ASCE 7-22 design PGA values were fetched for all 736 tract centroids via the USGS design maps API, giving a seismic hazard score for every tract. In addition, 500 historical M3+ earthquake events were pulled from the USGS FDSN catalog and overlaid as a scatter layer on the seismic map.
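For the earthquake overlay, a query against the USGS FDSN event service might be assembled as follows. This is a sketch only: the search centre, radius, and start date are illustrative assumptions, not the values used in the build, and no request is actually sent here. The second helper flattens a GeoJSON response into points suitable for a scatter layer.

```python
from urllib.parse import urlencode

FDSN_EVENT_URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"


def build_quake_query(lat, lon, radius_km, min_mag=3.0, limit=500,
                      start="1970-01-01"):
    """Build a GeoJSON query URL for events within a radius of a point.

    The centre, radius and start date passed in are caller assumptions.
    """
    params = {
        "format": "geojson",
        "starttime": start,
        "latitude": lat,
        "longitude": lon,
        "maxradiuskm": radius_km,
        "minmagnitude": min_mag,
        "orderby": "time",
        "limit": limit,
    }
    return f"{FDSN_EVENT_URL}?{urlencode(params)}"


def quake_points(feature_collection):
    """Flatten a GeoJSON FeatureCollection to (lon, lat, magnitude) tuples."""
    points = []
    for feature in feature_collection.get("features", []):
        # USGS GeoJSON coordinates are [longitude, latitude, depth].
        lon, lat = feature["geometry"]["coordinates"][:2]
        points.append((lon, lat, feature["properties"]["mag"]))
    return points


# Hypothetical centre near downtown San Diego with a 150 km radius.
example_url = build_quake_query(32.7157, -117.1611, 150)
```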

Application. An 8-page Streamlit app with Plotly choropleth maps on carto-darkmatter tiles, a dark professional theme, and a global weight scheme selector that updates every map and metric simultaneously.

Challenges we ran into

Geocoding at scale. The raw installer files had no tract identifiers — only addresses. Sending 12,000+ addresses through the Census Batch Geocoder in chunks, handling non-exact matches, and building a local cache to avoid redundant API calls required careful pipeline design.
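Handling non-exact matches mostly comes down to parsing the batch response carefully. The sketch below assumes the geographies batch output is a header-less CSV with columns in this order: record ID, input address, match status, match type, matched address, coordinates, TIGER line ID, side, state FIPS, county FIPS, tract, block. That column order should be verified against a live response before relying on it.

```python
import csv
import io


def parse_batch_response(text):
    """Parse a batch geocoder CSV response into {record_id: result}.

    Assumed column order (verify against a real response):
    0 id, 1 input address, 2 match status, 3 match type, 4 matched address,
    5 lon/lat, 6 TIGER line id, 7 side, 8 state, 9 county, 10 tract, 11 block.
    """
    results = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        record_id, status = row[0], row[2]
        if status != "Match" or len(row) < 12:
            # No_Match and Tie rows carry no geography columns.
            results[record_id] = {"geoid": None, "match_type": status}
            continue
        # 11-digit tract GEOID = state (2) + county (3) + tract (6).
        geoid = row[8] + row[9] + row[10]
        results[record_id] = {"geoid": geoid, "match_type": row[3]}
    return results
```

Keeping `match_type` alongside the GEOID lets downstream code decide whether to accept `Non_Exact` matches outright or route them to a manual review list.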

Boundary mismatch. Cal-Adapt heat data uses 2010 tract boundaries; everything else uses 2020. The Census redrew 201 tracts between decennial surveys. A production implementation would use the Census 2010-to-2020 relationship file for population-weighted interpolation. For this build, the affected tracts were assigned the county mean and marked with an explicit imputation flag, as a defensible approximation.
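County-mean imputation with a flag is simple enough to show in full. The names `heat_days` and `heat_imputed` are hypothetical column names used for illustration; the county mean is assumed to be taken over the tracts that do have a Cal-Adapt value.

```python
from statistics import mean


def impute_heat(heat_by_tract, all_tracts):
    """Fill missing heat values with the county mean and flag them.

    `heat_by_tract` maps GEOID -> heat value for tracts that matched;
    `all_tracts` is the full list of 2020 GEOIDs.
    """
    county_mean = mean(heat_by_tract.values())
    rows = {}
    for geoid in all_tracts:
        if geoid in heat_by_tract:
            rows[geoid] = {"heat_days": heat_by_tract[geoid], "heat_imputed": False}
        else:
            # The flag keeps imputed values distinguishable downstream.
            rows[geoid] = {"heat_days": county_mean, "heat_imputed": True}
    return rows
```

Because every imputed tract receives the same value, all of them land at the same heat percentile. The flag is what allows a map or table to mark those tracts as approximations.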

Double-counting vulnerability. CalEnviroScreen's Population Characteristics component substantially overlaps SVI Theme 1. Using both at full weight inflates the influence of demographic vulnerability. The Balanced scheme down-weights CES to 0.25 specifically to mitigate this.

Incomplete market coverage. The five installer datasets capture only a portion of the SD solar market. Baker Electric, SunPower, and numerous local operators are not included, so permit counts are a lower bound on true deployment.

Accomplishments that we're proud of

The sensitivity analysis is the accomplishment we are most proud of. Rather than committing to a single weighting assumption, five schemes spanning the defensible range were tested simultaneously. The robustness cohort is the finding, not the map.

What we learned

Building this project made clear how much analytical credibility depends on being explicit about uncertainty. Showing that a finding holds across all reasonable assumptions is more persuasive than any single optimized result.

We also learned how fragile geospatial joins can be in practice. Boundary mismatches, coordinate system inconsistencies, and API instability across public government data sources required defensive engineering at every step. The geocoding cache, imputation flags, and fallback handlers were all responses to real failures encountered during the build.

What's next for SD Solar Equity Dashboard

LLM-powered tract recommendations. The most impactful near-term addition is a natural language recommendation engine built on top of the tract-level data. A user selects a census tract and specifies a target audience — a solar installer, a grant program administrator, a city council office, or a community organization — and a language model generates a tailored one-page brief explaining why that tract merits attention, what the dominant risk factors are, what program type fits the housing composition, and what the strongest grant framing is given the local SVI, heat, and seismic profile. The same underlying data produces fundamentally different narratives depending on who needs to act on it, and a language model is well-suited to that translation.

PSPS integration. SDG&E has conducted dozens of Public Safety Power Shutoff events since 2019. Joining CPUC PSPS de-energization polygons to census tracts would identify which Priority tracts have already lost grid power during fire weather events — the clearest possible argument for battery-backed solar grants.

Full market coverage. Adding the CEC NEM interconnection dataset would replace the five-installer partial picture with a complete view of rooftop solar penetration across all SD County tracts, sharpening the gap score and eliminating the lower-bound limitation of the current deployment measure.

Population-weighted heat interpolation. The 2010-to-2020 tract relationship file would allow proper population-weighted interpolation of Cal-Adapt heat values for the 201 imputed tracts, replacing county-mean substitution with tract-specific projections.
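The interpolation itself would be a weighted average over the 2010 tracts that overlap each 2020 tract. In the sketch below the crosswalk is a list of hypothetical `(geoid_2010, geoid_2020, weight)` tuples; how the weight is derived from the relationship file (for example, the population falling in each overlap) is left as an assumption.

```python
from collections import defaultdict


def interpolate_to_2020(heat_2010, crosswalk):
    """Weighted average of 2010-tract heat values onto 2020 tracts.

    `crosswalk` rows are (geoid_2010, geoid_2020, weight); the weight is
    assumed to be the population in the overlap between the two tracts.
    """
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for geoid_2010, geoid_2020, weight in crosswalk:
        if geoid_2010 in heat_2010 and weight > 0:
            numerator[geoid_2020] += weight * heat_2010[geoid_2010]
            denominator[geoid_2020] += weight
    # Tracts with no usable overlap are omitted rather than guessed.
    return {g: numerator[g] / denominator[g] for g in denominator}
```

For example, a 2020 tract drawing 300 people from a 2010 tract with a value of 10 and 100 people from one with a value of 30 would receive (300·10 + 100·30) / 400 = 15.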
