Inspiration
San Diego County averages over 266 sunny days per year, yet solar adoption is deeply uneven across its 736 census tracts. The neighborhoods with the greatest need — highest heat exposure, lowest income, worst air quality — are often the same ones with the least solar deployment. This is not just an equity problem. It is an untapped market opportunity hiding in plain sight.
The project started with a simple question: where is solar most needed and least present? The answer required combining four independent datasets across different geographies, time horizons, and measurement frameworks into a single analytically rigorous model.
What it does
The dashboard quantifies the solar deployment gap across all 736 SD County census tracts by computing three scores for each tract:
Need Score — a weighted percentile rank across three vulnerability dimensions:
$$\text{score_need} = w_{\text{heat}} \cdot \text{pct}(\bar{x}_{\text{heat}}) + w_{\text{svi}} \cdot \text{pct}(\text{SVI}) + w_{\text{ces}} \cdot \text{pct}(\text{CES})$$
Deployment Score — a weighted percentile rank of observed solar activity:
$$\text{score_deployment} = 0.70 \cdot \text{pct}(\text{permits}) + 0.30 \cdot \text{pct}(\text{installer count})$$
Gap Score — the core finding:
$$\text{score_gap} = \text{score_need} - \text{score_deployment}$$
Tracts with $\text{score_gap} \geq 0.40$ are classified Priority. The primary Balanced weighting scheme uses Heat 40% / SVI 35% / CES 25%. Across all five weight schemes tested, 69 tracts are classified Priority every time: a robust, scheme-independent recommendation cohort representing over 180,000 residents with no meaningful solar footprint from any of the five major installers in the dataset.
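The three formulas above can be sketched in a few lines of pandas. This is a minimal illustration, not the project's actual code: the column names (`heat_days`, `svi`, `ces`, `permits`, `installer_count`), the helper names, and the toy tract values are all assumptions, and `Series.rank(pct=True)` is one reasonable way to get a within-county percentile rank.

```python
import pandas as pd

# Balanced weights as stated in the writeup (Heat 40% / SVI 35% / CES 25%).
BALANCED = {"heat": 0.40, "svi": 0.35, "ces": 0.25}
PRIORITY_THRESHOLD = 0.40


def pct(series: pd.Series) -> pd.Series:
    """Within-county percentile rank in (0, 1]; ties share their average rank."""
    return series.rank(pct=True)


def score_tracts(df: pd.DataFrame, weights: dict = BALANCED) -> pd.DataFrame:
    """Add need, deployment, gap, and priority columns to a per-tract frame.

    Column names are hypothetical placeholders, not the project's schema.
    """
    out = df.copy()
    out["score_need"] = (
        weights["heat"] * pct(out["heat_days"])
        + weights["svi"] * pct(out["svi"])
        + weights["ces"] * pct(out["ces"])
    )
    out["score_deployment"] = (
        0.70 * pct(out["permits"]) + 0.30 * pct(out["installer_count"])
    )
    out["score_gap"] = out["score_need"] - out["score_deployment"]
    out["is_priority"] = out["score_gap"] >= PRIORITY_THRESHOLD
    return out


# Toy data for four made-up tracts (values are illustrative only).
tracts = pd.DataFrame(
    {
        "heat_days": [40, 10, 25, 5],
        "svi": [0.9, 0.2, 0.6, 0.1],
        "ces": [80, 15, 50, 10],
        "permits": [1, 60, 20, 90],
        "installer_count": [1, 4, 3, 5],
    },
    index=["A", "B", "C", "D"],
)
scored = score_tracts(tracts)
```

In this toy frame, tract A ranks highest on every need input and lowest on both deployment inputs, so it is the only tract that clears the 0.40 threshold.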
The dashboard surfaces this through eight pages: a priority gap map, need landscape, deployment analysis, tract-level drilldown, sensitivity analysis across five weight schemes, hazard layers (seismic PGA, earthquake history, PSPS fire shutoff context), and full methodology documentation.
How we built it
Data pipeline. Five installer permit datasets contained only street addresses — no census tract identifiers. These were geocoded using the Census Bureau Batch Geocoder, pre-filtered to San Diego ZIP prefixes, and cached locally. Match rates exceeded 93% across all installers, yielding 12,615 permits across 518 tracts.
Four independent sources were joined at the 2020 census tract level:
| Dataset | Source | Coverage |
|---|---|---|
| Installer permits | Sunrun, Sullivan, Titan, Freedom Forever, SolarCity | 518 SD tracts |
| Extreme heat days | Cal-Adapt (2006–2100) | 627 tracts (2010 boundaries) |
| Social Vulnerability Index | CDC/ATSDR SVI 2022 | 736 tracts |
| CalEnviroScreen 4.0 | OEHHA | 736 tracts |
Scoring. All inputs are converted to within-county percentile ranks before
weighting, ensuring no single metric dominates due to scale differences. Five
weight schemes spanning the defensible range are computed simultaneously. The
priority_agreement column (0–5) counts how many schemes classify each tract
as Priority — 69 tracts score 5/5.
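One way the priority_agreement count could be computed is sketched below. Only the Balanced weights come from this writeup; the other four weight sets, the column names (`pct_heat`, `pct_svi`, `pct_ces`, `score_deployment`), and the toy values are hypothetical stand-ins.

```python
import pandas as pd

# Only "balanced" comes from the writeup; the other four weight sets are
# hypothetical stand-ins for the schemes the project actually tested.
SCHEMES = {
    "balanced": {"heat": 0.40, "svi": 0.35, "ces": 0.25},
    "heat_heavy": {"heat": 0.60, "svi": 0.20, "ces": 0.20},
    "svi_heavy": {"heat": 0.20, "svi": 0.60, "ces": 0.20},
    "ces_heavy": {"heat": 0.20, "svi": 0.20, "ces": 0.60},
    "equal": {"heat": 1 / 3, "svi": 1 / 3, "ces": 1 / 3},
}


def priority_agreement(pcts: pd.DataFrame, threshold: float = 0.40) -> pd.Series:
    """Count, per tract, how many schemes give score_gap >= threshold.

    `pcts` holds already percentile-ranked need inputs plus a precomputed
    score_deployment column (hypothetical column names).
    """
    votes = pd.Series(0, index=pcts.index)
    for w in SCHEMES.values():
        need = (
            w["heat"] * pcts["pct_heat"]
            + w["svi"] * pcts["pct_svi"]
            + w["ces"] * pcts["pct_ces"]
        )
        is_priority = (need - pcts["score_deployment"]) >= threshold
        votes = votes + is_priority.astype(int)
    return votes.rename("priority_agreement")


# Toy percentiles for three made-up tracts.
pcts = pd.DataFrame(
    {
        "pct_heat": [0.9, 0.9, 0.2],
        "pct_svi": [0.9, 0.3, 0.2],
        "pct_ces": [0.9, 0.3, 0.2],
        "score_deployment": [0.1, 0.2, 0.8],
    },
    index=["T1", "T2", "T3"],
)
agreement = priority_agreement(pcts)
```

Here T1 is Priority under every scheme, T2 only under the heat-heavy scheme, and T3 under none, which is the kind of separation the robustness cohort relies on.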
Hazard layers. USGS ASCE 7-22 design PGA values were fetched for all 736 tract centroids via the USGS design maps API, giving a seismic hazard score for every tract. 500 historical M3+ earthquake events were pulled from the USGS FDSN catalog and overlaid as a scatter layer on the seismic map.
Application. An 8-page Streamlit app with Plotly choropleth maps on
carto-darkmatter tiles, a dark professional theme, and a global weight
scheme selector that updates every map and metric simultaneously.
Challenges we ran into
Geocoding at scale. The raw installer files had no tract identifiers — only addresses. Sending 12,000+ addresses through the Census Batch Geocoder in chunks, handling non-exact matches, and building a local cache to avoid redundant API calls required careful pipeline design.
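The chunk-and-cache pattern described above can be sketched with the standard library alone. This is an assumption-laden outline rather than the project's pipeline: `geocode_with_cache`, the JSON cache file, and the default chunk size are hypothetical, and the real Census Batch Geocoder call is replaced by an injected `geocode_batch` callable so the sketch runs offline. Non-exact matches are simplified to "no match" here.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Optional

# A batch geocoder takes a list of addresses and returns {address: tract GEOID}
# for the addresses it matched. It is injected so no network call is made here.
BatchGeocoder = Callable[[List[str]], Dict[str, str]]


def geocode_with_cache(
    addresses: List[str],
    geocode_batch: BatchGeocoder,
    cache_path: Path,
    chunk_size: int = 1000,  # hypothetical default, not the service's limit
) -> Dict[str, Optional[str]]:
    """Geocode addresses in chunks, skipping anything already in the cache."""
    cache: Dict[str, Optional[str]] = {}
    if cache_path.exists():
        cache = json.loads(cache_path.read_text())

    # De-duplicate while preserving order, then drop cached addresses.
    pending = [a for a in dict.fromkeys(addresses) if a not in cache]

    for start in range(0, len(pending), chunk_size):
        chunk = pending[start:start + chunk_size]
        matches = geocode_batch(chunk)
        for addr in chunk:
            # Unmatched addresses are stored as None so they are not retried.
            cache[addr] = matches.get(addr)
        # Persist after every chunk so a failed run can resume.
        cache_path.write_text(json.dumps(cache))

    return {a: cache[a] for a in addresses}
```

Writing the cache after each chunk trades a little I/O for the ability to resume after an API failure without re-sending addresses that already resolved.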
Boundary mismatch. Cal-Adapt heat data uses 2010 tract boundaries; everything else uses 2020. The Census redrew 201 tracts between decennial censuses. A production implementation would use the Census 2010-to-2020 relationship file for population-weighted interpolation; for this build, county-mean imputation with an explicit flag was used as a defensible approximation.
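County-mean imputation with an explicit flag might look like the following. The function name, the column names (`geoid`, `heat_days`, `heat_imputed`), and the choice of an unweighted mean over the available 2010 tract values are assumptions made for illustration.

```python
import pandas as pd


def impute_heat(tracts_2020: pd.DataFrame, heat_2010: pd.Series) -> pd.DataFrame:
    """Attach heat values by GEOID and fill unmatched tracts with the county mean.

    `heat_2010` is indexed by tract GEOID. Tracts whose 2020 GEOID has no
    2010 counterpart get the mean of the available values and are flagged.
    """
    out = tracts_2020.copy()
    out["heat_days"] = out["geoid"].map(heat_2010)
    # Record which rows are imputed before the gaps are filled.
    out["heat_imputed"] = out["heat_days"].isna()
    out["heat_days"] = out["heat_days"].fillna(heat_2010.mean())
    return out


# Toy example: tract "C" exists only under the 2020 boundaries.
tracts_2020 = pd.DataFrame({"geoid": ["A", "B", "C"]})
heat_2010 = pd.Series({"A": 10.0, "B": 30.0})
filled = impute_heat(tracts_2020, heat_2010)
```

Keeping the `heat_imputed` flag alongside the filled value lets downstream pages show or exclude imputed tracts instead of treating them as observed.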
Double-counting vulnerability. CalEnviroScreen's Population Characteristics component substantially overlaps SVI Theme 1. Using both at full weight inflates the influence of demographic vulnerability. The Balanced scheme down-weights CES to 0.25 specifically to mitigate this.
Incomplete market coverage. The five installer datasets capture only a portion of the SD solar market. Baker Electric, SunPower, and numerous local operators are not included. Permit counts are therefore a lower bound on true deployment.
Accomplishments that we're proud of
The sensitivity analysis is the accomplishment we are most proud of. Rather than committing to a single weighting assumption, five schemes spanning the defensible range were tested simultaneously. The robustness cohort is the finding, not the map.
What we learned
Building this project made clear how much analytical credibility depends on being explicit about uncertainty. Showing that a finding holds across all reasonable assumptions is more persuasive than any single optimized result.
We also learned how fragile geospatial joins can be in practice. Boundary mismatches, coordinate system inconsistencies, and API instability across public government data sources required defensive engineering at every step. The geocoding cache, imputation flags, and fallback handlers were all responses to real failures encountered during the build.
What's next for SD Solar Equity Dashboard
LLM-powered tract recommendations. The most impactful near-term addition is a natural language recommendation engine built on top of the tract-level data. A user selects a census tract and specifies a target audience — a solar installer, a grant program administrator, a city council office, or a community organization — and a language model generates a tailored one-page brief explaining why that tract merits attention, what the dominant risk factors are, what program type fits the housing composition, and what the strongest grant framing is given the local SVI, heat, and seismic profile. The same underlying data produces fundamentally different narratives depending on who needs to act on it, and a language model is well-suited to that translation.
PSPS integration. SDG&E has conducted dozens of Public Safety Power Shutoff events since 2019. Joining CPUC PSPS de-energization polygons to census tracts would identify which Priority tracts have already lost grid power during fire weather events — the clearest possible argument for battery-backed solar grants.
Full market coverage. Adding the CEC NEM interconnection dataset would replace the five-installer partial picture with a complete view of rooftop solar penetration across all SD County tracts, sharpening the gap score and eliminating the lower-bound limitation of the current deployment measure.
Population-weighted heat interpolation. The 2010-to-2020 tract relationship file would allow proper population-weighted interpolation of Cal-Adapt heat values for the 201 imputed tracts, replacing county-mean substitution with tract-specific projections.
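As a rough sketch of what that interpolation could look like: each 2020 tract takes the average of the 2010 heat values it overlaps, weighted by the population in each overlap. The column names (`geoid_2010`, `geoid_2020`, `pop_overlap`) are hypothetical, and `pop_overlap` in particular is assumed to be available or derived separately; it is not a claim about what the relationship file contains.

```python
import pandas as pd


def interpolate_to_2020(rel: pd.DataFrame, heat_2010: pd.Series) -> pd.Series:
    """Population-weighted mean of 2010 heat values for each 2020 tract.

    `rel` has one row per 2010/2020 tract intersection with hypothetical
    columns geoid_2010, geoid_2020, and pop_overlap (people in the overlap).
    """
    df = rel.copy()
    df["heat"] = df["geoid_2010"].map(heat_2010)
    # Ignore intersections whose 2010 tract has no heat value.
    df = df.dropna(subset=["heat"])
    df["weighted"] = df["heat"] * df["pop_overlap"]
    sums = df.groupby("geoid_2020")[["weighted", "pop_overlap"]].sum()
    return (sums["weighted"] / sums["pop_overlap"]).rename("heat_days")


# Toy example: 2020 tract X draws from two 2010 tracts, Y from one.
rel = pd.DataFrame(
    {
        "geoid_2010": ["P", "Q", "Q"],
        "geoid_2020": ["X", "X", "Y"],
        "pop_overlap": [300, 100, 200],
    }
)
heat_2010 = pd.Series({"P": 10.0, "Q": 30.0})
heat_2020 = interpolate_to_2020(rel, heat_2010)
```

For tract X the result is (300 × 10 + 100 × 30) / 400 = 15, pulled toward the 2010 tract that contributes more residents. A 2020 tract whose overlaps sum to zero population would need separate handling.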