PyroSight

Inspiration

Wildfires are growing more destructive every year—the 2023 and 2024 fire seasons broke records across the western US and Canada, and the 2025 Los Angeles fires brought the crisis to one of the most densely populated regions in the country. First responders and evacuation planners often rely on outdated heuristics or coarse models that can't keep pace with rapidly shifting fire behavior. We wanted to build something that could give firefighters and emergency managers a next-day prediction of where a fire will spread, powered by the same satellite and weather data that's already being collected but underutilized.

The Next Day Wildfire Spread dataset from Huot et al. (IEEE TGRS 2022) gave us a strong foundation: 12 channels of real geospatial data covering topography, vegetation, weather, and active fire detections. But the original benchmark models treat prediction as a binary classification problem, outputting a single "will burn / won't burn" label per pixel with no indication of confidence. In high-stakes wildfire scenarios, knowing how certain the model is matters just as much as the prediction itself. That's what led us to evidential deep learning — a framework that learns to estimate its own uncertainty in a single forward pass, without expensive ensemble methods.

Challenges we ran into

Getting evidential deep learning to converge. Standard cross-entropy training is well-understood, but fitting a Dirichlet distribution over class probabilities introduces new hyperparameters (the KL-divergence annealing coefficient $\lambda$, evidence scaling) that are sensitive to tuning. Early runs either collapsed to uniform uncertainty everywhere or produced overconfident predictions that defeated the purpose. We had to carefully schedule $\lambda$ over epochs and experiment with loss weighting to get calibrated uncertainty maps.

Severe class imbalance. In any given $64 \times 64$ patch, the vast majority of pixels are "no fire." Naive training quickly learns to predict "no fire" everywhere and achieves high accuracy while being completely useless. We addressed this with a combination of focal loss weighting, strategic sampling of patches containing fire pixels, and evaluation metrics (precision, recall, $F_1$) that don't let the model hide behind accuracy.

Bridging the gap from static dataset to live inference. The training data comes from pre-processed TFRecords with neatly aligned channels. In production, we need to fetch live data from Google Earth Engine, NOAA, and VIIRS/MODIS feeds, reproject and resample everything onto matching grids, and tile it into $64 \times 64$ patches — all without introducing distribution shift. Matching the exact sources, resolutions, and normalization statistics from the training pipeline was a significant engineering challenge.

Frontend-to-model latency. We wanted an interactive map experience where users click a location and see predictions in seconds, not minutes. This meant optimizing the inference pipeline, pre-computing static rasters (elevation, land cover) that don't change daily, and caching aggressively.

What we learned

Uncertainty quantification changes the conversation. When we showed early demos with just binary predictions, the feedback was "so is my house safe or not?" Adding calibrated uncertainty maps shifted the conversation to "the model is confident fire will spread here, but uncertain about this corridor", which is far more actionable for evacuation planning and resource staging.

Evidential deep learning is powerful but underexplored. Compared to MC dropout or deep ensembles, evidential methods produce uncertainty estimates in a single forward pass with negligible overhead. The tradeoff is a trickier training process. Once the model converges, though, inference is fast and the uncertainty decomposition into aleatoric (data noise) vs. epistemic (model ignorance) components is directly interpretable. The epistemic part, for example, is simply the vacuity:

$$u = \frac{K}{\sum_{k=1}^{K} \alpha_k}$$

where $\alpha_k$ are the Dirichlet concentration parameters and $K$ is the number of classes.

Data engineering is the real bottleneck. The ML model was arguably the easier part. Wrangling 12 channels from heterogeneous geospatial sources into a consistent, reproducible pipeline, with correct projections, temporal alignment, and gap-filling, took more effort than model development.

Geospatial ML has unique deployment challenges. Unlike standard image classification, every input pixel has a real-world coordinate. Off-by-one reprojection errors or misaligned timestamps don't just reduce accuracy — they produce predictions for the wrong place or the wrong day, which in a wildfire context could be dangerous.

What's next for PyroSight

Temporal modeling — Explore recurrent and attention-based architectures that ingest multi-day sequences to capture fire momentum and directional spread
Fire agency integration — Build alerting, GIS export (GeoTIFF/KML), and API endpoints compatible with CAL FIRE and NIFC incident command workflows
Model improvements — Higher-resolution inputs, additional channels (power line infrastructure, road networks as firebreaks), and multi-task learning predicting both spread probability and fire intensity.

Problem Statement

Wildfires are intensifying globally. The 2023–2024 North American fire seasons shattered historical records, and the January 2025 Los Angeles fires demonstrated that even densely populated wildland-urban interfaces are vulnerable. Incident commanders and evacuation planners need next-day spatial predictions of fire spread, but existing operational tools rely on coarse heuristics or expensive physics simulations that can't run in real time.

The core ML challenge is equally difficult: standard deep learning classifiers output a point estimate $\hat{y} \in \{0,1\}$ with no measure of reliability. In a domain where a false negative means people don't evacuate, a model that says "60% chance of fire" without telling you whether it trusts that number is not safe to deploy. We need calibrated uncertainty: a per-pixel signal that separates what the model doesn't know (epistemic uncertainty) from what the data itself makes inherently unpredictable (aleatoric uncertainty).

PyroSight addresses both problems simultaneously: it predicts next-day wildfire spread at $1\,\text{km}$ resolution across the contiguous United States and provides a per-pixel uncertainty map grounded in evidential deep learning theory.

Solution Overview

Evidential Deep Learning

Standard wildfire models output a single prediction — "this pixel will burn" — with no indication of whether the model actually has enough information to make that call. In a domain where a false negative means people don't evacuate, this is unacceptable. PyroSight solves this by placing a Dirichlet prior over the categorical distribution for each pixel, turning every prediction into a distribution over distributions.

For $K = 2$ classes (fire / no-fire), the model predicts concentration parameters $\boldsymbol{\alpha} = [\alpha_1, \alpha_2]$ with $\alpha_k \geq 1$, parameterizing:

$$\text{Dir}(\mathbf{p} \mid \boldsymbol{\alpha}) = \frac{\Gamma(S)}{\prod_{k=1}^{K}\Gamma(\alpha_k)} \prod_{k=1}^{K} p_k^{\alpha_k - 1}, \quad S = \sum_{k=1}^{K} \alpha_k$$

From this single forward pass we extract:

Expected probability: $\hat{p}_k = \alpha_k / S$
Epistemic uncertainty (vacuity): $u = K / S$ — high when total evidence $S$ is low
Evidence per class: $e_k = \alpha_k - 1$
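All three quantities are cheap elementwise operations on $\boldsymbol{\alpha}$. As a minimal sketch, here they are for a single pixel in plain Python. The helper name `dirichlet_summary` is hypothetical, and a real implementation would apply the same arithmetic tensor-wide over the $64 \times 64$ output.

```python
def dirichlet_summary(alpha: list[float]) -> tuple[list[float], float, list[float]]:
    """Expected probabilities, vacuity, and evidence for one pixel's Dirichlet parameters."""
    K = len(alpha)
    S = sum(alpha)                        # total Dirichlet strength S
    probs = [a / S for a in alpha]        # expected probability: p_k = alpha_k / S
    vacuity = K / S                       # epistemic uncertainty: u = K / S
    evidence = [a - 1.0 for a in alpha]   # evidence per class: e_k = alpha_k - 1
    return probs, vacuity, evidence


# No evidence at all: a coin flip with maximal vacuity.
print(dirichlet_summary([1.0, 1.0]))    # ([0.5, 0.5], 1.0, [0.0, 0.0])
# Strong evidence for the second class: S = 20, so u = 2 / 20 = 0.1.
print(dirichlet_summary([2.0, 18.0]))   # ([0.1, 0.9], 0.1, [1.0, 17.0])
```

Vacuity depends only on the total evidence, not on which class receives it, so $\boldsymbol{\alpha} = [18, 2]$ would have the same $u = 0.1$ with the probabilities swapped.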

Why this matters: Monte Carlo dropout requires dozens of stochastic forward passes. Deep ensembles require training and storing 5–10 separate models. Both are too slow or too expensive for real-time wildfire prediction. Evidential deep learning gives us the same uncertainty decomposition, separating what the data makes inherently unpredictable (aleatoric) from what the model simply doesn't know (epistemic), in a single forward pass with negligible additional inference cost. This is the difference between a 5-second response and a 60-second response during an active fire.

Dual-Branch U-Net Architecture

Wildfire spread is driven by two fundamentally different types of information. Terrain and fuel (elevation, vegetation, population density) change at fine spatial scales: a ridge, a road, or a river can stop a fire. Weather (wind, temperature, humidity) varies smoothly over kilometers. Treating all 12 channels the same forces the model to compromise. Our dual-branch design lets each branch use the right inductive bias for its data.

Branch	Channels	Kernel	Rationale
Fuel (4 ch)	Elevation, NDVI, Population, PrevFireMask	$3 \times 3$	High-frequency spatial features — captures ridgelines, fuel breaks, urban edges
Weather (8 ch)	Wind dir/speed, Temp min/max, Humidity, Precip, PDSI, ERC	$5 \times 5$ depthwise-separable	Smooth, low-resolution GRIDMET fields — avoids overfitting to 4km grid artifacts
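Assuming tiles are stored channel-first in the order of the Input Channels table in the Documentation section (elevation at index 0 through the previous fire mask at index 11), routing channels to the two branches is a fixed index selection. This is a minimal NumPy sketch, and the constant and function names are hypothetical.

```python
import numpy as np

# Indices follow the Input Channels table (0 = elevation, ..., 11 = previous fire mask).
FUEL_CHANNELS = [0, 8, 9, 11]                 # elevation, NDVI, population, previous fire mask
WEATHER_CHANNELS = [1, 2, 3, 4, 5, 6, 7, 10]  # wind dir/speed, tmin/tmax, humidity, precip, PDSI, ERC


def split_branches(tile: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a (12, H, W) tile into fuel (4, H, W) and weather (8, H, W) branch inputs."""
    if tile.shape[0] != 12:
        raise ValueError(f"expected 12 channels, got {tile.shape[0]}")
    return tile[FUEL_CHANNELS], tile[WEATHER_CHANNELS]
```

Because the selection is by fixed index, a channel reordering in the live tile builder would silently feed the wrong variable to a branch, so the live pipeline has to reproduce the training channel order exactly.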

The two branches are fused at each encoder scale via Cross-Attention Feature Interaction Modules (CAFIM) — learned spatial gates that let the fuel branch attend to weather context and vice versa:

$$g_{fuel} = \sigma(W_1 * F_{weather}), \quad \hat{F}_{fuel} = g_{fuel} \odot F_{fuel}$$

Why cross-attention instead of simple concatenation? Fire behavior is conditional: dry vegetation is only dangerous when wind is present, and high winds only matter where there's fuel to burn. CAFIM lets the model learn these interactions explicitly: the weather branch modulates which fuel features matter and vice versa, rather than hoping a shared convolution discovers the relationship on its own.
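The CAFIM gate can be sketched in NumPy under two assumptions that the description leaves open. First, $W_1$ is taken to be a $1 \times 1$ convolution (a per-pixel channel mix), since the kernel size isn't stated. Second, the "vice versa" direction is modeled as a mirrored gate $g_{weather} = \sigma(W_2 * F_{fuel})$ with its own weights $W_2$. All names here are hypothetical.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def conv1x1(weight: np.ndarray, feats: np.ndarray) -> np.ndarray:
    """1x1 convolution: mix channels with weight (C_out, C_in) at every pixel of feats (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", weight, feats)


def cafim_gate(f_fuel, f_weather, w1, w2):
    """Cross-gating: each branch is rescaled by a sigmoid gate computed from the other branch."""
    g_fuel = sigmoid(conv1x1(w1, f_weather))   # g_fuel = sigma(W1 * F_weather), same shape as f_fuel
    g_weather = sigmoid(conv1x1(w2, f_fuel))   # assumed mirror gate for the weather branch
    return g_fuel * f_fuel, g_weather * f_weather


rng = np.random.default_rng(0)
f_fuel, f_weather = rng.normal(size=(64, 32, 32)), rng.normal(size=(64, 32, 32))
w1, w2 = rng.normal(size=(64, 64)) * 0.1, rng.normal(size=(64, 64)) * 0.1
fuel_out, weather_out = cafim_gate(f_fuel, f_weather, w1, w2)
print(fuel_out.shape, weather_out.shape)  # (64, 32, 32) (64, 32, 32)
```

Because every gate value lies in $(0, 1)$, a gate can only attenuate features. A fuel feature therefore passes through only where the weather context opens the gate, which is the conditional behavior described above.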

The encoder follows a $[64, 96, 128]$ width progression across 3 levels with $2 \times$ max-pooling ($64^2 \to 32^2 \to 16^2 \to 8^2$), residual skip connections, GroupNorm, GELU activations, Squeeze-and-Excitation channel attention, and stochastic depth. A 256-channel bottleneck with 4-head spatial self-attention sits at the coarsest scale. The decoder mirrors the encoder with transposed convolutions and skip connections.
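A quick way to check the resolution bookkeeping is to trace tensor shapes through the encoder. This plain-Python sketch assumes that each level's blocks preserve spatial size and that one $2 \times$ max-pool follows each of the three levels. The function name is hypothetical.

```python
def trace_encoder(size: int = 64, widths=(64, 96, 128), bottleneck: int = 256):
    """List (stage, channels, height, width) for each encoder level and the bottleneck."""
    shapes = []
    for level, ch in enumerate(widths):
        shapes.append((f"enc{level}", ch, size, size))  # residual/SE blocks keep H and W
        size //= 2                                       # 2x max-pooling halves H and W
    shapes.append(("bottleneck", bottleneck, size, size))
    return shapes


for stage in trace_encoder():
    print(stage)
# ('enc0', 64, 64, 64)
# ('enc1', 96, 32, 32)
# ('enc2', 128, 16, 16)
# ('bottleneck', 256, 8, 8)
```

At the bottleneck, self-attention runs over only $8 \times 8 = 64$ tokens. Splitting 256 channels across 4 heads gives 64 dimensions per head, which keeps the attention cost small. On the way back up, the decoder's skip connections reattach the $16^2$, $32^2$, and $64^2$ encoder features.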

The EDL head is a $1 \times 1$ convolution projecting to 2 channels, passed through $\text{softplus}(z) + 1$ to guarantee valid Dirichlet parameters. Deep supervision adds auxiliary EDL heads at the $16^2$ and $32^2$ scales. These act as gradient highways that mitigate vanishing gradients in the deeper encoder layers and speed up convergence.
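The head's output transform is a one-liner, but a naive `log(1 + exp(z))` overflows for large logits. Below is a minimal plain-Python sketch of a numerically stable version for one pixel. The function names are hypothetical. In PyTorch, `torch.nn.functional.softplus` already avoids the overflow by switching to the identity above its `threshold` argument.

```python
import math


def softplus(z: float) -> float:
    """Stable softplus: log(1 + exp(z)) rewritten as max(z, 0) + log1p(exp(-|z|))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def edl_head(logits: list[float]) -> list[float]:
    """Map one pixel's raw head outputs to Dirichlet concentrations alpha_k = softplus(z_k) + 1."""
    return [softplus(z) + 1.0 for z in logits]


print(edl_head([0.0, 0.0]))          # both entries equal 1 + ln 2
print(edl_head([-1000.0, 1000.0]))   # [1.0, 1001.0], with no overflow
```

For very negative logits, $\alpha_k$ rounds to exactly 1 in floating point. Downstream code should therefore rely only on $\alpha_k \geq 1$, not on a strict inequality.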

Physics-Informed Training

Pure data-driven models can learn physically impossible behavior, such as predicting fire spread upwind, through rivers, or into bare rock. We constrain the model with physics-informed loss terms derived from the Rothermel fire spread model, the same equations used by the US Forest Service. These terms make it costly for the model to lower its loss by violating the basic physics of fire.

The loss combines three EDL terms with three physics-informed regularizers:

$$L = L_{MSE} + \lambda_{KL}(t) \cdot L_{KL} + \lambda_{dice} \cdot L_{Dice} + \lambda_w \cdot L_{wind} + \lambda_s \cdot L_{slope} + \lambda_f \cdot L_{fuel}$$

where:

Bayes risk MSE penalizes both misclassification and model variance. This is what makes the model want to be right while also being honest about what it doesn't know: $$L_{MSE} = \sum_k \left[(y_k - \hat{p}_k)^2 + \frac{\alpha_k(S - \alpha_k)}{S^2(S+1)}\right]$$
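KL divergence regularizes the Dirichlet toward uniformity for incorrect classes. Without this term, the model accumulates spurious evidence and becomes dangerously overconfident. The KL term is computed on $\tilde{\alpha}_k = y_k + (1 - y_k)\,\alpha_k$ for one-hot labels $y_k$, which resets the true class to $\tilde{\alpha}_k = 1$, with $\tilde{S} = \sum_k \tilde{\alpha}_k$ and $\psi$ the digamma function: $$L_{KL} = \ln \frac{\Gamma(\tilde{S})}{\Gamma(K)} - \sum_k \ln \Gamma(\tilde{\alpha}_k) + \sum_k (\tilde{\alpha}_k - 1)\left[\psi(\tilde{\alpha}_k) - \psi(\tilde{S})\right]$$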
Wind consistency encourages downwind fire propagation, following Rothermel spread physics: fire spreads in the direction the wind blows, not against it. Here $p$ is the predicted fire probability and $\vec{w}$ the wind vector: $$L_{wind} = \max(0, -\nabla p \cdot \vec{w}) \cdot \mathbb{1}_{\text{near fire}}$$
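The two EDL terms can be sketched per pixel in plain Python. The sketch follows the standard evidential formulation, in which the KL term is evaluated on $\tilde{\alpha}_k = y_k + (1 - y_k)\,\alpha_k$ with one-hot labels $y_k$, so only evidence for the wrong class is pulled toward the uniform Dirichlet. Python's `math` module has no digamma, so the block includes a short recurrence-plus-asymptotic-series approximation. In PyTorch, you would use `torch.lgamma` and `torch.digamma` instead. The function names are hypothetical.

```python
import math


def digamma(x: float) -> float:
    """Digamma psi(x) for x > 0: shift x up via psi(x) = psi(x + 1) - 1/x, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def edl_mse(alpha: list[float], y: list[int]) -> float:
    """Bayes-risk MSE for one pixel: squared error of alpha_k / S plus the Dirichlet variance term."""
    S = sum(alpha)
    loss = 0.0
    for a_k, y_k in zip(alpha, y):
        p_k = a_k / S
        loss += (y_k - p_k) ** 2 + a_k * (S - a_k) / (S * S * (S + 1))
    return loss


def edl_kl(alpha: list[float], y: list[int]) -> float:
    """KL(Dir(alpha_tilde) || Dir(1, ..., 1)), where alpha_tilde resets the true class to 1."""
    K = len(alpha)
    alpha_t = [y_k + (1 - y_k) * a_k for a_k, y_k in zip(alpha, y)]
    S_t = sum(alpha_t)
    kl = math.lgamma(S_t) - math.lgamma(K) - sum(math.lgamma(a) for a in alpha_t)
    kl += sum((a - 1.0) * (digamma(a) - digamma(S_t)) for a in alpha_t)
    return kl


print(edl_mse([1.0, 1.0], [0, 1]))   # 0.5 squared error + 2 * 1/12 variance, about 0.667
print(edl_kl([1.0, 50.0], [0, 1]))   # ~0: evidence for the true class is not penalized
print(edl_kl([50.0, 1.0], [0, 1]))   # about 2.93 = ln 50 - 49/50: wrong-class evidence is penalized
```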

The KL coefficient follows a quadratic annealing schedule $\lambda_{KL}(t) = \lambda_{max} \cdot \min(1, t/T_{anneal})^2$. Why quadratic, not linear? Early in training, the model hasn't learned anything yet. If you regularize immediately, you crush the evidence signal before it forms. The quadratic ramp gives the model time to accumulate meaningful Dirichlet mass in the first epochs, then gradually tightens regularization to prevent overconfidence. In our experiments, linear annealing caused evidence collapse in 30% of runs; quadratic annealing eliminated this failure mode entirely.
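The schedule is a one-liner. This sketch places it next to the linear ramp it replaced. The function name and the example values $\lambda_{max} = 1$ and $T_{anneal} = 10$ are hypothetical, since the actual settings aren't given here.

```python
def kl_weight(epoch: int, lambda_max: float, t_anneal: int, quadratic: bool = True) -> float:
    """KL coefficient lambda_max * min(1, t / T_anneal) ** 2, or ** 1 for the linear ramp."""
    ramp = min(1.0, epoch / t_anneal)
    return lambda_max * (ramp ** 2 if quadratic else ramp)


for epoch in (1, 5, 10, 20):
    print(epoch, kl_weight(epoch, 1.0, 10), kl_weight(epoch, 1.0, 10, quadratic=False))
```

At epoch 1, the quadratic ramp applies only 1% of $\lambda_{max}$, compared with 10% for the linear ramp. At epoch 5 the figures are 25% and 50%. Both schedules reach full strength at epoch 10.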

Design Decisions

AdamW optimizer ($\beta = (0.9, 0.999)$, weight decay $5 \times 10^{-4}$) with cosine-annealed LR after linear warmup: decoupled weight decay is applied directly to the weights rather than through the gradient, so Adam's per-parameter scaling cannot weaken regularization on parameters with large gradients, and the decay still follows the schedule as the LR changes by 100x over training
Focal-weighted class balancing ($w_{fire} = 5.0$) to counter extreme class imbalance (~3% fire pixels) — without this, the model learns to predict "no fire" everywhere and achieves 97% accuracy while being completely useless
Wind-aware augmentation: spatial flips and rotations transform the wind azimuth channel to preserve physical consistency (e.g., horizontal flip maps $\theta \to (360 - \theta) \bmod 360$) — naive augmentation would teach the model that fire spreads in random directions relative to wind, destroying the physics signal
Pure-Python TFRecord parser: eliminates the TensorFlow dependency entirely (~100 LOC); this cuts the Docker image size by ~2 GB and removes the most common source of installation failures
Learned weather upsampler (v2): a small 32-channel network smooths GRIDMET 4km grid artifacts instead of relying on bilinear interpolation alone — the raw GRIDMET data has visible checkerboard patterns at 4km resolution that the model would otherwise memorize as spurious features
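The wind-aware augmentation rules can be sketched in NumPy. The sketch makes several assumptions: tiles are `(12, H, W)` and north-up (row 0 is north, and columns increase eastward), wind direction sits in channel 1 as degrees clockwise from north (per the Input Channels table), and the target is the `(H, W)` next-day fire mask. The function names are hypothetical. The horizontal-flip rule matches the one stated above. The vertical-flip and rotation rules follow from the same geometry.

```python
import numpy as np

WIND_DIR = 1  # wind direction channel (degrees clockwise from north), per the Input Channels table


def hflip(tile: np.ndarray, target: np.ndarray):
    """East-west mirror: reverse columns; azimuth theta -> (360 - theta) mod 360."""
    tile, target = np.flip(tile, axis=-1).copy(), np.flip(target, axis=-1).copy()  # copy: don't mutate inputs
    tile[WIND_DIR] = (360.0 - tile[WIND_DIR]) % 360.0
    return tile, target


def vflip(tile: np.ndarray, target: np.ndarray):
    """North-south mirror: reverse rows; azimuth theta -> (180 - theta) mod 360."""
    tile, target = np.flip(tile, axis=-2).copy(), np.flip(target, axis=-2).copy()
    tile[WIND_DIR] = (180.0 - tile[WIND_DIR]) % 360.0
    return tile, target


def rot90_ccw(tile: np.ndarray, target: np.ndarray):
    """Rotate the map 90 degrees counterclockwise: east becomes north, so theta -> (theta - 90) mod 360."""
    tile = np.rot90(tile, k=1, axes=(-2, -1)).copy()
    target = np.rot90(target, k=1, axes=(-2, -1)).copy()
    tile[WIND_DIR] = (tile[WIND_DIR] - 90.0) % 360.0
    return tile, target


tile = np.zeros((12, 4, 4))
tile[WIND_DIR] = 90.0        # wind azimuth of 90 degrees (east) everywhere
target = np.zeros((4, 4))
target[0, 3] = 1.0           # fire pixel in the north-east corner
for aug in (hflip, vflip, rot90_ccw):
    t, y = aug(tile, target)
    print(aug.__name__, t[WIND_DIR, 0, 0], np.argwhere(y == 1)[0])
```

Under `hflip`, an east azimuth becomes 270° (west). Under `vflip` it stays 90°, and under `rot90_ccw` it becomes 0° (north). Meanwhile, the north-east fire pixel moves to the north-west, south-east, and north-west corners respectively. The same angle updates hold whether the channel stores the direction the wind blows from or toward, because the two conventions differ by a constant 180°.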

Documentation

System Architecture

PyroSight is a three-tier system:

Data Layer — Pulls 12 geospatial channels from public sources (GRIDMET weather, VIIRS vegetation, SRTM elevation, NASA FIRMS active fires) and assembles them into a normalized $64 \times 64$ tile for any location in the contiguous US.
Model Layer — A dual-branch U-Net with an evidential deep learning head runs inference on the tile in a single forward pass, producing per-pixel fire spread probabilities and calibrated uncertainty maps.
Serving Layer — A FastAPI backend exposes prediction endpoints; a Next.js frontend renders results on an interactive MapLibre GL map with risk overlays, geocoding, etc.

Data flows left to right: satellites / weather stations → tile builder → EDL model → API → interactive map.

Input Channels

#	Channel	Source	Resolution	Units
0	Elevation	SRTM	30m → 1km	meters
1	Wind direction	GRIDMET	4km	degrees
2	Wind speed	GRIDMET	4km	m/s
3	Min temperature	GRIDMET	4km	Kelvin
4	Max temperature	GRIDMET	4km	Kelvin
5	Specific humidity	GRIDMET	4km	kg/kg
6	Precipitation	GRIDMET	4km	mm/day
7	Drought index (PDSI)	GRIDMET	4km	dimensionless
8	NDVI	VIIRS	500m, 8-day	scaled ×10⁴
9	Population density	GPWv4	1km	people/km²
10	Energy Release Component	GRIDMET/NFDRS	4km	dimensionless
11	Previous fire mask	MODIS/FIRMS	1km	binary

All channels are z-score normalized with fixed training statistics and clipped to $\pm 10\sigma$. The previous fire mask is binarized: $\hat{x} = \mathbb{1}[x > 0]$.
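Here is a minimal NumPy sketch of this normalization. It assumes the previous fire mask (channel 11) is only binarized and skips the z-score. The `mean` and `std` arrays stand in for the fixed training statistics, which aren't listed here, and the function name is hypothetical.

```python
import numpy as np

FIRE_MASK = 11  # previous fire mask channel index


def normalize_tile(tile, mean, std, clip: float = 10.0) -> np.ndarray:
    """Z-score each channel with fixed stats, clip to +/- clip sigma, then binarize the fire mask."""
    tile = np.asarray(tile, dtype=np.float32)                  # (12, H, W)
    mean = np.asarray(mean, dtype=np.float32)[:, None, None]   # (12, 1, 1) broadcasts over H, W
    std = np.asarray(std, dtype=np.float32)[:, None, None]
    out = np.clip((tile - mean) / std, -clip, clip)
    out[FIRE_MASK] = (tile[FIRE_MASK] > 0).astype(np.float32)  # x_hat = 1[x > 0] on the raw values
    return out
```

Clipping at $\pm 10\sigma$ is a cheap safeguard: a corrupt or extreme live value cannot push an input far outside the range the model saw during training.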

API Endpoints

Route	Method	Description
`/api/samples/{id}/assess`	GET	Run inference on a pre-loaded test sample
`/api/assess/live`	POST	Real-time prediction for any lat/lng + date
`/api/fires/active`	GET	NASA FIRMS active fire detections (last 24h)
`/api/model/info`	GET	Architecture metadata and parameter count
`/api/geocode?q=...`	GET	Place name → coordinates (Nominatim)
`/api/compare`	POST	Side-by-side assessment of multiple samples
`/api/batch/assess`	POST	Batch inference with aggregated statistics

Evaluation Metrics

Metric	Formula	Purpose
$F_1$	$2 \cdot \frac{P \cdot R}{P + R}$	Primary; threshold-searched over $[0.02, 0.5]$
AUC-PR	Area under Precision-Recall curve	Threshold-free ranking quality
ECE	$\sum_b \frac{n_b}{N} \lvert \text{acc}_b - \text{conf}_b \rvert$	Calibration (15 bins)
Brier	$\frac{1}{N}\sum(p_i - y_i)^2$	Probabilistic accuracy
AUSE	Area Under Sparsification Error	Uncertainty-error alignment
AURC	Area Under Risk-Coverage	Selective prediction quality
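Three of these metrics are simple enough to sketch in plain Python over flattened per-pixel fire probabilities `probs` and binary labels `labels`. The sketch makes two assumptions. First, the threshold grid uses a 0.01 step over $[0.02, 0.5]$. Second, ECE bins the predicted fire probability into 15 equal-width bins and compares each bin's mean probability with its observed fire frequency, which is a common binary variant. The function names are hypothetical.

```python
def f1_at(probs, labels, thr):
    """F1 for the fire class when pixels with probability >= thr are predicted as fire."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        if p >= thr:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0  # equals 2PR / (P + R)


def best_f1(probs, labels, lo=0.02, hi=0.5, step=0.01):
    """Grid-search the decision threshold over [lo, hi]; return (best F1, threshold)."""
    n = int(round((hi - lo) / step)) + 1
    return max((f1_at(probs, labels, lo + i * step), lo + i * step) for i in range(n))


def brier(probs, labels):
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def ece(probs, labels, n_bins=15):
    """Expected calibration error over equal-width bins of the predicted fire probability."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))  # p = 1.0 goes in the last bin
    err = 0.0
    for b in bins:
        if b:
            conf = sum(p for p, _ in b) / len(b)  # mean predicted probability in the bin
            acc = sum(y for _, y in b) / len(b)   # observed fire frequency in the bin
            err += len(b) / len(probs) * abs(acc - conf)
    return err


probs = [0.92, 0.71, 0.40, 0.08, 0.03, 0.01]
labels = [1, 1, 0, 1, 0, 0]
print(best_f1(probs, labels), brier(probs, labels), ece(probs, labels))
```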

Deployment

Backend: Docker container (Python 3.12, GDAL, PyTorch) deployed on Railway
Frontend: Next.js on Vercel with MapLibre GL for map rendering
Static rasters: Pre-cached DEM, population, and NDVI tiles (~2 GB) bundled in the Docker image
Live data: GRIDMET weather fetched daily with L1–L5 cache hierarchy

Practical Relevance

Emergency Management

Fire agencies (CAL FIRE, NIFC, county emergency managers) can use PyroSight's per-pixel uncertainty maps to prioritize evacuation zones, focusing resources where the model is both confident and predicting high spread probability, while flagging uncertain corridors for additional monitoring.

Resource Staging

The risk classification system (CRITICAL / HIGH / MODERATE / LOW) with confidence scores enables pre-positioning of firefighting resources. A CRITICAL prediction with 90% confidence justifies deploying crews; the same prediction at 40% confidence suggests deploying scouts instead.

Insurance and Urban Planning

PyroSight's spatially explicit risk maps can inform wildfire risk scoring for properties in the wildland-urban interface, complementing existing FEMA flood maps with fire-specific hazard layers.

Research Platform

The modular architecture — separate data pipeline, model, and serving layers — makes PyroSight a testbed for wildfire ML research. Researchers can swap in new architectures, add channels, or modify the physics-informed loss without rewriting the data or serving infrastructure.

Coverage and Latency

PyroSight can generate a prediction for any location in the contiguous United States, fast enough for interactive use during active incidents.

Team Information

Made solo!

Built With

css
docker
fastapi
google-earth-engine
gpwv4
gridmet
html
javascript
maplibre-gl
matplotlib
nasa-firms
nasa-viirs
netcdf4
next.js
nominatim
numpy
openstreetmap
python
pytorch
railway
rasterio
react
srtm
tailwind-css
typescript
uvicorn
vercel
xarray