Inspiration

Wildfires are growing more destructive every year—the 2023 and 2024 fire seasons broke records across the western US and Canada, and the 2025 Los Angeles fires brought the crisis to one of the most densely populated regions in the country. First responders and evacuation planners often rely on outdated heuristics or coarse models that can't keep pace with rapidly shifting fire behavior. We wanted to build something that could give firefighters and emergency managers a next-day prediction of where a fire will spread, powered by the same satellite and weather data that's already being collected but underutilized.

The Next Day Wildfire Spread dataset from Huot et al. (IEEE TGRS 2022) gave us a strong foundation: 12 channels of real geospatial data covering topography, vegetation, weather, and active fire detections. But the original benchmark models treat prediction as a binary classification problem, outputting a single "will burn / won't burn" label per pixel with no indication of confidence. In high-stakes wildfire scenarios, knowing how certain the model is matters just as much as the prediction itself. That's what led us to evidential deep learning — a framework that learns to estimate its own uncertainty in a single forward pass, without expensive ensemble methods.

Challenges we ran into

Getting evidential deep learning to converge. Standard cross-entropy training is well-understood, but fitting a Dirichlet distribution over class probabilities introduces new hyperparameters (the KL-divergence annealing coefficient \(\lambda\), evidence scaling) that are sensitive to tuning. Early runs either collapsed to uniform uncertainty everywhere or produced overconfident predictions that defeated the purpose. We had to carefully schedule \(\lambda\) over epochs and experiment with loss weighting to get calibrated uncertainty maps.

Severe class imbalance. In any given \(64 \times 64\) patch, the vast majority of pixels are "no fire." Naive training quickly learns to predict "no fire" everywhere and achieves high accuracy while being completely useless. We addressed this with a combination of focal loss weighting, strategic sampling of patches containing fire pixels, and evaluation metrics (precision, recall, \(F_1\)) that don't let the model hide behind accuracy.

Bridging the gap from static dataset to live inference. The training data comes from pre-processed TFRecords with neatly aligned channels. In production, we need to fetch live data from Google Earth Engine, NOAA, and VIIRS/MODIS feeds, reproject and resample everything onto matching grids, and tile it into \(64 \times 64\) patches — all without introducing distribution shift. Matching the exact sources, resolutions, and normalization statistics from the training pipeline was a significant engineering challenge.

Frontend-to-model latency. We wanted an interactive map experience where users click a location and see predictions in seconds, not minutes. This meant optimizing the inference pipeline, pre-computing static rasters (elevation, land cover) that don't change daily, and caching aggressively.

What we learned

Uncertainty quantification changes the conversation. When we showed early demos with just binary predictions, the feedback was "so is my house safe or not?" Adding calibrated uncertainty maps shifted the conversation to "the model is confident fire will spread here, but uncertain about this corridor", which is far more actionable for evacuation planning and resource staging.

Evidential deep learning is powerful but underexplored. Compared to MC dropout or deep ensembles, evidential methods produce uncertainty estimates in a single forward pass with negligible overhead. The tradeoff is a trickier training process, but once converged, inference is fast and the uncertainty decomposition into aleatoric (data noise) vs. epistemic (model ignorance) components is directly interpretable:

$$u = \frac{K}{\sum_{k=1}^{K} \alpha_k}$$

where \(\alpha_k\) are the Dirichlet concentration parameters and \(K\) is the number of classes.

Data engineering is the real bottleneck. The ML model was arguably the easier part. Wrangling 12 heterogeneous geospatial data sources into a consistent, reproducible pipeline — with correct projections, temporal alignment, and gap-filling — took more effort than model development.

Geospatial ML has unique deployment challenges. Unlike standard image classification, every input pixel has a real-world coordinate. Off-by-one reprojection errors or misaligned timestamps don't just reduce accuracy — they produce predictions for the wrong place or the wrong day, which in a wildfire context could be dangerous.

What's next for PyroSight

  • Temporal modeling — Explore recurrent and attention-based architectures that ingest multi-day sequences to capture fire momentum and directional spread
  • Fire agency integration — Build alerting, GIS export (GeoTIFF/KML), and API endpoints compatible with CAL FIRE and NIFC incident command workflows
  • Model improvements — Higher-resolution inputs, additional channels (power line infrastructure, road networks as firebreaks), and multi-task learning predicting both spread probability and fire intensity.

Problem Statement

Wildfires are intensifying globally. The 2023–2024 North American fire seasons shattered historical records, and the January 2025 Los Angeles fires demonstrated that even densely populated urban-wildland interfaces are vulnerable. Incident commanders and evacuation planners need next-day spatial predictions of fire spread — but existing operational tools rely on coarse heuristics or expensive physics simulations that can't run in real time.

The core ML challenge is equally difficult: standard deep learning classifiers output a point estimate \(\hat{y} \in \{0,1\}\) with no measure of reliability. In a domain where a false negative means people don't evacuate, a model that says "60% chance of fire" without telling you whether it trusts that number is not safe to deploy. We need calibrated uncertainty: a per-pixel signal that separates what the model doesn't know (epistemic uncertainty) from what the data itself makes inherently unpredictable (aleatoric uncertainty).

PyroSight addresses both problems simultaneously: it predicts next-day wildfire spread at \(1\,\text{km}\) per-pixel resolution across the contiguous United States and provides a per-pixel uncertainty map grounded in evidential deep learning theory.


Solution Overview

Evidential Deep Learning

Standard wildfire models output a single prediction — "this pixel will burn" — with no indication of whether the model actually has enough information to make that call. In a domain where a false negative means people don't evacuate, this is unacceptable. PyroSight solves this by placing a Dirichlet prior over the categorical distribution for each pixel, turning every prediction into a distribution over distributions.

For \(K = 2\) classes (fire / no-fire), the model predicts concentration parameters \(\boldsymbol{\alpha} = [\alpha_0, \alpha_1]\) with \(\alpha_k \geq 1\), parameterizing:

$$\text{Dir}(\mathbf{p} \mid \boldsymbol{\alpha}) = \frac{\Gamma(S)}{\prod_{k=1}^{K}\Gamma(\alpha_k)} \prod_{k=1}^{K} p_k^{\alpha_k - 1}, \quad S = \sum_{k=1}^{K} \alpha_k$$

From this single forward pass we extract:

  • Expected probability: \(\hat{p}_k = \alpha_k / S\)
  • Epistemic uncertainty (vacuity): \(u = K / S\) — high when total evidence \(S\) is low
  • Evidence per class: \(e_k = \alpha_k - 1\)
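As a minimal sketch of how these three quantities fall out of one pixel's concentration parameters, the snippet below computes them in plain Python. The function name `dirichlet_summary` and the sample values are illustrative assumptions, not taken from the PyroSight code.

```python
def dirichlet_summary(alpha):
    """Summarize one pixel's Dirichlet parameters (each alpha_k >= 1)."""
    num_classes = len(alpha)                        # K
    strength = sum(alpha)                           # S = sum_k alpha_k
    expected_prob = [a / strength for a in alpha]   # p_hat_k = alpha_k / S
    vacuity = num_classes / strength                # u = K / S
    evidence = [a - 1.0 for a in alpha]             # e_k = alpha_k - 1
    return expected_prob, vacuity, evidence


# Hypothetical pixel with strong evidence for "fire" (class 1).
confident = dirichlet_summary([1.5, 8.5])
# Hypothetical pixel with no evidence at all: alpha = [1, 1].
ignorant = dirichlet_summary([1.0, 1.0])
```

For \(\boldsymbol{\alpha} = [1.5, 8.5]\) this gives \(S = 10\), \(\hat{p} = [0.15, 0.85]\), and \(u = 0.2\). With \(\boldsymbol{\alpha} = [1, 1]\) the expected probability is an uninformative \([0.5, 0.5]\) and the vacuity reaches its maximum of \(u = 1\).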

Why this matters: Monte Carlo dropout requires dozens of stochastic forward passes. Deep ensembles require training and storing 5–10 separate models. Both are too slow or too expensive for real-time wildfire prediction. Evidential deep learning gives us the same uncertainty decomposition — separating what the data makes inherently unpredictable (aleatoric) from what the model simply doesn't know (epistemic) — in a single forward pass with zero additional inference cost. This is the difference between a 5-second response and a 60-second response during an active fire.

Dual-Branch U-Net Architecture

Wildfire spread is driven by two fundamentally different types of information. Terrain and fuel (elevation, vegetation, population density) change at fine spatial scales: a ridge, a road, or a river can stop a fire. Weather (wind, temperature, humidity) varies smoothly over kilometers. Treating all 12 channels the same forces the model to compromise. Our dual-branch design lets each branch use the right inductive bias for its data.

| Branch | Channels | Kernel | Rationale |
|---|---|---|---|
| Fuel (4 ch) | Elevation, NDVI, Population, PrevFireMask | \(3 \times 3\) | High-frequency spatial features: captures ridgelines, fuel breaks, urban edges |
| Weather (8 ch) | Wind dir/speed, Temp min/max, Humidity, Precip, PDSI, ERC | \(5 \times 5\) depthwise-separable | Smooth, low-resolution GRIDMET fields: avoids overfitting to 4km grid artifacts |

The two branches are fused at each encoder scale via Cross-Attention Feature Interaction Modules (CAFIM) — learned spatial gates that let the fuel branch attend to weather context and vice versa:

$$g_{fuel} = \sigma(W_1 * F_{weather}), \quad \hat{F}_{fuel} = g_{fuel} \odot F_{fuel}$$

Why cross-attention instead of simple concatenation? Fire behavior is conditional: dry vegetation is only dangerous when wind is present, and high winds only matter where there's fuel to burn. CAFIM lets the model learn these interactions explicitly: the weather branch modulates which fuel features matter and vice versa, rather than hoping a shared convolution discovers the relationship on its own.
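The gating equation above can be sketched in a few lines of plain Python. This is only an illustration of the weather-to-fuel direction, and it reads \(W_1 *\) as a \(1 \times 1\) convolution, which is an assumption since the kernel size is not stated here. The function name `weather_gate` and the list-of-grids layout are also hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def weather_gate(fuel, weather, w1):
    """Gate fuel features with weather context.

    fuel:    [C_fuel][H][W] nested lists
    weather: [C_weather][H][W] nested lists
    w1:      [C_fuel][C_weather] weights of an assumed 1x1 convolution
    """
    height, width = len(fuel[0]), len(fuel[0][0])
    gated = []
    for c_out, fuel_map in enumerate(fuel):
        out_map = []
        for y in range(height):
            row = []
            for x in range(width):
                # 1x1 conv: per-pixel weighted sum over weather channels.
                z = sum(w1[c_out][c_in] * weather[c_in][y][x]
                        for c_in in range(len(weather)))
                # g_fuel = sigmoid(z); F_hat_fuel = g_fuel * F_fuel
                row.append(sigmoid(z) * fuel_map[y][x])
            out_map.append(row)
        gated.append(out_map)
    return gated
```

A gate near 1 passes a fuel feature through unchanged and a gate near 0 suppresses it, which is how weather context can switch individual fuel features on or off per pixel.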

The encoder follows a \([64, 96, 128]\) width progression across 3 levels with \(2 \times\) max-pooling (\(64^2 \to 32^2 \to 16^2 \to 8^2\)), residual skip connections, GroupNorm, GELU activations, Squeeze-and-Excitation channel attention, and stochastic depth. A 256-channel bottleneck with 4-head spatial self-attention sits at the coarsest scale. The decoder mirrors the encoder with transposed convolutions and skip connections.

The EDL head is a \(1 \times 1\) convolution projecting to 2 channels, passed through \(\text{softplus}(z) + 1\) to guarantee valid Dirichlet parameters. Deep supervision adds auxiliary EDL heads at \(16^2\) and \(32^2\) scales, which act as gradient highways that prevent the vanishing gradient problem in the deeper encoder layers and improve convergence speed.
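To make the \(\text{softplus}(z) + 1\) mapping concrete, here is a small standalone sketch for a single pixel's two raw outputs. The helper names are illustrative assumptions, and the numerically stable softplus form is a standard identity rather than something taken from the PyroSight code.

```python
import math


def softplus(z):
    """Numerically stable softplus: log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def logits_to_alpha(logits):
    """Map raw per-class outputs to Dirichlet parameters: alpha_k = softplus(z_k) + 1."""
    return [softplus(z) + 1.0 for z in logits]


# Hypothetical raw outputs for (no-fire, fire) at one pixel.
alpha = logits_to_alpha([-2.0, 3.0])
```

Because softplus is non-negative, every \(\alpha_k\) stays at or above 1 regardless of the raw output, so the evidence \(e_k = \alpha_k - 1\) is never negative.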

Physics-Informed Training

Pure data-driven models can learn physically impossible behavior, such as predicting fire spread upwind, through rivers, or into bare rock. We constrain the model with physics-informed loss terms derived from the Rothermel fire spread model, the same equations used by the US Forest Service. Predictions that violate the basic physics of fire therefore incur a higher loss, which discourages the model from learning them.

The loss combines three EDL terms with three physics-informed regularizers:

$$L = L_{MSE} + \lambda_{KL}(t) \cdot L_{KL} + \lambda_{dice} \cdot L_{Dice} + \lambda_w \cdot L_{wind} + \lambda_s \cdot L_{slope} + \lambda_f \cdot L_{fuel}$$

where:

  • Bayes risk MSE penalizes both misclassification and model variance. This is what makes the model want to be right while also being honest about what it doesn't know: $$L_{MSE} = \sum_k \left[(y_k - \hat{p}_k)^2 + \frac{\alpha_k(S - \alpha_k)}{S^2(S+1)}\right]$$

  • KL divergence regularizes the Dirichlet toward uniformity for incorrect classes — without this, the model accumulates spurious evidence and becomes dangerously overconfident: $$L_{KL} = \ln \frac{\Gamma(S)}{\Gamma(K)} - \sum_k \ln \Gamma(\alpha_k) + \sum_k (\alpha_k - 1)[\psi(\alpha_k) - \psi(S)]$$

  • Wind consistency enforces downwind fire propagation using Rothermel spread physics, as fire spreads in the direction the wind blows, not against it: $$L_{wind} = \max(0, -\nabla p \cdot \vec{w}) \cdot \mathbb{1}_{\text{near fire}}$$
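The two EDL terms above can be checked numerically for a single pixel with the sketch below. It uses only the standard library, so the digamma function \(\psi\) is approximated with a recurrence plus an asymptotic series. Applying the KL term to \(\tilde{\alpha}_k = y_k + (1 - y_k)\alpha_k\), so that only evidence for incorrect classes is penalized, is a common convention in the EDL literature and is an assumption here. All function names are hypothetical.

```python
import math


def digamma(x):
    """Digamma for x > 0: shift x up to >= 6, then use an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x          # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def bayes_risk_mse(alpha, y):
    """Sum over classes of squared error plus Dirichlet variance."""
    strength = sum(alpha)
    loss = 0.0
    for a, target in zip(alpha, y):
        p_hat = a / strength
        variance = a * (strength - a) / (strength ** 2 * (strength + 1.0))
        loss += (target - p_hat) ** 2 + variance
    return loss


def kl_to_uniform(alpha):
    """KL( Dir(alpha) || Dir(1, ..., 1) ), matching the L_KL formula."""
    k = len(alpha)
    strength = sum(alpha)
    log_norm = (math.lgamma(strength) - math.lgamma(k)
                - sum(math.lgamma(a) for a in alpha))
    digamma_term = sum((a - 1.0) * (digamma(a) - digamma(strength))
                       for a in alpha)
    return log_norm + digamma_term


def remove_true_class_evidence(alpha, y):
    """Assumed convention: alpha_tilde = y + (1 - y) * alpha."""
    return [t + (1.0 - t) * a for a, t in zip(alpha, y)]


# Hypothetical fire pixel (one-hot y = [0, 1]) with misplaced evidence.
y = [0.0, 1.0]
alpha = [5.0, 5.0]
mse = bayes_risk_mse(alpha, y)
kl = kl_to_uniform(remove_true_class_evidence(alpha, y))
```

With all evidence on the correct class, \(\tilde{\boldsymbol{\alpha}} = [1, 1]\) and the KL term vanishes, so the regularizer only bites when the model has accumulated evidence for the wrong class.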

The KL coefficient follows a quadratic annealing schedule \(\lambda_{KL}(t) = \lambda_{max} \cdot \min(1, t/T_{anneal})^2\). Why quadratic, not linear? Early in training, the model hasn't learned anything yet. If you regularize immediately, you crush the evidence signal before it forms. The quadratic ramp gives the model time to accumulate meaningful Dirichlet mass in the first epochs, then gradually tightens regularization to prevent overconfidence. In our experiments, linear annealing caused evidence collapse in 30% of runs; quadratic annealing eliminated this failure mode entirely.
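The schedule is simple enough to write out directly. In this sketch `t` and `t_anneal` are counted in epochs and `lambda_max` defaults to 1.0; both choices are assumptions for illustration, and a linear ramp is included only for comparison.

```python
def kl_weight_quadratic(t, t_anneal, lambda_max=1.0):
    """lambda_KL(t) = lambda_max * min(1, t / T_anneal) ** 2"""
    ramp = min(1.0, t / t_anneal)
    return lambda_max * ramp ** 2


def kl_weight_linear(t, t_anneal, lambda_max=1.0):
    """Linear ramp to the same maximum, for comparison."""
    return lambda_max * min(1.0, t / t_anneal)


# Compare both schedules over a hypothetical 20-epoch anneal.
schedule = [(t, kl_weight_linear(t, 20), kl_weight_quadratic(t, 20))
            for t in (0, 5, 10, 20, 30)]
```

A quarter of the way through the ramp, the linear schedule already applies 25% of the maximum regularization while the quadratic one applies 6.25%. Both reach `lambda_max` at `t_anneal` and stay there.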

Design Decisions

  • AdamW optimizer (\(\beta = (0.9, 0.999)\), weight decay \(5 \times 10^{-4}\)) with cosine-annealed LR after linear warmup — decoupled weight decay prevents the regularization strength from coupling to the learning rate, which is critical when the LR changes by 100x over training
  • Focal-weighted class balancing (\(w_{fire} = 5.0\)) to counter extreme class imbalance (~3% fire pixels) — without this, the model learns to predict "no fire" everywhere and achieves 97% accuracy while being completely useless
  • Wind-aware augmentation: spatial flips and rotations transform the wind azimuth channel to preserve physical consistency (e.g., horizontal flip maps \(\theta \to (360 - \theta) \bmod 360\)) — naive augmentation would teach the model that fire spreads in random directions relative to wind, destroying the physics signal
  • Pure-Python TFRecord parser: eliminates the TensorFlow dependency entirely (~100 LOC); this cuts the Docker image size by ~2 GB and removes the most common source of installation failures
  • Learned weather upsampler (v2): a small 32-channel network smooths GRIDMET 4km grid artifacts instead of relying on bilinear interpolation alone — the raw GRIDMET data has visible checkerboard patterns at 4km resolution that the model would otherwise memorize as spurious features
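The wind-aware flip from the list above can be sketched as follows. The sketch assumes the wind channel holds raw azimuths in degrees measured clockwise from north (that is, the transform is applied before normalization) and that a tile is a list of 2D grids. The function names are hypothetical.

```python
def flip_grid_horizontal(grid):
    """Mirror a 2D grid left-to-right."""
    return [list(reversed(row)) for row in grid]


def flip_azimuth_horizontal(theta_deg):
    """Mirroring east and west maps theta to (360 - theta) mod 360."""
    return (360.0 - theta_deg) % 360.0


def augment_horizontal_flip(channels, wind_dir_index):
    """Flip every channel spatially and also remap the wind-direction values."""
    flipped = []
    for index, grid in enumerate(channels):
        mirrored = flip_grid_horizontal(grid)
        if index == wind_dir_index:
            mirrored = [[flip_azimuth_horizontal(t) for t in row]
                        for row in mirrored]
        flipped.append(mirrored)
    return flipped


# Toy 1x2 tile: channel 0 is elevation, channel 1 is wind direction.
tile = [[[100.0, 200.0]], [[90.0, 0.0]]]
augmented = augment_horizontal_flip(tile, wind_dir_index=1)
```

In the toy tile, the pixel with an easterly azimuth of 90° moves to the other side of the image and its azimuth becomes 270°, while the 0° (north) value is unchanged by an east-west mirror.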

Documentation

System Architecture

PyroSight is a three-tier system:

  1. Data Layer — Pulls 12 geospatial channels from public sources (GRIDMET weather, VIIRS vegetation, SRTM elevation, NASA FIRMS active fires) and assembles them into a normalized \(64 \times 64\) tile for any location in the contiguous US.

  2. Model Layer — A dual-branch U-Net with an evidential deep learning head runs inference on the tile in a single forward pass, producing per-pixel fire spread probabilities and calibrated uncertainty maps.

  3. Serving Layer — A FastAPI backend exposes prediction endpoints; a Next.js frontend renders results on an interactive MapLibre GL map with risk overlays, geocoding, etc.

Data flows left to right: satellites / weather stations → tile builder → EDL model → API → interactive map.

Input Channels

| # | Channel | Source | Resolution | Units |
|---|---|---|---|---|
| 0 | Elevation | SRTM | 30m → 1km | meters |
| 1 | Wind direction | GRIDMET | 4km | degrees |
| 2 | Wind speed | GRIDMET | 4km | m/s |
| 3 | Min temperature | GRIDMET | 4km | Kelvin |
| 4 | Max temperature | GRIDMET | 4km | Kelvin |
| 5 | Specific humidity | GRIDMET | 4km | kg/kg |
| 6 | Precipitation | GRIDMET | 4km | mm/day |
| 7 | Drought index (PDSI) | GRIDMET | 4km | dimensionless |
| 8 | NDVI | VIIRS | 500m, 8-day | scaled ×10⁴ |
| 9 | Population density | GPWv4 | 1km | people/km² |
| 10 | Energy Release Component | GRIDMET/NFDRS | 4km | dimensionless |
| 11 | Previous fire mask | MODIS/FIRMS | 1km | binary |

All channels are z-score normalized with fixed training statistics and clipped to \(\pm 10\sigma\). The previous fire mask is binarized: \(\hat{x} = \mathbb{1}[x > 0]\).
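A minimal sketch of this preprocessing step is shown below. The mean and standard deviation used in the example are made-up placeholders, not the actual training statistics, and the function names are hypothetical.

```python
def normalize_channel(grid, mean, std, clip_sigma=10.0):
    """Z-score a 2D grid with fixed statistics, then clip to +/- clip_sigma."""
    normalized = []
    for row in grid:
        out_row = []
        for value in row:
            z = (value - mean) / std
            out_row.append(max(-clip_sigma, min(clip_sigma, z)))
        normalized.append(out_row)
    return normalized


def binarize_mask(grid):
    """Previous fire mask: 1 where the raw value is positive, else 0."""
    return [[1.0 if value > 0 else 0.0 for value in row] for row in grid]


# Placeholder statistics for a temperature-like channel (Kelvin).
temps = normalize_channel([[300.0, 5000.0]], mean=290.0, std=10.0)
mask = binarize_mask([[-1, 0, 3]])
```

The corrupt 5000 K reading lands at the clip limit of 10 instead of dominating the tile, and any positive raw fire value collapses to 1.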

API Endpoints

| Route | Method | Description |
|---|---|---|
| /api/samples/{id}/assess | GET | Run inference on a pre-loaded test sample |
| /api/assess/live | POST | Real-time prediction for any lat/lng + date |
| /api/fires/active | GET | NASA FIRMS active fire detections (last 24h) |
| /api/model/info | GET | Architecture metadata and parameter count |
| /api/geocode?q=... | GET | Place name → coordinates (Nominatim) |
| /api/compare | POST | Side-by-side assessment of multiple samples |
| /api/batch/assess | POST | Batch inference with aggregated statistics |

Evaluation Metrics

| Metric | Formula | Purpose |
|---|---|---|
| \(F_1\) | \(2 \cdot \frac{P \cdot R}{P + R}\) | Primary; threshold-searched over \([0.02, 0.5]\) |
| AUC-PR | Area under Precision-Recall curve | Threshold-free ranking quality |
| ECE | \(\sum_b \frac{n_b}{N} \lvert \text{acc}_b - \text{conf}_b \rvert\) | Calibration (15 bins) |
| Brier | \(\frac{1}{N}\sum(p_i - y_i)^2\) | Probabilistic accuracy |
| AUSE | Area Under Sparsification Error | Uncertainty-error alignment |
| AURC | Area Under Risk-Coverage | Selective prediction quality |
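For readers less familiar with the two calibration-oriented metrics, here is a small sketch of ECE (15 equal-width bins) and the Brier score over a flat list of per-pixel fire probabilities. It assumes the predicted class is fire when \(p \geq 0.5\) and that confidence is \(\max(p, 1 - p)\); the evaluation code behind the table may bin differently.

```python
def brier_score(probs, labels):
    """Mean squared difference between fire probability and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=15):
    """ECE = sum_b (n_b / N) * |acc_b - conf_b| over equal-width confidence bins."""
    # Each bin tracks [count, sum of confidences, number correct].
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        predicted = 1 if p >= 0.5 else 0
        confidence = max(p, 1.0 - p)
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index][0] += 1
        bins[index][1] += confidence
        bins[index][2] += 1.0 if predicted == y else 0.0
    total = len(probs)
    ece = 0.0
    for count, conf_sum, correct_sum in bins:
        if count:
            ece += (count / total) * abs(correct_sum / count - conf_sum / count)
    return ece


# Toy check: ten pixels predicted at 0.9, nine of which actually burned.
calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0])
# Same confidence, but only half burned: overconfident by 0.4.
overconfident = expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5)
```

The first toy case is perfectly calibrated (90% confidence, 90% correct), so its ECE is essentially zero. The second has the same confidence but 50% accuracy, giving an ECE of 0.4.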

Deployment

  • Backend: Docker container (Python 3.12, GDAL, PyTorch) deployed on Railway
  • Frontend: Next.js on Vercel with MapLibre GL for map rendering
  • Static rasters: Pre-cached DEM, population, and NDVI tiles (~2 GB) bundled in the Docker image
  • Live data: GRIDMET weather fetched daily with L1–L5 cache hierarchy

Practical Relevance

Emergency Management

Fire agencies (CAL FIRE, NIFC, county emergency managers) can use PyroSight's per-pixel uncertainty maps to prioritize evacuation zones, focusing resources where the model is both confident and predicting high spread probability, while flagging uncertain corridors for additional monitoring.

Resource Staging

The risk classification system (CRITICAL / HIGH / MODERATE / LOW) with confidence scores enables pre-positioning of firefighting resources. A CRITICAL prediction with 90% confidence justifies deploying crews; the same prediction at 40% confidence suggests deploying scouts instead.

Insurance and Urban Planning

PyroSight's spatially explicit risk maps can inform wildfire risk scoring for properties in the wildland-urban interface, complementing existing FEMA flood maps with fire-specific hazard layers.

Research Platform

The modular architecture — separate data pipeline, model, and serving layers — makes PyroSight a testbed for wildfire ML research. Researchers can swap in new architectures, add channels, or modify the physics-informed loss without rewriting the data or serving infrastructure.

Coverage and Latency

PyroSight can generate a prediction for any location in the contiguous United States, fast enough for interactive use during active incidents.


Team Information

Made solo!
