Inspiration
Wildfires are growing more destructive every year—the 2023 and 2024 fire seasons broke records across the western US and Canada, and the 2025 Los Angeles fires brought the crisis to one of the most densely populated regions in the country. First responders and evacuation planners often rely on outdated heuristics or coarse models that can't keep pace with rapidly shifting fire behavior. We wanted to build something that could give firefighters and emergency managers a next-day prediction of where a fire will spread, powered by the same satellite and weather data that's already being collected but underutilized.
The Next Day Wildfire Spread dataset from Huot et al. (IEEE TGRS 2022) gave us a strong foundation: 12 channels of real geospatial data covering topography, vegetation, weather, and active fire detections. But the original benchmark models treat prediction as a binary classification problem, outputting a single "will burn / won't burn" label per pixel with no indication of confidence. In high-stakes wildfire scenarios, knowing how certain the model is matters just as much as the prediction itself. That's what led us to evidential deep learning — a framework that learns to estimate its own uncertainty in a single forward pass, without expensive ensemble methods.
Challenges we ran into
Getting evidential deep learning to converge. Standard cross-entropy training is well-understood, but fitting a Dirichlet distribution over class probabilities introduces new hyperparameters (the KL-divergence annealing coefficient \(\lambda\), evidence scaling) that are sensitive to tuning. Early runs either collapsed to uniform uncertainty everywhere or produced overconfident predictions that defeated the purpose. We had to carefully schedule \(\lambda\) over epochs and experiment with loss weighting to get calibrated uncertainty maps.
Severe class imbalance. In any given \(64 \times 64\) patch, the vast majority of pixels are "no fire." Naive training quickly learns to predict "no fire" everywhere and achieves high accuracy while being completely useless. We addressed this with a combination of focal loss weighting, strategic sampling of patches containing fire pixels, and evaluation metrics (precision, recall, \(F_1\)) that don't let the model hide behind accuracy.
Bridging the gap from static dataset to live inference. The training data comes from pre-processed TFRecords with neatly aligned channels. In production, we need to fetch live data from Google Earth Engine, NOAA, and VIIRS/MODIS feeds, reproject and resample everything onto matching grids, and tile it into \(64 \times 64\) patches — all without introducing distribution shift. Matching the exact sources, resolutions, and normalization statistics from the training pipeline was a significant engineering challenge.
Frontend-to-model latency. We wanted an interactive map experience where users click a location and see predictions in seconds, not minutes. This meant optimizing the inference pipeline, pre-computing static rasters (elevation, land cover) that don't change daily, and caching aggressively.
What we learned
Uncertainty quantification changes the conversation. When we showed early demos with just binary predictions, the feedback was "so is my house safe or not?" Adding calibrated uncertainty maps shifted the conversation to "the model is confident fire will spread here, but uncertain about this corridor", which is far more actionable for evacuation planning and resource staging.
Evidential deep learning is powerful but underexplored. Compared to MC dropout or deep ensembles, evidential methods produce uncertainty estimates in a single forward pass with negligible overhead. The tradeoff is a trickier training process, but once converged, inference is fast and the resulting uncertainty is directly interpretable. The epistemic part (vacuity, or model ignorance), for instance, has a closed form:
$$u = \frac{K}{\sum_{k=1}^{K} \alpha_k}$$
where \(\alpha_k\) are the Dirichlet concentration parameters and \(K\) is the number of classes.
Data engineering is the real bottleneck. The ML model was arguably the easier part. Wrangling 12 heterogeneous geospatial data sources into a consistent, reproducible pipeline — with correct projections, temporal alignment, and gap-filling — took more effort than model development.
Geospatial ML has unique deployment challenges. Unlike standard image classification, every input pixel has a real-world coordinate. Off-by-one reprojection errors or misaligned timestamps don't just reduce accuracy — they produce predictions for the wrong place or the wrong day, which in a wildfire context could be dangerous.
What's next for PyroSight
- Temporal modeling — Explore recurrent and attention-based architectures that ingest multi-day sequences to capture fire momentum and directional spread
- Fire agency integration — Build alerting, GIS export (GeoTIFF/KML), and API endpoints compatible with CAL FIRE and NIFC incident command workflows
- Model improvements — Higher-resolution inputs, additional channels (power line infrastructure, road networks as firebreaks), and multi-task learning predicting both spread probability and fire intensity.
Problem Statement
Wildfires are intensifying globally. The 2023–2024 North American fire seasons shattered historical records, and the January 2025 Los Angeles fires demonstrated that even densely populated urban-wildland interfaces are vulnerable. Incident commanders and evacuation planners need next-day spatial predictions of fire spread — but existing operational tools rely on coarse heuristics or expensive physics simulations that can't run in real time.
The core ML challenge is equally difficult: standard deep learning classifiers output a point estimate \(\hat{y} \in \{0,1\}\) with no measure of reliability. In a domain where a false negative means people don't evacuate, a model that says "60% chance of fire" without telling you whether it trusts that number is not safe to deploy. We need calibrated uncertainty: a per-pixel signal that separates what the model doesn't know (epistemic uncertainty) from what the data itself makes inherently unpredictable (aleatoric uncertainty).
PyroSight addresses both problems simultaneously: it predicts next-day wildfire spread at \(1\text{km}^2\) resolution across the contiguous United States and provides a per-pixel uncertainty map grounded in evidential deep learning theory.
Solution Overview
Evidential Deep Learning
Standard wildfire models output a single prediction — "this pixel will burn" — with no indication of whether the model actually has enough information to make that call. In a domain where a false negative means people don't evacuate, this is unacceptable. PyroSight solves this by placing a Dirichlet prior over the categorical distribution for each pixel, turning every prediction into a distribution over distributions.
For \(K = 2\) classes (fire / no-fire), the model predicts concentration parameters \(\boldsymbol{\alpha} = [\alpha_0, \alpha_1]\) with \(\alpha_k \geq 1\), parameterizing:
$$\text{Dir}(\mathbf{p} \mid \boldsymbol{\alpha}) = \frac{\Gamma(S)}{\prod_{k=1}^{K}\Gamma(\alpha_k)} \prod_{k=1}^{K} p_k^{\alpha_k - 1}, \quad S = \sum_{k=1}^{K} \alpha_k$$
From this single forward pass we extract:
- Expected probability: \(\hat{p}_k = \alpha_k / S\)
- Epistemic uncertainty (vacuity): \(u = K / S\) — high when total evidence \(S\) is low
- Evidence per class: \(e_k = \alpha_k - 1\)
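The three quantities in the list above can be computed from the raw head outputs in a few lines. The following is a minimal NumPy sketch, not the project's actual code: the function names, the `(K, H, W)` layout, and the choice of index 1 for the fire class are assumptions made for illustration.

```python
import numpy as np

FIRE = 1  # assumed class index for "fire"; the text does not fix the order

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)

def edl_outputs(logits):
    """Map raw head outputs of shape (K, H, W) to Dirichlet summaries."""
    evidence = softplus(logits)            # e_k >= 0
    alpha = evidence + 1.0                 # alpha_k >= 1
    strength = alpha.sum(axis=0)           # S, shape (H, W)
    prob = alpha / strength                # expected probability alpha_k / S
    vacuity = alpha.shape[0] / strength    # u = K / S, in (0, 1]
    return prob, vacuity, evidence

# Two pixels: one with no evidence at all, one with strong fire evidence.
logits = np.array([[[-20.0, -20.0]],      # no-fire channel
                   [[-20.0,   5.0]]])     # fire channel
prob, vacuity, evidence = edl_outputs(logits)
print(prob[FIRE])   # fire probability per pixel
print(vacuity)      # close to 1 where evidence is absent
```

In this toy input the first pixel carries almost no evidence, so its fire probability sits near 0.5 and its vacuity near 1. The second pixel has accumulated fire evidence, which raises the probability and lowers the vacuity at the same time.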
Why this matters: Monte Carlo dropout requires dozens of stochastic forward passes. Deep ensembles require training and storing 5–10 separate models. Both are too slow or too expensive for real-time wildfire prediction. Evidential deep learning gives us the same uncertainty decomposition — separating what the data makes inherently unpredictable (aleatoric) from what the model simply doesn't know (epistemic) — in a single forward pass with zero additional inference cost. This is the difference between a 5-second response and a 60-second response during an active fire.
Dual-Branch U-Net Architecture
Wildfire spread is driven by two fundamentally different types of information. Terrain and fuel (elevation, vegetation, population density) change at fine spatial scales: a ridge, a road, or a river can stop a fire. Weather (wind, temperature, humidity) varies smoothly over kilometers. Treating all 12 channels the same forces the model to compromise. Our dual-branch design lets each branch use the right inductive bias for its data.
| Branch | Channels | Kernel | Rationale |
|---|---|---|---|
| Fuel (4 ch) | Elevation, NDVI, Population, PrevFireMask | \(3 \times 3\) | High-frequency spatial features — captures ridgelines, fuel breaks, urban edges |
| Weather (8 ch) | Wind dir/speed, Temp min/max, Humidity, Precip, PDSI, ERC | \(5 \times 5\) depthwise-separable | Smooth, low-resolution GRIDMET fields — avoids overfitting to 4km grid artifacts |
The two branches are fused at each encoder scale via Cross-Attention Feature Interaction Modules (CAFIM): learned spatial gates that let the fuel branch attend to weather context and vice versa. For the fuel branch, the gate is:
$$g_{fuel} = \sigma(W_1 * F_{weather}), \quad \hat{F}_{fuel} = g_{fuel} \odot F_{fuel}$$
Why cross-attention instead of simple concatenation? Fire behavior is conditional: dry vegetation is only dangerous when wind is present, and high winds only matter where there's fuel to burn. CAFIM lets the model learn these interactions explicitly: the weather branch modulates which fuel features matter and vice versa, rather than hoping a shared convolution discovers the relationship on its own.
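To make the gating equation concrete, here is a small NumPy sketch of the fuel-side gate only. It assumes \(W_1\) is a \(1 \times 1\) convolution (the kernel size is not stated in the text), and the function name, channel counts, and random inputs are purely illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate_fuel_by_weather(f_fuel, f_weather, w1):
    """
    f_fuel:    (C_f, H, W) fuel-branch features
    f_weather: (C_w, H, W) weather-branch features
    w1:        (C_f, C_w) weights of an assumed 1x1 convolution
    """
    # A 1x1 convolution is a per-pixel linear map over channels.
    gate = sigmoid(np.einsum("oc,chw->ohw", w1, f_weather))
    # Element-wise product: weather decides how much of each fuel feature passes.
    return gate * f_fuel, gate

rng = np.random.default_rng(0)
f_fuel = rng.normal(size=(4, 8, 8))
f_weather = rng.normal(size=(6, 8, 8))
w1 = rng.normal(size=(4, 6)) * 0.1
gated, gate = gate_fuel_by_weather(f_fuel, f_weather, w1)
print(gated.shape, float(gate.min()), float(gate.max()))
```

Because the gate lies strictly between 0 and 1, it can only attenuate fuel features, never amplify them. The weather-side gate would follow the same pattern with the roles of the two branches swapped.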
The encoder follows a \([64, 96, 128]\) width progression across 3 levels with \(2 \times\) max-pooling (\(64^2 \to 32^2 \to 16^2 \to 8^2\)), residual skip connections, GroupNorm, GELU activations, Squeeze-and-Excitation channel attention, and stochastic depth. A 256-channel bottleneck with 4-head spatial self-attention sits at the coarsest scale. The decoder mirrors the encoder with transposed convolutions and skip connections.
The EDL head is a \(1 \times 1\) convolution projecting to 2 channels, passed through \(\text{softplus}(z) + 1\) to guarantee valid Dirichlet parameters. Deep supervision adds auxiliary EDL heads at \(16^2\) and \(32^2\) scales, which act as gradient highways that prevent the vanishing gradient problem in the deeper encoder layers and improve convergence speed.
Physics-Informed Training
Pure data-driven models can learn physically impossible behavior: predicting fire spread upwind, through rivers, or into bare rock. We constrain the model with physics-informed loss terms derived from the Rothermel fire spread model, the same equations used by the US Forest Service. Because these terms are soft penalties, they do not make violations impossible, but a prediction that contradicts the basic physics of fire pays for it in the loss.
The loss combines three EDL terms with three physics-informed regularizers:
$$L = L_{MSE} + \lambda_{KL}(t) \cdot L_{KL} + \lambda_{dice} \cdot L_{Dice} + \lambda_w \cdot L_{wind} + \lambda_s \cdot L_{slope} + \lambda_f \cdot L_{fuel}$$
where:
Bayes risk MSE penalizes both misclassification and model variance. This is what makes the model want to be right while also being honest about what it doesn't know: $$L_{MSE} = \sum_k \left[(y_k - \hat{p}_k)^2 + \frac{\alpha_k(S - \alpha_k)}{S^2(S+1)}\right]$$
KL divergence regularizes the Dirichlet toward uniformity for incorrect classes — without this, the model accumulates spurious evidence and becomes dangerously overconfident: $$L_{KL} = \ln \frac{\Gamma(S)}{\Gamma(K)} - \sum_k \ln \Gamma(\alpha_k) + \sum_k (\alpha_k - 1)[\psi(\alpha_k) - \psi(S)]$$
Wind consistency enforces downwind fire propagation using Rothermel spread physics, as fire spreads in the direction the wind blows, not against it: $$L_{wind} = \max(0, -\nabla p \cdot \vec{w}) \cdot 1_{near\ fire}$$
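As an illustration of the first term, the Bayes risk MSE can be written directly from its formula. This is a hedged NumPy sketch: the function name and the `(K, N)` layout are assumptions, and the KL and physics terms are left out because they need log-gamma, digamma, and wind-vector conventions not spelled out here.

```python
import numpy as np

def bayes_risk_mse(alpha, y_onehot):
    """alpha, y_onehot: arrays of shape (K, N). Returns a loss per pixel, shape (N,)."""
    strength = alpha.sum(axis=0, keepdims=True)          # S
    p_hat = alpha / strength                             # expected probability
    err = (y_onehot - p_hat) ** 2                        # squared error term
    var = alpha * (strength - alpha) / (strength ** 2 * (strength + 1.0))
    return (err + var).sum(axis=0)

# Three pixels whose true class is index 1:
# confident and correct, no evidence at all, confident and wrong.
alpha = np.array([[1.0, 1.0, 50.0],
                  [50.0, 1.0, 1.0]])
y_onehot = np.array([[0.0, 0.0, 0.0],
                     [1.0, 1.0, 1.0]])
loss = bayes_risk_mse(alpha, y_onehot)
print(loss)
```

The ordering of the three losses shows the intended behavior: being confidently right is cheapest, admitting ignorance costs more, and being confidently wrong costs the most. In the common formulation of this loss (Sensoy et al., 2018), the KL term is evaluated on \(\tilde{\alpha}_k = y_k + (1 - y_k)\,\alpha_k\), so only evidence for incorrect classes is penalized; whether PyroSight does the same is not stated above.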
The KL coefficient follows a quadratic annealing schedule \(\lambda_{KL}(t) = \lambda_{max} \cdot \min(1, t/T_{anneal})^2\). Why quadratic, not linear? Early in training, the model hasn't learned anything yet. If you regularize immediately, you crush the evidence signal before it forms. The quadratic ramp gives the model time to accumulate meaningful Dirichlet mass in the first epochs, then gradually tightens regularization to prevent overconfidence. In our experiments, linear annealing caused evidence collapse in 30% of runs; quadratic annealing eliminated this failure mode entirely.
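The schedule itself is a one-liner. The sketch below compares it with a linear ramp; the values `t_anneal=20` and `lam_max=1.0` are placeholders chosen for the example, not the settings used in training.

```python
def kl_weight(t, t_anneal, lam_max=1.0):
    """Quadratic ramp: lam_max * min(1, t / t_anneal) ** 2."""
    if t_anneal <= 0:
        raise ValueError("t_anneal must be positive")
    return lam_max * min(1.0, t / t_anneal) ** 2

def kl_weight_linear(t, t_anneal, lam_max=1.0):
    """Linear ramp, shown only for comparison."""
    if t_anneal <= 0:
        raise ValueError("t_anneal must be positive")
    return lam_max * min(1.0, t / t_anneal)

for epoch in (0, 5, 10, 20, 30):
    print(epoch, kl_weight(epoch, 20), kl_weight_linear(epoch, 20))
```

A quarter of the way through the ramp, the quadratic schedule applies only 1/16 of the maximum weight, versus 1/4 for the linear one. That gap is the extra room the model gets to build up evidence early on.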
Design Decisions
- AdamW optimizer (\(\beta = (0.9, 0.999)\), weight decay \(5 \times 10^{-4}\)) with cosine-annealed LR after linear warmup — decoupled weight decay prevents the regularization strength from coupling to the learning rate, which is critical when the LR changes by 100x over training
- Focal-weighted class balancing (\(w_{fire} = 5.0\)) to counter extreme class imbalance (~3% fire pixels) — without this, the model learns to predict "no fire" everywhere and achieves 97% accuracy while being completely useless
- Wind-aware augmentation: spatial flips and rotations transform the wind azimuth channel to preserve physical consistency (e.g., horizontal flip maps \(\theta \to (360 - \theta) \bmod 360\)) — naive augmentation would teach the model that fire spreads in random directions relative to wind, destroying the physics signal
- Pure-Python TFRecord parser: eliminates the TensorFlow dependency entirely (~100 LOC); this cuts the Docker image size by ~2 GB and removes the most common source of installation failures
- Learned weather upsampler (v2): a small 32-channel network smooths GRIDMET 4km grid artifacts instead of relying on bilinear interpolation alone — the raw GRIDMET data has visible checkerboard patterns at 4km resolution that the model would otherwise memorize as spurious features
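The wind-aware augmentation from the list above can be sketched as follows. The code assumes wind direction is an azimuth in degrees measured clockwise from north, that rows run north to south and columns west to east, and that the flip is applied before z-score normalization; the function names are illustrative.

```python
import numpy as np

WIND_DIR = 1  # index of the wind direction channel in the Input Channels table

def hflip_with_wind(tile):
    """Mirror a (C, H, W) tile left-right and mirror the wind azimuth with it."""
    out = tile[:, :, ::-1].copy()
    # Mirroring east-west negates the east component: theta -> -theta (mod 360).
    out[WIND_DIR] = (360.0 - out[WIND_DIR]) % 360.0
    return out

def vflip_with_wind(tile):
    """Mirror a (C, H, W) tile top-bottom and mirror the wind azimuth with it."""
    out = tile[:, ::-1, :].copy()
    # Mirroring north-south negates the north component: theta -> 180 - theta.
    out[WIND_DIR] = (180.0 - out[WIND_DIR]) % 360.0
    return out

tile = np.zeros((12, 2, 3))
tile[WIND_DIR] = [[0.0, 90.0, 180.0],
                  [270.0, 45.0, 315.0]]
flipped = hflip_with_wind(tile)
print(flipped[WIND_DIR])
```

Applying the same flip twice returns the original tile, which is a cheap sanity check for any direction-aware augmentation. Rotations need an analogous azimuth shift whose sign depends on the array orientation, so they are not shown.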
Documentation
System Architecture
PyroSight is a three-tier system:
Data Layer — Pulls 12 geospatial channels from public sources (GRIDMET weather, VIIRS vegetation, SRTM elevation, NASA FIRMS active fires) and assembles them into a normalized \(64 \times 64\) tile for any location in the contiguous US.
Model Layer — A dual-branch U-Net with an evidential deep learning head runs inference on the tile in a single forward pass, producing per-pixel fire spread probabilities and calibrated uncertainty maps.
Serving Layer — A FastAPI backend exposes prediction endpoints; a Next.js frontend renders results on an interactive MapLibre GL map with risk overlays, geocoding, etc.
Data flow: satellites / weather stations → tile builder → EDL model → API → interactive map.
Input Channels
| # | Channel | Source | Resolution | Units |
|---|---|---|---|---|
| 0 | Elevation | SRTM | 30m → 1km | meters |
| 1 | Wind direction | GRIDMET | 4km | degrees |
| 2 | Wind speed | GRIDMET | 4km | m/s |
| 3 | Min temperature | GRIDMET | 4km | Kelvin |
| 4 | Max temperature | GRIDMET | 4km | Kelvin |
| 5 | Specific humidity | GRIDMET | 4km | kg/kg |
| 6 | Precipitation | GRIDMET | 4km | mm/day |
| 7 | Drought index (PDSI) | GRIDMET | 4km | dimensionless |
| 8 | NDVI | VIIRS | 500m, 8-day | scaled ×10⁴ |
| 9 | Population density | GPWv4 | 1km | people/km² |
| 10 | Energy Release Component | GRIDMET/NFDRS | 4km | dimensionless |
| 11 | Previous fire mask | MODIS/FIRMS | 1km | binary |
All channels are z-score normalized with fixed training statistics and clipped to \(\pm 10\sigma\). The previous fire mask is binarized: \(\hat{x} = \mathbb{1}[x > 0]\).
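A minimal sketch of this preprocessing step is shown below. It assumes the fire mask is binarized instead of z-scored (the text leaves that open), and the statistics in the example are placeholders, not the real training statistics.

```python
import numpy as np

PREV_FIRE = 11  # index of the previous fire mask in the channel table

def normalize_tile(tile, mean, std, clip=10.0):
    """tile: (12, H, W) raw values; mean, std: (12,) fixed training statistics."""
    z = (tile - mean[:, None, None]) / std[:, None, None]
    z = np.clip(z, -clip, clip)                    # clip to +/- 10 sigma
    # Assumption: the mask skips z-scoring and is simply binarized.
    z[PREV_FIRE] = (tile[PREV_FIRE] > 0).astype(z.dtype)
    return z

# Placeholder statistics for illustration only.
mean = np.zeros(12)
std = np.ones(12)
mean[0], std[0] = 1000.0, 500.0                    # elevation, made-up values

tile = np.zeros((12, 1, 3))
tile[0] = [[1000.0, 1500.0, 11000.0]]              # last value is 20 sigma out
tile[PREV_FIRE] = [[-1.0, 0.0, 1.0]]
norm = normalize_tile(tile, mean, std)
print(norm[0], norm[PREV_FIRE])
```

Reusing exactly the same `mean` and `std` at inference time as in training is what keeps live tiles on the same scale as the training data.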
API Endpoints
| Route | Method | Description |
|---|---|---|
| /api/samples/{id}/assess | GET | Run inference on a pre-loaded test sample |
| /api/assess/live | POST | Real-time prediction for any lat/lng + date |
| /api/fires/active | GET | NASA FIRMS active fire detections (last 24h) |
| /api/model/info | GET | Architecture metadata and parameter count |
| /api/geocode?q=... | GET | Place name → coordinates (Nominatim) |
| /api/compare | POST | Side-by-side assessment of multiple samples |
| /api/batch/assess | POST | Batch inference with aggregated statistics |
Evaluation Metrics
| Metric | Formula | Purpose |
|---|---|---|
| \(F_1\) | \(2 \cdot \frac{P \cdot R}{P + R}\) | Primary; threshold-searched over \([0.02, 0.5]\) |
| AUC-PR | Area under Precision-Recall curve | Threshold-free ranking quality |
| ECE | \(\sum_b \frac{n_b}{N} \lvert \text{acc}_b - \text{conf}_b \rvert\) | Calibration (15 bins) |
| Brier | \(\frac{1}{N}\sum(p_i - y_i)^2\) | Probabilistic accuracy |
| AUSE | Area Under Sparsification Error | Uncertainty-error alignment |
| AURC | Area Under Risk-Coverage | Selective prediction quality |
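Three of the metrics in the table are short enough to sketch in NumPy. The ECE below uses one common binary variant (bin by predicted fire probability, compare mean prediction with observed fire fraction), which may differ from the project's exact definition; the function names and the 25-point threshold grid are assumptions.

```python
import numpy as np

def brier(p, y):
    """Mean squared difference between predicted probability and label."""
    return float(np.mean((p - y) ** 2))

def ece(p, y, n_bins=15):
    """Expected calibration error over equal-width probability bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(p, edges[1:-1])      # bin index in 0 .. n_bins - 1
    total = 0.0
    for b in range(n_bins):
        m = idx == b
        if m.any():
            # Weight n_b / N times the gap between observed rate and confidence.
            total += m.mean() * abs(y[m].mean() - p[m].mean())
    return float(total)

def best_f1(p, y, thresholds=np.linspace(0.02, 0.5, 25)):
    """Search the decision threshold that maximizes F1; returns (f1, threshold)."""
    best = (0.0, None)
    for t in thresholds:
        pred = p >= t
        tp = np.sum(pred & (y == 1))
        fp = np.sum(pred & (y == 0))
        fn = np.sum(~pred & (y == 1))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best[0]:
            best = (float(f1), float(t))
    return best

p = np.array([0.8, 0.8, 0.8, 0.8, 0.8])
y = np.array([1, 1, 1, 0, 0])
print(brier(p, y), ece(p, y), best_f1(p, y))
```

In this toy case the model says 0.8 everywhere while only 60% of the pixels burn, so the ECE is the 0.2 gap between the two. The low end of the threshold range reflects the class imbalance: with few fire pixels, the best F1 is usually reached well below 0.5.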
Deployment
- Backend: Docker container (Python 3.12, GDAL, PyTorch) deployed on Railway
- Frontend: Next.js on Vercel with MapLibre GL for map rendering
- Static rasters: Pre-cached DEM, population, and NDVI tiles (~2 GB) bundled in the Docker image
- Live data: GRIDMET weather fetched daily with L1–L5 cache hierarchy
Practical Relevance
Emergency Management
Fire agencies (CAL FIRE, NIFC, county emergency managers) can use PyroSight's per-pixel uncertainty maps to prioritize evacuation zones, focusing resources where the model is both confident and predicting high spread probability, while flagging uncertain corridors for additional monitoring.
Resource Staging
The risk classification system (CRITICAL / HIGH / MODERATE / LOW) with confidence scores enables pre-positioning of firefighting resources. A CRITICAL prediction with 90% confidence justifies deploying crews; the same prediction at 40% confidence suggests deploying scouts instead.
Insurance and Urban Planning
PyroSight's spatially explicit risk maps can inform wildfire risk scoring for properties in the wildland-urban interface, complementing existing FEMA flood maps with fire-specific hazard layers.
Research Platform
The modular architecture — separate data pipeline, model, and serving layers — makes PyroSight a testbed for wildfire ML research. Researchers can swap in new architectures, add channels, or modify the physics-informed loss without rewriting the data or serving infrastructure.
Coverage and Latency
PyroSight can generate a prediction for any location in the contiguous United States, fast enough for interactive use during active incidents.
Team Information
Made solo!
Built With
- css
- docker
- fastapi
- google-earth-engine
- gpwv4
- gridmet
- html
- javascript
- maplibre-gl
- matplotlib
- nasa-firms
- nasa-viirs
- netcdf4
- next.js
- nominatim
- numpy
- openstreetmap
- python
- pytorch
- railway
- rasterio
- react
- srtm
- tailwind-css
- typescript
- uvicorn
- vercel
- xarray