Inspiration

BloomCast started from a simple frustration: harmful algal bloom (HAB) forecast tools usually surface a risk score with no context. A forecast that names specific drivers with specific numbers — and admits what it can't see — is more actionable than an opaque probability, even if the underlying model is simple.

What it does

BloomCast maps twelve simulated monitoring buoys across the US coastline and generates fourteen days of water-column readings (temperature, salinity, chlorophyll-a, nitrate, dissolved oxygen) from a deterministic seed. A five-feature logistic regression produces a 0–100 bloom risk index, projected seven days forward using a logistic-growth ODE. Claude writes a three-paragraph field note for each forecast, naming the dominant drivers in concrete units and flagging what the model can't see. Every forecast logs to SQLite; a history page plots predicted vs. simulated observed outcomes.
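The scoring pipeline described above can be sketched in TypeScript. The feature names, coefficients, and growth parameters below are illustrative placeholders, not BloomCast's actual values; the shape of the computation (five-feature logistic regression, then a logistic-growth projection stepped forward by day) matches the description:

```typescript
// Illustrative sketch; coefficients and parameter values are made up.
type Features = {
  tempC: number;        // water temperature, °C
  salinityPsu: number;  // salinity, PSU
  chlorophyllA: number; // chlorophyll-a, µg/L
  nitrate: number;      // nitrate, mg/L
  dissolvedO2: number;  // dissolved oxygen, mg/L
};

const sigmoid = (z: number): number => 1 / (1 + Math.exp(-z));

// Five hand-set coefficients plus an intercept (example values only).
function riskIndex(f: Features): number {
  const z =
    -4.0 +
    0.15 * f.tempC -
    0.05 * f.salinityPsu +
    0.2 * f.chlorophyllA +
    0.8 * f.nitrate -
    0.3 * f.dissolvedO2;
  return Math.round(100 * sigmoid(z)); // 0–100 bloom risk index
}

// Seven-day projection via logistic growth, dR/dt = r·R·(1 − R/K),
// integrated with one Euler step per day.
function projectRisk(r0: number, days = 7, growth = 0.25, cap = 100): number[] {
  const trajectory = [r0];
  let r = r0;
  for (let d = 0; d < days; d++) {
    r = r + growth * r * (1 - r / cap);
    trajectory.push(Math.min(cap, Math.max(0, r)));
  }
  return trajectory;
}
```

Because every coefficient is visible, each point on the risk curve traces back to a named variable with a real unit, which is what the field notes lean on.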

How we built it

SvelteKit + strict TypeScript, D3 for all charts (hand-rolled, no wrappers), classless Pico CSS, better-sqlite3 for persistence, and @anthropic-ai/sdk for the explanation layer. Simulation, explanation, and persistence are cleanly separated, so the templated fallback drops in seamlessly when no API key is present.
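The "templated fallback" swap can be expressed as a small interface: one implementation calls Claude, the other fills a template, and the rest of the app only sees the interface. The names below (`Explainer`, `makeExplainer`, etc.) are hypothetical, and the Claude-backed class is stubbed rather than wired to the real SDK:

```typescript
// Hypothetical sketch of the explanation-layer seam; real module and
// function names in BloomCast may differ.
type Forecast = { station: string; risk: number; drivers: Record<string, number> };

interface Explainer {
  explain(f: Forecast): Promise<string>;
}

class TemplateExplainer implements Explainer {
  async explain(f: Forecast): Promise<string> {
    // Pick the driver with the largest magnitude and name it with its value.
    const [name, value] = Object.entries(f.drivers).sort(
      ([, a], [, b]) => Math.abs(b) - Math.abs(a)
    )[0];
    return `Risk ${f.risk}/100 at ${f.station}; dominant driver: ${name} (${value}).`;
  }
}

class ClaudeExplainer implements Explainer {
  constructor(private apiKey: string) {}
  async explain(_f: Forecast): Promise<string> {
    // In the real app this would call @anthropic-ai/sdk; stubbed in this sketch.
    throw new Error("not wired up in this sketch");
  }
}

// The rest of the app asks for an Explainer and never checks for a key again.
function makeExplainer(apiKey?: string): Explainer {
  return apiKey ? new ClaudeExplainer(apiKey) : new TemplateExplainer();
}
```

Keeping the seam this narrow is what lets the demo run with zero configuration.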

Challenges we ran into

Two main ones. First, making the seeded data generator produce ecologically plausible readings, not just reproducible ones, required station-specific priors for temperature, salinity, and nutrient baselines. Second, prompting Claude to write field notes that were specific without sounding overconfident: early drafts read like confident research summaries, so the system prompt had to explicitly demand hedged language and named unknowns.
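The priors-plus-seed idea can be sketched as follows. The PRNG here is mulberry32 (a common tiny deterministic generator), and the station names and prior values are invented examples, not BloomCast's actual priors:

```typescript
// Sketch of a seeded generator with station-specific priors (example values).
// mulberry32: a small deterministic PRNG — same seed, same series, every run.
function mulberry32(seed: number): () => number {
  return () => {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

type Prior = { meanTempC: number; tempRange: number; meanSalinityPsu: number };

// Priors keep each station's readings plausible, not just reproducible:
// a brackish estuary should never report open-ocean salinity.
const PRIORS: Record<string, Prior> = {
  "gulf-01": { meanTempC: 27, tempRange: 3, meanSalinityPsu: 35 },
  "chesapeake-02": { meanTempC: 21, tempRange: 5, meanSalinityPsu: 14 },
};

function dailyTemp(station: string, day: number, seed: number): number {
  const p = PRIORS[station];
  const rand = mulberry32(seed + day);
  // prior mean + mild seasonal wobble + noise bounded by the station's range
  return p.meanTempC + Math.sin(day / 14) * 1.5 + (rand() - 0.5) * p.tempRange;
}
```

Seeding per station-day (rather than once per run) also means any single reading can be regenerated in isolation.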

Accomplishments that we're proud of

The explanation layer reads like something a knowledgeable person wrote about this buoy on this day — it cites a chlorophyll-a value, compares it to a threshold, and flags that the model has no eyes on wind advection or river discharge. Making the model's limitations first-class content, not a footnote, felt like the right call.

What we learned

Simple models explained well beat complex models explained poorly. The logistic regression here is barely a model — five hand-set coefficients, no training data — but because every output traces back to a named variable with a real number, it feels more trustworthy than an opaque ensemble score. "Show your work" turned out to be a design constraint: it forced the feature set small enough that the explanation could actually enumerate the drivers.

What's next for BloomCast

Replace simulated data with real ingestion (NOAA NCCOS, Chesapeake Bay Program, USGS discharge feeds) so the reliability scatter means something and the coefficients can be learned. Add a species layer distinguishing cyanobacteria from dinoflagellate blooms. Surface data-provenance gaps in the field note itself — if a buoy is stale or a gauge is offline, the forecast should say so.
