BightWatch — Project Story

Inspiration

California spends a recurring multi-million-dollar annual budget monitoring 124 marine protected areas. The state's own 2023 Decadal Management Review said that effort is allocated roughly evenly across the network instead of being concentrated on the reserves under the most compounding stress. That is a real, named, public gap in a system that is otherwise well funded, and nobody is sitting at the allocation-decision layer to fix it. C-HARM forecasts harmful algal blooms. HABMAP runs pier sensors. The CCIEA writes annual retrospectives. None of them ranks California's monitoring zones by stress or tells a Senior Environmental Scientist where to send the next quarter's ship time. That wedge, between the data layer and the decision layer, is BightWatch.

Personally: we kept reading peer-reviewed papers (Cavole 2016, Thompson 2022, Jacox 2018) that said the 2014–2016 "Blob" marine heatwave was a turning point for the California Current — and then looking at how MPA monitoring is allocated and seeing no evidence that the lesson got operationalized. We wanted to build the artifact that makes the lesson actionable.

What it does

BightWatch is a quarterly priority ranking for the 13 monitoring zones spanning CalCOFI Lines 80–93.3 in the Southern California Bight. For each zone it produces:

  • A 0–100 conservation-priority score
  • A recommended action band (investigate_now, schedule_next_cruise, add_plankton_tow, add_HAB_sample, deploy_sensor, monitor_only)
  • The top-3 feature drivers with direction-of-effect
  • An A–F confidence badge
  • An auditable trail back to a CSV cell for every number

The product is a 6-view Streamlit app: Overview, Risk Map, Zone Detail, Priority Queue, Briefings, and Ask BightWatch — a Gemini-powered chatbot that answers free-form questions over the same data without being able to invent a number.

How we built it

Stack. Python · pandas · DuckDB · GeoPandas · Streamlit · Folium · Google Gemini 2.5 (Flash + Flash-Lite). One-way data flow:

data/raw/ (gitignored) → data/parquet/ → data/staged/ → outputs/, figures/

The data fusion

We fused three things California already collects:

  1. 72 years of CalCOFI hydrographic + ichthyoplankton observations curated by Scripps (NMFS + Scripps + CDFW partnership) — 3,096 cruise-months
  2. CDFW MPA polygon set — boundary geometry for the South Coast Study Region (50 MPAs + 2 Special Closures)
  3. Port-proximity fishing-pressure proxy — distance-decayed extractive-pressure score per zone

…through an 8-script pipeline:

download → parquet → sample → tolerant-compound-join
       → zones → features → MPA overlay → fishing pressure

The hard part: the join

CalCOFI has no UUID linking the Bottle hydrographic database to the ichthyoplankton tow records. They were collected on the same cruises but published as separate products. We had to invent a tolerant compound join keyed on:

\[ \text{cruise} \;\wedge\; |\Delta\text{line}| \le 0.1 \;\wedge\; |\Delta\text{station}| \le 0.5 \;\wedge\; |\Delta t| \le 4\,\text{h} \]

Two-stage: first attach cast.date+time to bottle rows via Cst_Cnt, then match each enriched bottle row to a larval tow within the tolerance box. We achieved 94.2% join yield on attached tows, 97.0% post-1984 (when CalCOFI gear became consistent).
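The two-stage match can be sketched in plain Python so the tolerance box is explicit. This is a minimal illustration, not the pipeline's actual code: the Cast and Tow record types, their field names, and the match_bottle_to_tow helper are hypothetical, and picking the closest tow in time when several fall inside the box is an assumption rather than something the join definition states.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Tolerances taken from the join condition above.
MAX_DLINE = 0.1
MAX_DSTATION = 0.5
MAX_DT = timedelta(hours=4)


@dataclass(frozen=True)
class Cast:  # hypothetical cast record, looked up by Cst_Cnt
    cst_cnt: int
    cruise: str
    line: float
    station: float
    when: datetime


@dataclass(frozen=True)
class Tow:  # hypothetical larval-tow record
    tow_id: str
    cruise: str
    line: float
    station: float
    when: datetime


def match_bottle_to_tow(bottle_cst_cnt, casts, tows):
    """Return the tow matched to a bottle row, or None if nothing qualifies."""
    # Stage 1: attach cast date/time and position to the bottle row via Cst_Cnt.
    cast = casts.get(bottle_cst_cnt)
    if cast is None:
        return None
    # Stage 2: keep only tows from the same cruise inside the tolerance box.
    candidates = [
        t for t in tows
        if t.cruise == cast.cruise
        and abs(t.line - cast.line) <= MAX_DLINE
        and abs(t.station - cast.station) <= MAX_DSTATION
        and abs(t.when - cast.when) <= MAX_DT
    ]
    # Assumed tie-break: prefer the tow closest in time to the cast.
    return min(candidates, key=lambda t: abs(t.when - cast.when), default=None)
```

Join yield in this framing is simply the share of rows for which the function returns a tow rather than None.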

The risk model

For each (zone, cruise-period) we compute depth-aggregated features (upper-100 m temperature anomaly, oxygen, chlorophyll integral, salinity, BEUTI lag), feed them through a robust-linear baseline against the larval-anomaly z-score response variable, and combine the bio-risk output with MPA-overlap and fishing-pressure layers into a single conservation priority score:

\[ \text{Priority}_z = \alpha \cdot \text{BioRisk}_z + \beta \cdot \text{MPAOverlap}_z + \gamma \cdot \text{FishingPressure}_z \]

with weights derived from the rubric in outputs/thread_b/confidence_badges.csv and ablation results in figures/thread_b/.
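As a rough illustration of the weighted combination, the sketch below scores and ranks a few zones. Everything specific in it is an assumption: the weight values are placeholders (the real ones come from the rubric and ablations cited above), the three inputs are assumed to be pre-scaled to [0, 1], dividing by the weight sum to land on the 0–100 scale is one possible normalization, and the zone IDs and values are made up.

```python
def priority_score(bio_risk, mpa_overlap, fishing_pressure,
                   alpha=0.6, beta=0.25, gamma=0.15):
    """Weighted sum of three layers (each assumed in [0, 1]), scaled to 0-100.

    alpha, beta, gamma are placeholder weights, not the project's values.
    """
    layers = {
        "bio_risk": bio_risk,
        "mpa_overlap": mpa_overlap,
        "fishing_pressure": fishing_pressure,
    }
    for name, value in layers.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    raw = alpha * bio_risk + beta * mpa_overlap + gamma * fishing_pressure
    # Normalize by the weight sum so the result stays on a 0-100 scale.
    return 100.0 * raw / (alpha + beta + gamma)


# Hypothetical zones: (BioRisk, MPAOverlap, FishingPressure).
zones = {
    "Z01": (0.82, 0.40, 0.65),
    "Z02": (0.35, 0.90, 0.20),
    "Z03": (0.10, 0.05, 0.15),
}
ranking = sorted(zones, key=lambda z: priority_score(*zones[z]), reverse=True)
```

Keeping the score a plain linear combination means each zone's ranking can be traced back to three inputs and three weights, which fits the auditability goal.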

Gemini integration — the part we're proud of

The "Ask BightWatch" chatbot uses Gemini as a tool-use orchestrator, not a paraphraser of raw data. The model gets zero numbers without a tool call. Architecture:

  1. User question → ask_gemini sends conversation + 6 FunctionDeclarations + a system instruction (5 hard rules)
  2. Model picks one of 6 Python tools: list_zones, get_zone_summary, get_feature_timeseries, compare_zones, find_analogs, define_term
  3. We execute the tool locally; structured dict goes back as a function_response part
  4. Loop up to 4 iterations
  5. Final prose passes through guardrail_check_tool_output — a multi-tool numeric-citation guardrail that flattens every number across every tool result and confirms every number in the answer is in the allowlist
  6. FAIL → suppress prose, render raw tool JSON instead. No silent relaxation.
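Steps 5 and 6 can be sketched as follows. This is a simplified stand-in, not the project's guardrail_check_tool_output: the helper names (flatten_numbers, guardrail_ok, render_answer), the number regex, and the two-decimal rounding used for comparison are all assumptions made for illustration.

```python
import json
import re
from numbers import Number

# Matches integers and decimals that are not embedded in an identifier
# such as "Z07"; a leading minus counts only when it starts the token.
NUMBER_RE = re.compile(r"(?<![\w.])-?\d+(?:\.\d+)?")


def flatten_numbers(obj):
    """Yield every number nested anywhere inside a tool result."""
    if isinstance(obj, bool):  # bool is an int subclass; not a statistic
        return
    if isinstance(obj, Number):
        yield float(obj)
    elif isinstance(obj, str):
        for token in NUMBER_RE.findall(obj):
            yield float(token)
    elif isinstance(obj, dict):
        for value in obj.values():
            yield from flatten_numbers(value)
    elif isinstance(obj, (list, tuple)):
        for value in obj:
            yield from flatten_numbers(value)


def guardrail_ok(answer, tool_results, ndigits=2):
    """True only if every number in the prose appears in some tool result."""
    allowlist = {round(x, ndigits)
                 for result in tool_results
                 for x in flatten_numbers(result)}
    cited = {round(float(tok), ndigits) for tok in NUMBER_RE.findall(answer)}
    return cited <= allowlist


def render_answer(answer, tool_results):
    """On FAIL, suppress the prose and show the raw tool JSON instead."""
    if guardrail_ok(answer, tool_results):
        return answer
    return json.dumps(tool_results, indent=2)
```

Because the allowlist is built from all tool results in the turn, an answer that combines two tool calls still passes, while a number that came from nowhere fails the whole response.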

We added an auto-fallback chain as demo-day insurance: when gemini-2.5-flash (250 requests per day, or RPD, on the free tier) hits its quota, the chatbot silently retries on gemini-2.5-flash-lite (1000 RPD). Three degradation modes also ship (no API key, no SDK installed, API error), and each returns a useful local-tool answer built from a deterministic prose template, so the fallback itself cannot hallucinate.
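A minimal sketch of that fallback order, with the model call injected as a plain callable so no SDK details are assumed. QuotaExceeded, ask_with_fallback, call_model, and local_answer are hypothetical names; a real implementation would map the SDK's own quota error onto the first except branch.

```python
class QuotaExceeded(Exception):
    """Hypothetical stand-in for the SDK's quota / rate-limit error."""


def ask_with_fallback(prompt, call_model, local_answer,
                      models=("gemini-2.5-flash", "gemini-2.5-flash-lite")):
    """Try each model in order; fall back to a local template answer.

    call_model(model, prompt) and local_answer(prompt) are injected callables.
    Returns a (source, text) pair so the caller can log which path was taken.
    """
    for model in models:
        try:
            return model, call_model(model, prompt)
        except QuotaExceeded:
            continue  # quota hit: quietly try the next model in the chain
        except Exception:
            break     # any other API error: skip straight to the local answer
    return "local", local_answer(prompt)
```

Returning the source alongside the text keeps the retry invisible to the user while still leaving a record of which model, if any, produced the answer.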

Validation

Five process gates, all passed:

Gate | Check                                                        | Status
-----+--------------------------------------------------------------+------------------------
G1   | Source tie-out (pandas vs. DuckDB on bottle, larvae, BEUTI)  | PASS
G2   | Join yield ≥ 70% post-1984                                   | 97.0%
G3   | Simpson's-paradox check (warming holds at zone × decade)     | PASS
G4   | 2014–2016 MHW back-test (primary evidence)                   | B confidence
G5   | AI-brief numeric-citation guardrail                          | 10/10 unit tests pass

Plus 37 chatbot tests + 31 guardrail tests, including 5 adversarial fixtures that try to make Gemini invent temperatures, predict 2030, or quote dollar figures. Every one is caught.

What we learned

  • The hardest data work is the join you didn't know existed. No paper warned us CalCOFI's hydrography and ichthyoplankton would lack a UUID. Discovering this at hour 6 and engineering a tolerant compound join with a documented tolerance box was the single biggest unlock.
  • Tool-use changes what you can claim about an LLM. A model that cannot see the parquets but can call typed tools, combined with a multi-tool numeric-citation guardrail, lets us honestly say "the chatbot cannot lie about a statistic." That's a different product than "RAG-grounded chat."
  • @st.cache_data on every loader is non-negotiable. Without it, Streamlit re-reads every parquet on every interaction and the demo dies in front of a judge.
  • Honesty is positioning. The earlier draft of our pitch said "BightWatch is not a recommendation engine." That was technically wrong — a ranked queue with action labels is a recommendation. Correcting it to "we recommend monitoring effort, not regulatory action" was both more accurate and a better wedge against C-HARM.
  • Geospatial CRS matters. Everything spatial reprojects to EPSG:32611 (UTM 11N) before distance/area math. The SCB sits cleanly inside this zone; mixing CRSs breaks fishing-pressure proxies in subtle ways that pass null-checks but fail sanity-checks.

Challenges we ran into

  • The 1967–1983 ichthyoplankton gap. CalCOFI ran triennially in that window. Our modern baseline starts at 1984.
  • The 1977 ring → bongo gear transition. Pre-1977 abundance numbers are not directly comparable with later ones; we filter those years out unless we apply the Thompson 2017 corrections.
  • Bottle DB cuts off May 2021. Anything more recent rides on the OISST + BEUTI + HABMAP "bridge layer" — we list this on the honesty slide instead of hiding it.
  • Chl-a coverage: ~0% non-null pre-1980, ≥95% non-null post-1990. We use coverage-aware masks pre-1990.
  • Free-tier Gemini quotas (250 requests/day on flash). A judge mashing the chatbot for 5 minutes can plausibly burn 50+ requests because each user turn = 1–4 model calls (the tool loop). Solved with the auto-fallback chain to flash-lite (1000 RPD).
  • Streamlit cross-view selection state. streamlit-folium ≥ 0.20 is required for click events; we route everything through one st.session_state.selected_zone_id key so map clicks, Priority Queue row selects, and AI-Brief reorder all stay in sync without a JSON intermediate.
  • Twenty hours. Four parallel work threads (data integrity, analytics, product surface, narrative). PLAN.md was the source of truth for task status; every completed task got a ✅ in three places (phase section, dependency graph, quick-reference index) and shipped a 7-section writeup in docs/.

Built with

Python · pandas · numpy · DuckDB · GeoPandas · Shapely · pyarrow · Streamlit · Folium · streamlit-folium · matplotlib · Google Gemini 2.5 (Flash + Flash-Lite) · google-genai SDK · Scripps CalCOFI (Bottle DB + EDI edi.109.4 ichthyoplankton + Station Order) · NOAA OISST · NOAA BEUTI · SCCOOS HABMAP · CDFW MPA polygons

Try it yourself

git clone https://github.com/blue-octopus235/bightwatch
cd bightwatch
pip install -r requirements.txt
streamlit run app/main.py

Add GEMINI_API_KEY=... to a .env file at the repo root to enable the chatbot. The staged parquets and Thread-B CSVs ride in the repo, so the 4-command happy path works without re-downloading raw data.
