BightWatch — Project Story
Inspiration
California spends a recurring multi-million-dollar annual budget monitoring 124 marine protected areas. The state's own 2023 Decadal Management Review said that effort is allocated roughly evenly across the network instead of being concentrated on the reserves under the most compounding stress. That is a real, named, public gap in a system that is otherwise well-funded, and nobody is sitting at the allocation-decision layer to fix it. C-HARM forecasts harmful algal blooms. HABMAP runs pier sensors. The CCIEA writes annual retrospectives. None of them rank California's monitoring zones by stress and tell a Senior Environmental Scientist where to send the next quarter's ship time. That wedge, between the data layer and the decision layer, is BightWatch.
Personally: we kept reading peer-reviewed papers (Cavole 2016, Thompson 2022, Jacox 2018) that said the 2014–2016 "Blob" marine heatwave was a turning point for the California Current — and then looking at how MPA monitoring is allocated and seeing no evidence that the lesson got operationalized. We wanted to build the artifact that makes the lesson actionable.
What it does
BightWatch is a quarterly priority ranking for the 13 monitoring zones spanning CalCOFI Lines 80–93.3 in the Southern California Bight. For each zone it produces:
- A 0–100 conservation-priority score
- A recommended action band (investigate_now, schedule_next_cruise, add_plankton_tow, add_HAB_sample, deploy_sensor, monitor_only)
- The top-3 feature drivers with direction-of-effect
- An A–F confidence badge
- An auditable trail back to a CSV cell for every number
The product is a 6-view Streamlit app: Overview, Risk Map, Zone Detail, Priority Queue, Briefings, and Ask BightWatch — a Gemini-powered chatbot that answers free-form questions over the same data without being able to invent a number.
How we built it
Stack. Python · pandas · DuckDB · GeoPandas · Streamlit · Folium · Google Gemini 2.5 (Flash + Flash-Lite). One-way data flow:
data/raw/ (gitignored) → data/parquet/ → data/staged/ → outputs/, figures/
The data fusion
We fused three things California already collects:
- 72 years of CalCOFI hydrographic + ichthyoplankton observations curated by Scripps (NMFS + Scripps + CDFW partnership) — 3,096 cruise-months
- CDFW MPA polygon set — boundary geometry for the South Coast Study Region (50 MPAs + 2 Special Closures)
- Port-proximity fishing-pressure proxy — distance-decayed extractive-pressure score per zone
…through an 8-script pipeline:
download → parquet → sample → tolerant-compound-join
→ zones → features → MPA overlay → fishing pressure
The hard part: the join
CalCOFI has no UUID linking the Bottle hydrographic database to the ichthyoplankton tow records. They were collected on the same cruises but published as separate products. We had to invent a tolerant compound join keyed on:
\[ \text{cruise} \;\wedge\; |\Delta\text{line}| \le 0.1 \;\wedge\; |\Delta\text{station}| \le 0.5 \;\wedge\; |\Delta t| \le 4\,\text{h} \]
Two-stage: first attach cast.date+time to bottle rows via Cst_Cnt, then match each enriched bottle row to a larval tow within the tolerance box. We achieved 94.2% join yield on attached tows, 97.0% post-1984 (when CalCOFI gear became consistent).
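The tolerance box can be sketched in plain Python. This is an illustration under stated assumptions, not the project's pipeline code: the `Obs` record, its field names, and the nearest-in-time tie-break are hypothetical, and the sketch assumes the first stage (attaching cast date and time to bottle rows) has already run. Only the four tolerances come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Tolerances taken from the text; everything else here is hypothetical.
MAX_DLINE = 0.1
MAX_DSTATION = 0.5
MAX_DT = timedelta(hours=4)


@dataclass(frozen=True)
class Obs:
    cruise: str
    line: float
    station: float
    when: datetime


def within_box(bottle: Obs, tow: Obs) -> bool:
    """True when the two records fall inside the tolerance box."""
    return (
        bottle.cruise == tow.cruise
        and abs(bottle.line - tow.line) <= MAX_DLINE
        and abs(bottle.station - tow.station) <= MAX_DSTATION
        and abs(bottle.when - tow.when) <= MAX_DT
    )


def tolerant_join(bottles: list[Obs], tows: list[Obs]) -> list[tuple[Obs, Obs]]:
    """Pair each tow with the nearest-in-time bottle record inside the box."""
    by_cruise: dict[str, list[Obs]] = {}
    for b in bottles:
        by_cruise.setdefault(b.cruise, []).append(b)

    pairs = []
    for tow in tows:
        candidates = [b for b in by_cruise.get(tow.cruise, []) if within_box(b, tow)]
        if candidates:
            # Assumed tie-break: keep the candidate closest in time.
            best = min(candidates, key=lambda b: abs(b.when - tow.when))
            pairs.append((best, tow))
    return pairs


def join_yield(pairs: list[tuple[Obs, Obs]], tows: list[Obs]) -> float:
    """Fraction of tows that found a hydrographic match."""
    return len(pairs) / len(tows) if tows else 0.0
```

Grouping by cruise first keeps the comparison count small, since the line, station, and time checks only run against bottle rows from the same cruise.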
The risk model
For each (zone, cruise-period) we compute depth-aggregated features (upper-100 m temperature anomaly, oxygen, chlorophyll integral, salinity, BEUTI lag), feed them through a robust-linear baseline against the larval-anomaly z-score response variable, and combine the bio-risk output with MPA-overlap and fishing-pressure layers into a single conservation priority score:
\[ \text{Priority}_z = \alpha \cdot \text{BioRisk}_z + \beta \cdot \text{MPAOverlap}_z + \gamma \cdot \text{FishingPressure}_z \]
with weights derived from the rubric in outputs/thread_b/confidence_badges.csv and ablation results in figures/thread_b/.
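As a rough illustration of the weighted sum, the sketch below ranks zones by a 0–100 score. The weight values, the `ZoneLayers` record, and the assumption that every layer is pre-scaled to [0, 1] are all hypothetical; the project's actual weights come from the rubric and ablation results mentioned above.

```python
from dataclasses import dataclass

# Hypothetical weights for illustration only (chosen to sum to 1).
ALPHA, BETA, GAMMA = 0.6, 0.25, 0.15


@dataclass(frozen=True)
class ZoneLayers:
    zone_id: str
    bio_risk: float          # assumed already scaled to [0, 1]
    mpa_overlap: float       # assumed scaled to [0, 1]
    fishing_pressure: float  # assumed scaled to [0, 1]


def priority_score(z: ZoneLayers) -> float:
    """Weighted sum of the three layers, expressed on a 0-100 scale."""
    raw = ALPHA * z.bio_risk + BETA * z.mpa_overlap + GAMMA * z.fishing_pressure
    return round(100.0 * raw, 1)


def rank_zones(zones: list[ZoneLayers]) -> list[tuple[str, float]]:
    """Return (zone_id, score) pairs, highest priority first."""
    scored = [(z.zone_id, priority_score(z)) for z in zones]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With weights that sum to 1 and inputs in [0, 1], the score stays inside the 0–100 range without any clipping step.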
Gemini integration — the part we're proud of
The "Ask BightWatch" chatbot uses Gemini as a tool-use orchestrator, not a paraphraser of raw data. The model gets zero numbers without a tool call. Architecture:
- User question → ask_gemini sends conversation + 6 FunctionDeclarations + a system instruction (5 hard rules)
- Model picks one of 6 Python tools: list_zones, get_zone_summary, get_feature_timeseries, compare_zones, find_analogs, define_term
- We execute the tool locally; structured dict goes back as a function_response part
- Loop up to 4 iterations
- Final prose passes through guardrail_check_tool_output, a multi-tool numeric-citation guardrail that flattens every number across every tool result and confirms every number in the answer is in the allowlist
- FAIL → suppress prose, render raw tool JSON instead. No silent relaxation.
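The allowlist idea behind the guardrail can be sketched as follows. This is not the project's guardrail_check_tool_output: the function names, the unsigned-number regex, and the exact-match comparison (no rounding tolerance) are assumptions made for illustration.

```python
import json
import re
from typing import Any

# Assumption: compare unsigned magnitudes, so "2014-2016" reads as 2014 and 2016.
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def flatten_numbers(value: Any) -> set[float]:
    """Collect every number found anywhere in a nested tool result."""
    found: set[float] = set()
    if isinstance(value, bool):
        return found  # bool is an int subclass, but it is not a statistic
    if isinstance(value, (int, float)):
        found.add(abs(float(value)))
    elif isinstance(value, str):
        found.update(float(tok) for tok in NUMBER_RE.findall(value))
    elif isinstance(value, dict):
        for item in value.values():
            found |= flatten_numbers(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            found |= flatten_numbers(item)
    return found


def check_answer(answer: str, tool_results: list[dict]) -> tuple[bool, list[float]]:
    """Return (passed, unsupported_numbers) for a drafted answer."""
    allowlist: set[float] = set()
    for result in tool_results:
        allowlist |= flatten_numbers(result)
    cited = [float(tok) for tok in NUMBER_RE.findall(answer)]
    unsupported = [n for n in cited if n not in allowlist]
    return (not unsupported, unsupported)


def render(answer: str, tool_results: list[dict]) -> str:
    """Show the prose only if every cited number is backed by a tool result."""
    passed, _ = check_answer(answer, tool_results)
    if passed:
        return answer
    # Fail closed: suppress the prose and show the raw tool output instead.
    return json.dumps(tool_results, indent=2, sort_keys=True)
```

Failing closed is the design choice that matters here: a number the tools never returned causes the prose to be dropped rather than edited, so there is no path by which an unsupported figure reaches the user.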
We added an auto-fallback chain as demo-day insurance: when gemini-2.5-flash (250 requests per day, or RPD, on the free tier) hits its quota, the chatbot silently retries on gemini-2.5-flash-lite (1000 RPD). Three degradation modes also ship (no API key, no SDK installed, API error); each returns a useful local-tool answer rendered through a deterministic prose template, so the fallback path cannot hallucinate either.
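The fallback chain can be sketched without touching the SDK at all. In this illustration `QuotaExceeded`, `ask_with_fallback`, and the two injected callables are hypothetical names; the real error type raised by the google-genai SDK on quota exhaustion is not assumed here.

```python
from typing import Callable, Sequence


class QuotaExceeded(Exception):
    """Hypothetical stand-in for whatever quota error the SDK raises."""


def ask_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
    local_answer: Callable[[str], str],
) -> tuple[str, str]:
    """Try each model in order; degrade to a local template if all fail.

    Returns (source, answer) so the caller can log which path served the turn.
    """
    for name in models:
        try:
            return name, call_model(name, prompt)
        except QuotaExceeded:
            continue  # quota hit: try the next model in the chain
        except Exception:
            break  # any other API error: stop retrying and degrade locally
    return "local", local_answer(prompt)
```

Passing the model call in as a function keeps the chain testable offline, which is useful when the thing being guarded against is running out of API requests.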
Validation
Five process gates, all passed:
| Gate | Check | Status |
|---|---|---|
| G1 | Source tie-out (pandas vs. DuckDB on bottle, larvae, BEUTI) | PASS |
| G2 | Join yield ≥ 70% post-1984 | 97.0% |
| G3 | Simpson's-paradox check (warming holds at zone × decade) | PASS |
| G4 | 2014–2016 MHW back-test (primary evidence) | B confidence |
| G5 | AI-brief numeric-citation guardrail | 10/10 unit tests pass |
Plus 37 chatbot tests + 31 guardrail tests, including 5 adversarial fixtures that try to make Gemini invent temperatures, predict 2030, or quote dollar figures. Every one is caught.
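For readers unfamiliar with gate G3, a Simpson's-paradox check compares the sign of a pooled trend with the sign of the trend inside each stratum. The sketch below is a generic version of that idea, assuming rows of (stratum, year, temperature) where a stratum would be a zone × decade pair; the function names and the use of an ordinary least-squares slope are assumptions, not the project's gate code.

```python
from collections import defaultdict


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of y on x (needs at least two distinct x)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def trend_check(rows: list[tuple[str, float, float]]) -> dict:
    """Compare the pooled trend sign with the trend sign in each stratum.

    rows: (stratum, year, temperature) tuples.
    """
    pooled = ols_slope([r[1] for r in rows], [r[2] for r in rows])

    groups: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for stratum, year, temp in rows:
        groups[stratum].append((year, temp))

    per_stratum = {
        s: ols_slope([y for y, _ in pts], [t for _, t in pts])
        for s, pts in groups.items()
    }
    # A stratum "reverses" when its slope has the opposite sign to the pooled one.
    reversed_in = sorted(s for s, slope in per_stratum.items() if slope * pooled < 0)
    return {"pooled": pooled, "per_stratum": per_stratum, "reversed_in": reversed_in}
```

An empty `reversed_in` list means the pooled trend and every within-stratum trend point the same way, which is the condition the gate describes as warming holding at zone × decade.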
What we learned
- The hardest data work is the join you didn't know existed. No paper warned us CalCOFI's hydrography and ichthyoplankton would lack a UUID. Discovering this at hour 6 and engineering a tolerant compound join with a documented tolerance box was the single biggest unlock.
- Tool-use changes what you can claim about an LLM. A model that cannot see the parquets but can call typed tools, combined with a multi-tool numeric-citation guardrail, lets us honestly say "the chatbot cannot lie about a statistic." That's a different product than "RAG-grounded chat."
- @st.cache_data on every loader is non-negotiable. Without it, Streamlit re-reads every parquet on every interaction and the demo dies in front of a judge.
- Honesty is positioning. The earlier draft of our pitch said "BightWatch is not a recommendation engine." That was technically wrong: a ranked queue with action labels is a recommendation. Correcting it to "we recommend monitoring effort, not regulatory action" was both more accurate and a better wedge against C-HARM.
- Geospatial CRS matters. Everything spatial reprojects to EPSG:32611 (UTM 11N) before distance/area math. The SCB sits cleanly inside this zone; mixing CRSs breaks fishing-pressure proxies in subtle ways that pass null-checks but fail sanity-checks.
Challenges we ran into
- The 1967–1983 ichthyoplankton gap. CalCOFI ran triennially in that window. Our modern baseline starts at 1984.
- The 1977 ring → bongo gear transition. Pre-1977 abundance numbers are not directly comparable; we filter unless applying Thompson 2017 corrections.
- Bottle DB cuts off May 2021. Anything more recent rides on the OISST + BEUTI + HABMAP "bridge layer" — we list this on the honesty slide instead of hiding it.
- Chl-a coverage: ~0% non-null pre-1980, ≥95% post-1990. We use coverage-aware masks pre-1990.
- Free-tier Gemini quotas (250 requests/day on flash). A judge mashing the chatbot for 5 minutes can plausibly burn 50+ requests because each user turn = 1–4 model calls (the tool loop). Solved with the auto-fallback chain to flash-lite (1000 RPD).
- Streamlit cross-view selection state. streamlit-folium ≥ 0.20 is required for click events; we route everything through one st.session_state.selected_zone_id key so map clicks, Priority Queue row selects, and AI-Brief reorder all stay in sync without a JSON intermediate.
- Twenty hours. Four parallel work threads (data integrity, analytics, product surface, narrative). PLAN.md was the source of truth for task status; every completed task got a ✅ in three places (phase section, dependency graph, quick-reference index) and shipped a 7-section writeup in docs/.
Built with
Python · pandas · numpy · DuckDB · GeoPandas · Shapely · pyarrow · Streamlit · Folium · streamlit-folium · matplotlib · Google Gemini 2.5 (Flash + Flash-Lite) · google-genai SDK · Scripps CalCOFI (Bottle DB + EDI edi.109.4 ichthyoplankton + Station Order) · NOAA OISST · NOAA BEUTI · SCCOOS HABMAP · CDFW MPA polygons
Try it yourself
git clone https://github.com/blue-octopus235/bightwatch
cd bightwatch
pip install -r requirements.txt
streamlit run app/main.py
Add GEMINI_API_KEY=... to a .env file at the repo root to enable the chatbot. The staged parquets and Thread-B CSVs ride in the repo, so the 4-command happy path works without re-downloading raw data.