Inspiration

For fishermen, one of the biggest challenges is uncertainty. Knowing where to go, when to go, and what to target often comes down to guesswork, costing time and fuel.

At the same time, ocean ecosystems follow patterns. CalCOFI datasets capture both fish larvae abundance (early signals of future populations) and real catch data (what fishermen are actually catching).

This led to a simple idea: What if we could connect these two to help fishermen plan smarter trips ahead of time?

SpawnCast does exactly that. It uses larvae data to predict future fish presence and catch data to ground those predictions in reality, turning complex ocean data into practical, actionable fishing decisions.

What it does

SpawnCast is an AI-powered decision tool built specifically for fishermen, helping them answer three critical questions:

Where should I go? When should I go? What should I target?

Core functionality:

  1. Interactive Map of Fishing Hotspots
     • Displays predicted high-yield regions along the coast
     • Each point includes: target species, confidence score, expected yield level
     • Built using spatial grid regions derived from latitude/longitude data

  2. Time-Aware AI Predictions
     • Combines larvae abundance data (early biological signals) with historical catch data
     • Uses time-lag modeling to predict future fish presence
     • Ranks the best fishing opportunities based on confidence and expected yield

  3. Ranked Fishing Insights
     • Filters results based on user-selected location and date range
     • Outputs a ranked list of the top species to target, including best fishing window, confidence score, and yield level

  4. AI-Generated Explanations
     • Uses Gemini to explain why a species is recommended
     • Translates model output into human-understandable reasoning
     • Helps fishermen trust and interpret predictions

  5. Trip Planner + Calendar Integration
     • Automatically selects the optimal fishing day within a given range
     • Generates a calendar event for the trip
     • Turns predictions into real, schedulable plans

How we built it

Data Sources: We used two datasets from NOAA’s ERDDAP CalCOFI collection:

  1. Larvae dataset
     • Contains species-level larvae abundance across regions and time
     • Acts as a leading indicator of future fish populations

  2. Catch dataset
     • Contains actual recorded fish catches
     • Grounds the model in real fishing outcomes

Dataset Alignment:

The two datasets were fundamentally different:

  • Different schemas (column names, formats)
  • Different spatial resolutions (raw coordinates vs catch locations)
  • Different temporal formats (timestamps vs structured dates)

To make them usable together, we:

Standardized time:

  • Converted timestamps into a common monthly format

Created spatial regions:

  • Cleaned data to align regions from both datasets (West Coast)
  • Binned latitude/longitude into grid cells
  • Generated a shared region_id for both datasets

Aggregated data:

  • Computed average larvae density per region and time window
  • Computed average catch levels

Cleaned:

  • Identified endangered species
  • Removed endangered species from the data to avoid recommending protected catches

This alignment allowed us to combine biological signals with real catch outcomes in a consistent framework.
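The alignment steps above can be sketched in pandas. This is a minimal illustration, not the actual pipeline: the column names (`time`, `lat`, `lon`, `larvae_density`, `catch_level`) and the 1° grid size are assumptions, not the real CalCOFI schema.

```python
import pandas as pd

GRID_DEG = 1.0  # assumed grid-cell size in degrees (illustrative)

def add_region_and_month(df):
    """Bin lat/lon into grid cells and collapse timestamps to months."""
    out = df.copy()
    out["month"] = pd.to_datetime(out["time"]).dt.to_period("M").astype(str)
    out["region_id"] = (
        (out["lat"] // GRID_DEG).astype(int).astype(str)
        + "_"
        + (out["lon"] // GRID_DEG).astype(int).astype(str)
    )
    return out

# Tiny synthetic stand-ins for the two datasets.
larvae_raw = pd.DataFrame({
    "time": ["2024-03-02", "2024-03-20"],
    "lat": [33.4, 33.6], "lon": [-118.2, -118.4],
    "species": ["anchovy", "anchovy"],
    "larvae_density": [10.0, 30.0],
})
catch_raw = pd.DataFrame({
    "time": ["2024-04-05"],
    "lat": [33.5], "lon": [-118.3],
    "species": ["anchovy"],
    "catch_level": [2.0],
})

# Aggregate each dataset to (region, month, species) means — the shared structure.
larvae = add_region_and_month(larvae_raw).groupby(
    ["region_id", "month", "species"], as_index=False)["larvae_density"].mean()
catch = add_region_and_month(catch_raw).groupby(
    ["region_id", "month", "species"], as_index=False)["catch_level"].mean()
```

Both tables now share `region_id` + `month` keys, so biological signals and catch outcomes can be joined directly.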

Machine Learning Pipeline:

We built the ML pipeline using Databricks, PySpark, and scikit-learn:

  • Applied a time-lag transformation (~4 weeks): larvae abundance (past) → fish catch (future)
  • Trained a Random Forest model to predict high-yield fishing conditions
  • Generated structured predictions for each region, species, and time window

The output is stored in intelligence.json, which contains:

  • location (lat/lon)
  • species
  • confidence scores
  • yield levels
  • seasonal signals

This acts as the core intelligence layer of the entire system.

The Random Forest model:

  • handles nonlinear relationships between biological and spatial signals
  • outputs probability-style confidence scores
  • allows us to classify fishing opportunities into yield levels (e.g., high, medium, low)
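A minimal sketch of the lag-and-train step on synthetic data (the real pipeline runs on Databricks with PySpark; the data, thresholds, and variable names here are invented for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic monthly series for one region: catch follows larvae by ~4 weeks.
n = 200
larvae_density = rng.uniform(0, 100, size=n)          # feature at month t
future_catch = larvae_density + rng.normal(0, 10, n)  # outcome at month t + lag
high_yield = (future_catch > 60).astype(int)          # binary "high yield" label

# The time-lag transform pairs each larvae reading with the *later* outcome,
# so X encodes past biology and y the future catch.
X = larvae_density.reshape(-1, 1)
y = high_yield

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Probability-style confidence scores, bucketed into yield levels.
proba = model.predict_proba(X)[:, 1]
levels = np.select([proba >= 0.66, proba >= 0.33], ["high", "medium"], "low")
```

The `predict_proba` output is what surfaces in the UI as a confidence score, and the bucketing step is one simple way to derive the high/medium/low yield levels.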

Prediction Layer

Model outputs are written to the intelligence.json file containing structured predictions for each region, species, and time window.

Backend (FastAPI)

We built a FastAPI backend that serves predictions from intelligence.json.

Instead of retraining models at runtime, the backend loads precomputed predictions and applies filtering, ranking, and geospatial logic.

Key responsibilities:

  • Trip Planner ranking: ranks species based on confidence and yield for a given location + date
  • Geospatial matching: uses distance calculations to map user input to the nearest grid regions
  • Insights generation: aggregates species predictions and determines the best fishing windows
  • Calendar generation: creates .ics events for scheduling trips
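The geospatial matching and ranking responsibilities can be sketched as plain helpers of the kind the FastAPI routes would call (the dict schemas and tie-breaking rules are illustrative assumptions, not the actual backend code):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, used to snap user input to a grid region."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_region(lat, lon, regions):
    """regions: dicts with centroid 'lat'/'lon' (illustrative schema)."""
    return min(regions, key=lambda reg: haversine_km(lat, lon, reg["lat"], reg["lon"]))

def rank_species(predictions):
    """Order by confidence, breaking ties by yield level (high > medium > low)."""
    order = {"high": 2, "medium": 1, "low": 0}
    return sorted(predictions,
                  key=lambda p: (p["confidence"], order[p["yield_level"]]),
                  reverse=True)

regions = [{"region_id": "33_-119", "lat": 33.5, "lon": -118.5},
           {"region_id": "36_-122", "lat": 36.5, "lon": -121.5}]
preds = [{"species": "sardine", "confidence": 0.67, "yield_level": "medium"},
         {"species": "anchovy", "confidence": 0.82, "yield_level": "high"}]

best_region = nearest_region(34.0, -118.9, regions)
ranked = rank_species(preds)
```

Because the predictions are precomputed, a request reduces to one nearest-region lookup plus an in-memory sort.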

AI Layer (Gemini)

We integrated Gemini 2.5 Flash to generate:

  • natural-language explanations for predictions
  • concise reasoning behind species recommendations

Gemini enhances interpretability but does not affect model rankings; it strictly explains them.
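The explanation step might look like the sketch below: the model's structured output is rendered into a prompt, and the ranking is fixed before Gemini ever sees it. The prompt wording and dict keys are invented for illustration, and the commented-out client call (via the google-genai SDK) is an assumption about wiring, not the project's actual code.

```python
def build_explanation_prompt(pred):
    """Turn one structured prediction into a prompt asking Gemini for reasoning."""
    return (
        "You are explaining a fishing recommendation to a fisherman.\n"
        f"Species: {pred['species']}\n"
        f"Region: {pred['region_id']}\n"
        f"Confidence: {pred['confidence']:.0%}\n"
        f"Yield level: {pred['yield_level']}\n"
        "In 2-3 plain sentences, explain why this species is recommended, "
        "without changing or re-ranking the prediction."
    )

prompt = build_explanation_prompt({
    "species": "anchovy", "region_id": "33_-119",
    "confidence": 0.82, "yield_level": "high",
})

# Hypothetical call (requires an API key):
# from google import genai
# client = genai.Client()
# resp = client.models.generate_content(model="gemini-2.5-flash", contents=prompt)
# explanation = resp.text
```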

Frontend (React + Mapbox)

We built a responsive frontend using React, TypeScript, Vite, and Tailwind CSS.

Key features:

  • Interactive Map (Mapbox): visualizes predicted fishing hotspots spatially
  • Trip Planner: takes user input (location + date range) and returns ranked fishing recommendations
  • Region Detail Panel: displays species, confidence, and yield for selected areas
  • Calendar Export: allows users to schedule fishing trips directly

APIs

Our backend exposes endpoints that connect the ML system to the frontend:

  • /planner/recommendations → ranked species predictions
  • /insights → aggregated species insights
  • /region-details → data for selected map points
  • /calendar → downloadable fishing trip events
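The /calendar payload can be produced with the standard library alone; a minimal all-day iCalendar event might look like this (the field values and UID scheme are illustrative, not the actual backend's):

```python
from datetime import date, timedelta

def trip_ics(species, region_id, day):
    """Build a one-day, all-day iCalendar event for a planned fishing trip."""
    start = day.strftime("%Y%m%d")
    end = (day + timedelta(days=1)).strftime("%Y%m%d")  # DTEND is exclusive
    return "\r\n".join([
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//SpawnCast//Trip Planner//EN",
        "BEGIN:VEVENT",
        f"UID:{species}-{region_id}-{start}@spawncast",
        f"DTSTART;VALUE=DATE:{start}",
        f"DTEND;VALUE=DATE:{end}",
        f"SUMMARY:Fishing trip: {species} ({region_id})",
        "END:VEVENT",
        "END:VCALENDAR",
    ])

ics = trip_ics("anchovy", "33_-119", date(2024, 4, 12))
```

Serving this string with a `text/calendar` content type lets any calendar app import the trip directly.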

Challenges we ran into

  1. Time modeling complexity

Initially, our model ignored time, leading to static predictions. We had to redesign the pipeline to incorporate biological lag, which required rethinking how we aligned datasets.

  2. Dataset alignment

The larvae and catch datasets had:

  • different schemas
  • different time formats
  • different spatial structures

We solved this by:

  • creating region grids
  • standardizing timestamps
  • aggregating both datasets to a shared structure

  3. API + deployment issues

We encountered:

  • environment conflicts
  • Git merge issues

We resolved these with:

  • .env security handling
  • proper Git workflows
  • caching + fallback logic

Accomplishments that we're proud of

  • Successfully integrated two complex oceanographic datasets
  • Built a time-aware ML model grounded in biology
  • Created a full-stack system (ML + backend + UI)
  • Delivered actionable outputs (calendar scheduling), not just predictions
  • Added AI explanations to improve usability and transparency

What we learned

  • Biological systems require temporal thinking, not just spatial modeling
  • Data alignment is often harder than modeling itself
  • The most impactful systems are those that bridge prediction → action
  • AI explanations significantly improve user trust and interpretability

What's next for SpawnCast

  • Incorporate real-time ocean conditions (temperature, currents, weather)
  • Incorporate more offline features (e.g., downloadable Google Maps integration showing routes to the recommended region)
  • Improve time-series modeling (e.g., LSTMs or temporal models)
  • Expand beyond California to global fisheries
  • Personalization based on user preferences and fishing styles
  • Build a full mobile experience for real-world usability

Built With

databricks, fastapi, gemini, mapbox, pyspark, python, react, scikit-learn, tailwind-css, typescript, vite
