Inspiration

For fishermen, one of the biggest challenges is uncertainty. Knowing where to go, when to go, and what to target often comes down to guesswork, costing time and fuel.

At the same time, ocean ecosystems follow patterns. CalCOFI datasets capture both fish larvae abundance (early signals of future populations) and real catch data (what fishermen are actually catching).

This led to a simple idea: What if we could connect these two to help fishermen plan smarter trips ahead of time?

SpawnCast does exactly that. It uses larvae data to predict future fish presence and catch data to ground those predictions in reality, turning complex ocean data into practical, actionable fishing decisions.

What it does

SpawnCast is an AI-powered decision tool built specifically for fishermen, helping them answer three critical questions:

Where should I go? When should I go? What should I target?

Core functionality:

  1. Interactive Map of Fishing Hotspots
     • Displays predicted high-yield regions along the coast
     • Each point includes: target species, confidence score, expected yield level
     • Built using spatial grid regions derived from latitude/longitude data

  2. Time-Aware AI Predictions
     • Combines larvae abundance data (early biological signals) with historical catch data
     • Uses time-lag modeling to predict future fish presence
     • Ranks the best fishing opportunities based on confidence and expected yield

  3. Ranked Fishing Insights
     • Filters results based on user-selected location and date range
     • Outputs a ranked list of the top species to target, including best fishing window, confidence score, and yield level

  4. AI-Generated Explanations
     • Uses Gemini to explain why a species is recommended
     • Translates model output into human-understandable reasoning
     • Helps fishermen trust and interpret predictions

  5. Trip Planner + Calendar Integration
     • Automatically selects the optimal fishing day within a given range
     • Generates a calendar event for the trip
     • Turns predictions into real, schedulable plans

How we built it

Data Sources: We used two datasets from NOAA’s ERDDAP CalCOFI collection:

  1. Larvae dataset
     • Contains species-level larvae abundance across regions and time
     • Acts as a leading indicator of future fish populations

  2. Catch dataset
     • Contains actual recorded fish catches
     • Grounds the model in real fishing outcomes

Dataset Alignment:

The two datasets were fundamentally different:

  • Different schemas (column names, formats)
  • Different spatial resolutions (raw coordinates vs catch locations)
  • Different temporal formats (timestamps vs structured dates)

To make them usable together, we:

Standardized time:

  • Converted timestamps into a common monthly format

Created spatial regions:

  • Cleaned data to align regions from both datasets (West Coast)
  • Binned latitude/longitude into grid cells
  • Generated a shared region_id for both datasets

Aggregated data:

  • Computed average larvae density per region and time window
  • Computed average catch levels

Cleaned:

  • Identified endangered species
  • Removed endangered species from the data to avoid recommending protected catches

This alignment allowed us to combine biological signals with real catch outcomes in a consistent framework.
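The alignment steps above can be sketched in pandas. This is a minimal illustration, not the actual pipeline: the column names (`time`, `lat`, `lon`, `larvae_density`, `catch_level`) and the 1° grid size are assumptions, not the real CalCOFI schema.

```python
import pandas as pd

GRID_DEG = 1.0  # assumed grid-cell size in degrees (illustrative)

def add_region_and_month(df):
    """Bin lat/lon into grid cells and collapse timestamps to months."""
    out = df.copy()
    out["month"] = pd.to_datetime(out["time"]).dt.to_period("M").astype(str)
    out["region_id"] = (
        (out["lat"] // GRID_DEG).astype(int).astype(str)
        + "_"
        + (out["lon"] // GRID_DEG).astype(int).astype(str)
    )
    return out

# Tiny synthetic stand-ins for the two datasets.
larvae_raw = pd.DataFrame({
    "time": ["2024-03-02", "2024-03-20"],
    "lat": [33.4, 33.6], "lon": [-118.2, -118.4],
    "species": ["anchovy", "anchovy"],
    "larvae_density": [10.0, 30.0],
})
catch_raw = pd.DataFrame({
    "time": ["2024-04-05"],
    "lat": [33.5], "lon": [-118.3],
    "species": ["anchovy"],
    "catch_level": [2.0],
})

# Aggregate each dataset to (region, month, species) means — the shared structure.
larvae = add_region_and_month(larvae_raw).groupby(
    ["region_id", "month", "species"], as_index=False)["larvae_density"].mean()
catch = add_region_and_month(catch_raw).groupby(
    ["region_id", "month", "species"], as_index=False)["catch_level"].mean()
```

Both tables now share `region_id` + `month` keys, so biological signals and catch outcomes can be joined directly.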

Machine Learning Pipeline:

We built the ML pipeline using Databricks, PySpark, and scikit-learn:

  • Applied a time-lag transformation (~4 weeks): larvae abundance (past) → fish catch (future)
  • Trained a Random Forest model to predict high-yield fishing conditions
  • Generated structured predictions for each region, species, and time window

The output is stored in intelligence.json, which contains:

  • location (lat/lon)
  • species
  • confidence scores
  • yield levels
  • seasonal signals

This acts as the core intelligence layer of the entire system.

The Random Forest model:

  • handles nonlinear relationships between biological and spatial signals
  • outputs probability-style confidence scores
  • allows us to classify fishing opportunities into yield levels (e.g., high, medium, low)
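A minimal sketch of the lag-and-train step on synthetic data (the real pipeline runs on Databricks with PySpark; the data, thresholds, and variable names here are invented for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic monthly series for one region: catch follows larvae by ~4 weeks.
n = 200
larvae_density = rng.uniform(0, 100, size=n)          # feature at month t
future_catch = larvae_density + rng.normal(0, 10, n)  # outcome at month t + lag
high_yield = (future_catch > 60).astype(int)          # binary "high yield" label

# The time-lag transform pairs each larvae reading with the *later* outcome,
# so X encodes past biology and y the future catch.
X = larvae_density.reshape(-1, 1)
y = high_yield

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Probability-style confidence scores, bucketed into yield levels.
proba = model.predict_proba(X)[:, 1]
levels = np.select([proba >= 0.66, proba >= 0.33], ["high", "medium"], "low")
```

The `predict_proba` output is what surfaces in the UI as a confidence score, and the bucketing step is one simple way to derive the high/medium/low yield levels.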

Prediction Layer

Model outputs are written to the intelligence.json file containing structured predictions for each region, species, and time window.

Backend (FastAPI)

We built a FastAPI backend that serves predictions from intelligence.json.

Instead of retraining models at runtime, the backend loads precomputed predictions and applies filtering, ranking, and geospatial logic.

Key responsibilities:

  • Trip Planner ranking: ranks species based on confidence and yield for a given location + date
  • Geospatial matching: uses distance calculations to map user input to the nearest grid regions
  • Insights generation: aggregates species predictions and determines the best fishing windows
  • Calendar generation: creates .ics events for scheduling trips
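The geospatial matching and ranking responsibilities can be sketched as plain helpers of the kind the FastAPI routes would call (the dict schemas and tie-breaking rules are illustrative assumptions, not the actual backend code):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, used to snap user input to a grid region."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_region(lat, lon, regions):
    """regions: dicts with centroid 'lat'/'lon' (illustrative schema)."""
    return min(regions, key=lambda reg: haversine_km(lat, lon, reg["lat"], reg["lon"]))

def rank_species(predictions):
    """Order by confidence, breaking ties by yield level (high > medium > low)."""
    order = {"high": 2, "medium": 1, "low": 0}
    return sorted(predictions,
                  key=lambda p: (p["confidence"], order[p["yield_level"]]),
                  reverse=True)

regions = [{"region_id": "33_-119", "lat": 33.5, "lon": -118.5},
           {"region_id": "36_-122", "lat": 36.5, "lon": -121.5}]
preds = [{"species": "sardine", "confidence": 0.67, "yield_level": "medium"},
         {"species": "anchovy", "confidence": 0.82, "yield_level": "high"}]

best_region = nearest_region(34.0, -118.9, regions)
ranked = rank_species(preds)
```

Because the predictions are precomputed, a request reduces to one nearest-region lookup plus an in-memory sort.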

AI Layer (Gemini)

We integrated Gemini 2.5 Flash to generate:

  • natural-language explanations for predictions
  • concise reasoning behind species recommendations

Gemini enhances interpretability but does not affect model rankings; it strictly explains them.
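The explanation step might look like the sketch below: the model's structured output is rendered into a prompt, and the ranking is fixed before Gemini ever sees it. The prompt wording and dict keys are invented for illustration, and the commented-out client call (via the google-genai SDK) is an assumption about wiring, not the project's actual code.

```python
def build_explanation_prompt(pred):
    """Turn one structured prediction into a prompt asking Gemini for reasoning."""
    return (
        "You are explaining a fishing recommendation to a fisherman.\n"
        f"Species: {pred['species']}\n"
        f"Region: {pred['region_id']}\n"
        f"Confidence: {pred['confidence']:.0%}\n"
        f"Yield level: {pred['yield_level']}\n"
        "In 2-3 plain sentences, explain why this species is recommended, "
        "without changing or re-ranking the prediction."
    )

prompt = build_explanation_prompt({
    "species": "anchovy", "region_id": "33_-119",
    "confidence": 0.82, "yield_level": "high",
})

# Hypothetical call (requires an API key):
# from google import genai
# client = genai.Client()
# resp = client.models.generate_content(model="gemini-2.5-flash", contents=prompt)
# explanation = resp.text
```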

Frontend (React + Mapbox)

We built a responsive frontend using React, TypeScript, Vite, and Tailwind CSS.

Key features:

  • Interactive Map (Mapbox): visualizes predicted fishing hotspots spatially
  • Trip Planner: takes user input (location + date range) and returns ranked fishing recommendations
  • Region Detail Panel: displays species, confidence, and yield for selected areas
  • Calendar Export: allows users to schedule fishing trips directly

APIs

Our backend exposes endpoints that connect the ML system to the frontend:

  • /planner/recommendations → ranked species predictions
  • /insights → aggregated species insights
  • /region-details → data for selected map points
  • /calendar → downloadable fishing trip events
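The /calendar payload can be produced with the standard library alone; a minimal all-day iCalendar event might look like this (the field values and UID scheme are illustrative, not the actual backend's):

```python
from datetime import date, timedelta

def trip_ics(species, region_id, day):
    """Build a one-day, all-day iCalendar event for a planned fishing trip."""
    start = day.strftime("%Y%m%d")
    end = (day + timedelta(days=1)).strftime("%Y%m%d")  # DTEND is exclusive
    return "\r\n".join([
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//SpawnCast//Trip Planner//EN",
        "BEGIN:VEVENT",
        f"UID:{species}-{region_id}-{start}@spawncast",
        f"DTSTART;VALUE=DATE:{start}",
        f"DTEND;VALUE=DATE:{end}",
        f"SUMMARY:Fishing trip: {species} ({region_id})",
        "END:VEVENT",
        "END:VCALENDAR",
    ])

ics = trip_ics("anchovy", "33_-119", date(2024, 4, 12))
```

Serving this string with a `text/calendar` content type lets any calendar app import the trip directly.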

Challenges we ran into

  1. Time modeling complexity

Initially, our model ignored time, leading to static predictions. We had to redesign the pipeline to incorporate biological lag, which required rethinking how we aligned datasets.

  2. Dataset alignment

The larvae and catch datasets had:

  • different schemas
  • different time formats
  • different spatial structures

We solved this by:

  • creating region grids
  • standardizing timestamps
  • aggregating both datasets to a shared structure

  3. API + deployment issues

We encountered:

  • environment conflicts
  • Git merge issues

We resolved these with:

  • .env security handling
  • proper Git workflows
  • caching + fallback logic

Accomplishments that we're proud of

  • Successfully integrated two complex oceanographic datasets
  • Built a time-aware ML model grounded in biology
  • Created a full-stack system (ML + backend + UI)
  • Delivered actionable outputs (calendar scheduling), not just predictions
  • Added AI explanations to improve usability and transparency

What we learned

  • Biological systems require temporal thinking, not just spatial modeling
  • Data alignment is often harder than modeling itself
  • The most impactful systems are those that bridge prediction → action
  • AI explanations significantly improve user trust and interpretability

What's next for SpawnCast

  • Incorporate real-time ocean conditions (temperature, currents, weather)
  • Incorporate more offline features (e.g., downloadable Google Maps integration showing routes to the recommended region)
  • Improve time-series modeling (e.g., LSTMs or temporal models)
  • Expand beyond California to global fisheries
  • Personalization based on user preferences and fishing styles
  • Build a full mobile experience for real-world usability

Built With

databricks, fastapi, gemini, mapbox, pyspark, python, react, scikit-learn, tailwind-css, typescript, vite
