Inspiration
For fishermen, one of the biggest challenges is uncertainty. Knowing where to go, when to go, and what to target often comes down to guesswork, costing time and fuel.
At the same time, ocean ecosystems follow patterns. CalCOFI datasets capture both fish larvae abundance (early signals of future populations) and real catch data (what fishermen are actually catching).
This led to a simple idea: What if we could connect these two to help fishermen plan smarter trips ahead of time?
SpawnCast does exactly that. It uses larvae data to predict future fish presence and catch data to ground those predictions in reality, turning complex ocean data into practical, actionable fishing decisions.
What it does
SpawnCast is an AI-powered decision tool built specifically for fishermen, helping them answer three critical questions:
Where should I go? When should I go? What should I target?
Core functionality:
- Interactive Map of Fishing Hotspots
  - Displays predicted high-yield regions along the coast
  - Each point includes target species, confidence score, and expected yield level
  - Built using spatial grid regions derived from latitude/longitude data
- Time-Aware AI Predictions
  - Combines larvae abundance data (early biological signals) with historical catch data
  - Uses time-lag modeling to predict future fish presence
  - Ranks the best fishing opportunities by confidence and expected yield
- Ranked Fishing Insights
  - Filters results by user-selected location and date range
  - Outputs a ranked list of the top species to target, including the best fishing window, confidence score, and yield level
- AI-Generated Explanations
  - Uses Gemini to explain why a species is recommended
  - Translates model output into human-understandable reasoning
  - Helps fishermen trust and interpret predictions
- Trip Planner + Calendar Integration
  - Automatically selects the optimal fishing day within a given range
  - Generates a calendar event for the trip
  - Turns predictions into real, schedulable plans
How we built it
Data Sources: We used two datasets from NOAA’s ERDDAP CalCOFI collection:
- Larvae dataset
  - Contains species-level larvae abundance across regions and time
  - Acts as a leading indicator of future fish populations
- Catch dataset
  - Contains actual recorded fish catches
  - Grounds the model in real fishing outcomes
Dataset Alignment:
The two datasets were fundamentally different:
- Different schemas (column names, formats)
- Different spatial resolutions (raw coordinates vs catch locations)
- Different temporal formats (timestamps vs structured dates)
To make them usable together, we:
- Standardized time
  - Converted timestamps into a common monthly format
- Created spatial regions
  - Cleaned the data to align regions from both datasets (West Coast)
  - Binned latitude/longitude into grid cells
  - Generated a shared region_id for both datasets
- Aggregated the data
  - Computed average larvae density per region and time
  - Computed average catch levels
- Cleaned the data
  - Identified endangered species
  - Removed endangered species so the system never recommends targeting protected populations
This alignment allowed us to combine biological signals with real catch outcomes in a consistent framework.
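As a concrete illustration, the binning and aggregation steps can be sketched in pandas. The column names, the 0.5° grid size, and the helper names here are illustrative assumptions, not our exact production values:

```python
import pandas as pd

GRID_DEG = 0.5  # illustrative grid-cell size in degrees

def add_region_id(df: pd.DataFrame,
                  lat_col: str = "latitude",
                  lon_col: str = "longitude") -> pd.DataFrame:
    """Bin raw coordinates into grid cells and derive a shared region_id."""
    out = df.copy()
    # Snap each point to the lower-left corner of its grid cell
    out["lat_bin"] = (out[lat_col] // GRID_DEG) * GRID_DEG
    out["lon_bin"] = (out[lon_col] // GRID_DEG) * GRID_DEG
    out["region_id"] = out["lat_bin"].astype(str) + "_" + out["lon_bin"].astype(str)
    return out

def monthly_mean(df: pd.DataFrame, value_col: str) -> pd.DataFrame:
    """Standardize time to a common monthly format, then average per region/month."""
    out = df.copy()
    out["month"] = pd.to_datetime(out["time"]).dt.to_period("M").astype(str)
    return out.groupby(["region_id", "month"], as_index=False)[value_col].mean()
```

Applying the same two helpers to both datasets yields tables keyed on the shared (region_id, month) pair, which is what makes the join possible.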
Machine Learning Pipeline:
We built the pipeline with Databricks, PySpark, and scikit-learn:
- Applied a time-lag transformation (~4 weeks) so that past larvae abundance predicts future fish catch
- Trained a Random Forest model to predict high-yield fishing conditions
- Generated structured predictions for each region, species, and time window
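A minimal sketch of the lag-and-train step, assuming the aligned monthly tables from the previous section. The one-month shift stands in for the ~4-week lag, and the column names and high-yield threshold are illustrative assumptions:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def build_training_table(larvae: pd.DataFrame, catch: pd.DataFrame,
                         lag_months: int = 1) -> pd.DataFrame:
    """Join larvae abundance at month t to catch outcomes at month t + lag."""
    shifted = larvae.copy()
    # Shift larvae months forward so they line up with the catch they predict
    shifted["month"] = (pd.PeriodIndex(shifted["month"], freq="M") + lag_months).astype(str)
    return shifted.merge(catch, on=["region_id", "month"], how="inner")

def train(df: pd.DataFrame) -> RandomForestClassifier:
    """Features: past biological signals. Target: a high-yield flag."""
    X = df[["larvae_density"]]
    y = (df["catch_level"] > df["catch_level"].median()).astype(int)
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X, y)
    return model
```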
The output is stored in a file containing:
- location (lat/lon)
- species
- confidence scores
- yield levels
- seasonal signals
This acts as the core intelligence layer of the entire system.
The Random Forest model:
- handles nonlinear relationships between biological and spatial signals
- outputs probability-style confidence scores
- allows us to classify fishing opportunities into yield levels (e.g., high, medium, low)
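The bucketing itself can be as simple as thresholding the model's confidence scores; the cutoffs below are illustrative, not our tuned values:

```python
def yield_level(confidence: float) -> str:
    """Map a probability-style confidence score to a yield bucket (illustrative thresholds)."""
    if confidence >= 0.7:
        return "high"
    if confidence >= 0.4:
        return "medium"
    return "low"
```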
Prediction Layer
Model outputs are written to the intelligence.json file containing structured predictions for each region, species, and time window.
Backend (FastAPI)
We built a FastAPI backend that serves predictions from intelligence.json.
Instead of retraining models at runtime, the backend loads precomputed predictions and applies filtering, ranking, and geospatial logic.
Key responsibilities:
- Trip Planner ranking: ranks species by confidence and yield for a given location and date
- Geospatial matching: uses distance calculations to map user input to the nearest grid regions
- Insights generation: aggregates species predictions and determines the best fishing windows
- Calendar generation: creates .ics events for scheduling trips
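The geospatial matching and ranking logic can be sketched as pure helpers behind those endpoints. The haversine-based nearest-region lookup and the field names are illustrative assumptions:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometers."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def nearest_region(lat: float, lon: float, regions: list[dict]) -> dict:
    """Map user input to the closest precomputed grid region."""
    return min(regions, key=lambda r: haversine_km(lat, lon, r["lat"], r["lon"]))

def rank_species(predictions: list[dict], region_id: str, top_n: int = 3) -> list[dict]:
    """Rank a region's species predictions by confidence, then expected yield."""
    yield_order = {"high": 2, "medium": 1, "low": 0}
    hits = [p for p in predictions if p["region_id"] == region_id]
    hits.sort(key=lambda p: (p["confidence"], yield_order[p["yield_level"]]), reverse=True)
    return hits[:top_n]
```

Keeping these helpers pure (no framework objects) makes the ranking logic trivial to unit-test independently of the FastAPI routes.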
AI Layer (Gemini)
We integrated Gemini 2.5 Flash to generate:
- natural-language explanations for predictions
- concise reasoning behind species recommendations
Gemini enhances interpretability but does not affect model rankings—it strictly explains them.
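A sketch of that explanation step, assuming the google-generativeai Python SDK; the prompt template, function names, and example species are our own illustration, and `explain` requires a configured API key:

```python
def build_prompt(rec: dict) -> str:
    """Turn one structured prediction into a grounded prompt for Gemini."""
    return (
        f"Explain in two sentences why {rec['species']} is recommended near "
        f"region {rec['region_id']} in {rec['month']}, given a confidence of "
        f"{rec['confidence']:.0%} and an expected {rec['yield_level']} yield. "
        "Base the explanation only on these values."
    )

def explain(rec: dict) -> str:
    """Ask Gemini 2.5 Flash for a natural-language explanation (requires an API key)."""
    import google.generativeai as genai  # pip install google-generativeai
    model = genai.GenerativeModel("gemini-2.5-flash")
    return model.generate_content(build_prompt(rec)).text
```

Constraining the prompt to the model's own outputs is what keeps Gemini in a strictly explanatory role.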
Frontend (React + Mapbox)
We built a responsive frontend using React, TypeScript, Vite, and Tailwind CSS.
Key features:
- Interactive Map (Mapbox): visualizes predicted fishing hotspots spatially
- Trip Planner: takes user input (location + date range) and returns ranked fishing recommendations
- Region Detail Panel: displays species, confidence, and yield for selected areas
- Calendar Export: lets users schedule fishing trips directly
APIs
Our backend exposes endpoints that connect the ML system to the frontend:
- /planner/recommendations → ranked species predictions
- /insights → aggregated species insights
- /region-details → data for selected map points
- /calendar → downloadable fishing trip events
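The /calendar endpoint's .ics generation can be approximated with plain string building following the iCalendar (RFC 5545) format; this minimal all-day event and its field values are illustrative:

```python
from datetime import date

def trip_ics(species: str, day: date, region_id: str) -> str:
    """Build a minimal all-day iCalendar event for the planned fishing trip."""
    stamp = day.strftime("%Y%m%d")
    return "\r\n".join([
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//SpawnCast//Trip Planner//EN",
        "BEGIN:VEVENT",
        f"UID:{stamp}-{region_id}@spawncast",
        f"DTSTART;VALUE=DATE:{stamp}",
        f"SUMMARY:Fishing trip: {species} ({region_id})",
        "END:VEVENT",
        "END:VCALENDAR",
    ])
```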
Challenges we ran into
- Time modeling complexity
Initially, our model ignored time, leading to static predictions. We had to redesign the pipeline to incorporate biological lag, which required rethinking how we aligned datasets.
- Dataset alignment
The larvae and catch datasets had:
- different schemas
- different time formats
- different spatial structures
We solved this by:
- creating region grids
- standardizing timestamps
- aggregating both datasets to a shared structure
- API + deployment issues
We encountered:
- environment conflicts
- Git merge issues
We resolved these with:
- .env security handling
- proper Git workflows
- caching + fallback logic
Accomplishments that we're proud of
- Successfully integrated two complex oceanographic datasets
- Built a time-aware ML model grounded in biology
- Created a full-stack system (ML + backend + UI)
- Delivered actionable outputs (calendar scheduling), not just predictions
- Added AI explanations to improve usability and transparency
What we learned
- Biological systems require temporal thinking, not just spatial modeling
- Data alignment is often harder than modeling itself
- The most impactful systems are those that bridge prediction → action
- AI explanations significantly improve user trust and interpretability
What's next for SpawnCast
- Incorporate real-time ocean conditions (temperature, currents, weather)
- Incorporate more offline features (e.g., downloadable Google Maps routes to the recommended region)
- Improve time-series modeling (e.g., LSTMs or temporal models)
- Expand beyond California to global fisheries
- Personalization based on user preferences and fishing styles
- Build a full mobile experience for real-world usability