Inspiration
Northern anchovies (Engraulis mordax) are the cornerstone of the California Current ecosystem. Entire populations of brown pelicans, sea lions, humpback whales, and Chinook salmon rely on their abundance. Yet, between 2009 and 2011, the anchovy population crashed by 99% (plummeting from ~1 million tons to just 15,000 tons) despite the near-absence of commercial fishing pressure.
This collapse triggered a devastating ecological cascade: brown pelicans abandoned their nests, and over 70% of sea lion pups washed up starving on California beaches before weaning. In 2022, a federal court ruled that NOAA violated fishery law by managing these populations using decades-old data. Today, anchovy conservation consists entirely of total-catch limits, with zero spatial protections for this keystone forage species. We realized that setting arbitrary quotas doesn't matter if the anchovies move geographically out of reach of fixed-colony predators. We needed a new methodology to map out anchovy conservation: not just how many fish can be caught, but where they need to be protected.
What it does
Our project is a Spatial Conservation Framework for Northern Anchovy. It is a data-driven web application that allows policy researchers and marine biologists to visualize the overlap between anchovy habitat and the existing network of marine closures originally designed for entirely different species (like groundfish).
- Historical Habitat Mapping: We generate spatial grids of habitat suitability probabilities \(P(Habitat | T, S, D)\) across the California Current.
- Incidental Coverage Analytics: The system calculates an "incidental coverage" metric, the exact percentage of highly suitable anchovy habitat that falls inside existing Groundfish Exclusion Areas (GEAs) in any given year.
- Scenario Generator: Rather than forecasting exact climate variables, we built a sandbox that applies thermal distribution shifts (e.g., \(+0.5^\circ\text{C}\), \(+1.0^\circ\text{C}\), \(+2.0^\circ\text{C}\)) to measure how climate change causes habitat to migrate out of protected borders.
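In essence, the scenario generator perturbs the temperature input by a fixed offset and re-scores every grid cell with the already-trained model. A minimal sketch of that idea, where `suitability` is a toy stand-in for the real trained model and all temperatures are illustrative:

```python
# Toy stand-in for the trained habitat model: larval suitability peaks
# near ~14.5 C and falls off linearly (illustrative numbers only).
def suitability(temp_c):
    optimum, tolerance = 14.5, 2.5
    return max(0.0, 1.0 - abs(temp_c - optimum) / tolerance)

def apply_scenario(cell_temps, delta_c):
    """Re-score every grid cell under a +delta_c warming scenario."""
    return [suitability(t + delta_c) for t in cell_temps]

cells = [13.0, 14.5, 16.0]            # baseline temperature per grid cell
baseline = apply_scenario(cells, 0.0)
warmed = apply_scenario(cells, 2.0)   # the +2.0 C scenario
# Warming raises suitability in cool cells and erodes it in warm ones,
# i.e. the habitat "migrates" toward cooler water and out of fixed borders.
```

The same pattern applies unchanged to the real model: only the scoring function and the grid differ.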
How we built it
The framework was built using an end-to-end Python stack emphasizing high-performance data processing:
- Data Pipeline & Integration: We ingested 27 years of larval anchovy densities from the physical CalCOFI bottle database, cross-matching them with contemporaneous coastal hydrographic CTD parameters (Temperature, Salinity, Depth).
- Machine Learning Model: We engineered a spatial habitat probability model using an XGBoost classifier. To prevent spatial autocorrelation and data leakage, we utilized rigorous spatial cross-validation architectures, holding out entire geographical grid blocks instead of employing naive randomized train-test splits.
- Geospatial Processing: Using `geopandas` and `shapely`, we algorithmically computed intersections between dynamically generated 0.5° resolution habitat grids and static federal GeoJSON closure polygons.
- Interactive Dashboard Front-End: The user experience is powered by Streamlit, featuring optimized `Folium` integrations for responsive map rendering alongside reactive `Plotly` line charts tracking the coverage metrics through time (1996–2022).
- Real-time API Integration: We overlaid the spatial habitat maps with active human industrial behavior by directly fetching purse-seiner fishing hours via the Global Fishing Watch 4wings API.
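The block-holdout idea behind the model validation can be sketched with scikit-learn's `GroupKFold`: every sample gets a 0.5° block ID, and whole blocks are held out so autocorrelated neighbors never straddle a split. The data below is synthetic, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost purely to keep the sketch self-contained:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n = 400
lat = rng.uniform(30.0, 40.0, n)                 # synthetic station positions
lon = rng.uniform(-125.0, -117.0, n)
X = np.column_stack([rng.normal(14.0, 2.0, n),   # temperature
                     rng.normal(33.5, 0.5, n),   # salinity
                     rng.uniform(0.0, 200.0, n)])  # depth
y = (X[:, 0] > 14.0).astype(int)                 # toy habitat label

# 0.5-degree block IDs: samples in the same block share a group
blocks = (np.floor(lat / 0.5).astype(int) * 1000
          + np.floor(lon / 0.5).astype(int))

scores = []
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=blocks):
    model = GradientBoostingClassifier().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
    # no block ever appears on both sides of the split
    assert not set(blocks[train_idx]) & set(blocks[test_idx])
```

A naive random split would scatter each block's samples across train and test, letting the model "memorize" local conditions and inflating scores.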
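The incidental-coverage computation reduces to an area-of-intersection ratio. A pure-Python sketch using axis-aligned boxes `(min_lon, min_lat, max_lon, max_lat)` in place of the real `geopandas`/`shapely` polygon overlay (all coordinates below are made up):

```python
def box_intersection_area(a, b):
    """Overlap area of two (min_lon, min_lat, max_lon, max_lat) boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)

def incidental_coverage(habitat_cells, closures):
    """Share of suitable-habitat area falling inside any closure.

    Assumes closures do not overlap each other; overlapping closures
    would double-count covered area (the real polygon overlay avoids this).
    """
    total = sum((c[2] - c[0]) * (c[3] - c[1]) for c in habitat_cells)
    covered = sum(box_intersection_area(cell, gea)
                  for cell in habitat_cells for gea in closures)
    return covered / total if total else 0.0

cells = [(0.0, 0.0, 0.5, 0.5), (0.5, 0.0, 1.0, 0.5)]  # two toy habitat cells
geas = [(0.25, 0.0, 0.75, 0.5)]                       # one toy closure
# Half of each cell lies inside the closure, so coverage is 0.5
```

In the application itself this ratio is computed per year from the scored habitat grid, which is what drives the coverage time series.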
Challenges we ran into
- Handling Spatial Extrapolation: Validating the machine learning model across changing environmental regimes without extrapolation bias was difficult. Standard evaluation metrics were misleading until we committed to block-level spatial holdouts that measure out-of-region error honestly.
- Dashboard Optimization: Rendering tens of thousands of dynamically scored probabilities against intersection geometries was computationally expensive. We employed memoization via Streamlit's caching to render interactive multi-layer `Folium` heatmaps smoothly.
- Unreliable Third-Party Network Endpoints: External APIs (like Global Fishing Watch) intermittently failed with request errors and timeouts, requiring caching layers to maintain continuous dashboard stability.
Accomplishments that we're proud of
- Formalizing the concept of "incidental coverage" mathematically: quantifying exactly how poorly the current static geographic protections serve migrating forage fish.
- Using two independent datasets to validate the importance of accurate closure boundaries: even GEAs not explicitly designed for anchovies measurably reduced anchovy fishing effort (though not necessarily in the optimal regions).
What we learned
Early on, we realized the importance of being realistic about the capacity of the ML models we were using. Attempting to predict future anchovy habitats based on climate data would be a much more difficult task requiring more sophisticated models and comprehensive data. Framing the goal as modeling and analyzing hypothetical temperature scenarios helped us keep our aspirations within reason.
We also discovered the power of integrating multiple datasets, allowing us to make more powerful claims and provide stronger confidence in the validity of our concept. The use of data from CalCOFI, NOAA, and external sources provided us with a more robust application.
What's next
This project is a complete proof-of-concept, but there are many ways to increase its scientific rigor and applicability.
- Ecosystem Cascade Visualizations: We plan to explicitly map out known Channel Island brown pelican nesting colonies and sea lion rookeries within the application, creating a stark visual of the geographical "mismatch" that triggers starvation incidents when habitat shifts.
- Biogeochemical Coupling: Transitioning our XGBoost model to include inputs for dissolved oxygen levels, surface chlorophyll availability, and ocean acidification pH factors.
- Random-Placement Baselines: Engineering alternative Monte-Carlo generated control networks to mathematically prove whether current closure areas perform better than entirely randomly placed rectangles.
- Institutional Advocacy: Taking this functional code and architectural methodology directly to oceanographic institutions like Scripps to pitch real-world integrations utilizing large-scale ROMS+NPZ climate models.
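The random-placement baseline could look something like the Monte-Carlo sketch below: compare the real network's coverage against many equal-area random networks and read off an empirical p-value. Geometry is reduced to point-in-box tests on cell centers, and the "real" network and extent here are entirely illustrative:

```python
import random

def coverage(cell_centers, boxes):
    """Fraction of cell centers falling inside any box (cheap stand-in
    for a full polygon-intersection coverage computation)."""
    def inside(p, b):
        return b[0] <= p[0] <= b[2] and b[1] <= p[1] <= b[3]
    hits = sum(any(inside(p, b) for b in boxes) for p in cell_centers)
    return hits / len(cell_centers)

def random_network(n_boxes, size, extent, rng):
    """n_boxes square closures of side `size`, dropped uniformly in extent."""
    x0, y0, x1, y1 = extent
    out = []
    for _ in range(n_boxes):
        x = rng.uniform(x0, x1 - size)
        y = rng.uniform(y0, y1 - size)
        out.append((x, y, x + size, y + size))
    return out

rng = random.Random(42)
extent = (0.0, 0.0, 10.0, 10.0)
cells = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(500)]
real = [(1.0, 1.0, 3.0, 3.0), (6.0, 6.0, 8.0, 8.0)]   # toy "GEA" network
real_cov = coverage(cells, real)

# Null distribution: 200 random networks with equal box count and size
null = [coverage(cells, random_network(2, 2.0, extent, rng))
        for _ in range(200)]
p_value = sum(c >= real_cov for c in null) / len(null)
```

A low `p_value` would indicate the existing closures cover suitable habitat better than chance placement of the same total protected area.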
Built With
- folium
- geopandas
- numpy
- pandas
- plotly
- python
- scikit-learn
- shapely
- streamlit
- xgboost