Inspiration

The U.S. Coast Guard (USCG) needs 3-day ice cover predictions to support commercial shipping, which carries 130 million tons of cargo through the Great Lakes. Current USNIC data has limitations. This project provides an ML solution that generates the 3-day forecast.

What it does

The project is a deep learning pipeline that generates a 3-day (T+1, T+2, T+3) ice concentration forecast. It ingests historical ice data, weather forecasts, and static environmental data to train a model. The final output is a netCDF file and a visualization.

How it was built

  • Model: A 2D U-Net built in PyTorch, using standard ConvBlock, DownBlock, and UpBlock layers with skip connections.
  • Data Input (9 Channels):
    1. T0 Ice Concentration (GLSEA)
    2. Delta Ice (T0 - T-1) (GLSEA)
    3. T-1 Air Temp (HRRR)
    4. T-1 U-Wind (HRRR)
    5. T-1 V-Wind (HRRR)
    6. T-1 Precipitation Rate (HRRR)
    7. T0 Water Surface Temp (GLSEA)
    8. Shipping Routes Mask (Static)
    9. GEBCO Bathymetry (Static)
  • Data Output (3 Channels):
    1. T+1 Ice Concentration
    2. T+2 Ice Concentration
    3. T+3 Ice Concentration
  • Training:
    • Loss: Masked MSE (masked_loss), which ignores land pixels.
    • Optimization: Adam with a ReduceLROnPlateau scheduler, plus torch.cuda.amp for mixed-precision training.
    • Sampling: A GreatLakesDataset samples 256x256 patches, biased to prioritize areas with ice or shipping routes.
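
The block layout described above can be sketched as follows. This is a minimal illustration, not the project's actual model: the depth, the base width of 16, the use of batch norm, max pooling, and transposed convolutions, and the name TinyUNet are all assumptions. Only the block names (ConvBlock, DownBlock, UpBlock), the 9 input channels, and the 3 output channels come from the description.

```python
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Two 3x3 convolutions, each followed by batch norm and ReLU (assumed layout)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)


class DownBlock(nn.Module):
    """Halve the spatial size with max pooling, then apply a ConvBlock."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.pool = nn.MaxPool2d(2)
        self.conv = ConvBlock(in_ch, out_ch)

    def forward(self, x):
        return self.conv(self.pool(x))


class UpBlock(nn.Module):
    """Upsample, concatenate the skip tensor, then apply a ConvBlock."""

    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        # After concatenation the ConvBlock sees out_ch + skip_ch channels.
        self.conv = ConvBlock(out_ch + skip_ch, out_ch)

    def forward(self, x, skip):
        x = self.up(x)
        return self.conv(torch.cat([x, skip], dim=1))


class TinyUNet(nn.Module):
    """Hypothetical two-level U-Net: 9 input channels, 3 forecast channels."""

    def __init__(self, in_ch=9, out_ch=3, base=16):
        super().__init__()
        self.enc0 = ConvBlock(in_ch, base)
        self.enc1 = DownBlock(base, base * 2)
        self.enc2 = DownBlock(base * 2, base * 4)
        self.dec1 = UpBlock(base * 4, base * 2, base * 2)
        self.dec0 = UpBlock(base * 2, base, base)
        self.head = nn.Conv2d(base, out_ch, kernel_size=1)

    def forward(self, x):
        s0 = self.enc0(x)      # full resolution, kept as a skip tensor
        s1 = self.enc1(s0)     # 1/2 resolution, kept as a skip tensor
        b = self.enc2(s1)      # 1/4 resolution bottleneck
        x = self.dec1(b, s1)
        x = self.dec0(x, s0)
        return self.head(x)    # one channel each for T+1, T+2, T+3
```

The point to watch is the UpBlock constructor: the convolution after the concatenation must accept the upsampled channels plus the skip channels, not just one of them. Passing a (batch, 9, 256, 256) tensor through this sketch returns a (batch, 3, 256, 256) tensor.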

Challenges

  • Data Integration: Integrating heterogeneous data (S3 Zarr, netCDF, shapefiles, GeoTIFFs).
  • Geospatial Alignment: Reprojecting all data (e.g., HRRR, GLSEA) to a common master grid.
  • Training Stability:
    • Using masked_loss to ignore land.
    • Cleaning all inputs of NaNs before concatenation.
    • Correcting U-Net UpBlock channel arithmetic for skip connections.
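
The first two stability measures can be sketched as below. This is an illustration under stated assumptions, not the project's code: the fill value of 0, the helper name clean_and_stack, and the mask convention (1 = water, 0 = land) are all hypothetical. Only the name masked_loss and the idea of excluding land from the MSE come from the writeup.

```python
import torch


def clean_and_stack(channels):
    """Replace NaN/inf values with 0 (assumed fill value) and stack 2D fields into (C, H, W)."""
    cleaned = [
        torch.nan_to_num(c, nan=0.0, posinf=0.0, neginf=0.0) for c in channels
    ]
    return torch.stack(cleaned, dim=0)


def masked_loss(pred, target, water_mask):
    """MSE averaged over water pixels only.

    pred, target: (B, C, H, W); water_mask: (B, 1, H, W) with 1 = water, 0 = land.
    """
    mask = water_mask.expand_as(pred).to(pred.dtype)
    # Clean the target too: NaN * 0 is still NaN, so masking alone is not enough.
    target = torch.nan_to_num(target, nan=0.0)
    sq_err = (pred - target) ** 2 * mask
    # Guard against an all-land patch, where the mask sums to zero.
    return sq_err.sum() / mask.sum().clamp(min=1.0)
```

Cleaning before masking matters because a single NaN on a land pixel would otherwise survive the multiplication by zero and turn the whole loss into NaN.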

Accomplishments

  • An end-to-end forecasting pipeline that ingests raw, multi-modal data and outputs a 3-day prediction.
  • An efficient GreatLakesDataset with pre-loading, caching, and biased sampling.
  • A robust training loop using mixed-precision and a custom masked loss function.
  • Successful integration of 9 distinct input channels (dynamic weather/ice and static bathymetry/shipping routes).
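
One way to implement biased patch sampling is rejection sampling over random patch origins, sketched below. The function name sample_patch_origin, the bias probability, and the retry limit are hypothetical; the writeup only states that 256x256 patches are sampled with a preference for areas containing ice or shipping routes.

```python
import numpy as np


def sample_patch_origin(ice, routes, patch=256, bias=0.8, max_tries=20, rng=None):
    """Pick the top-left corner of a patch, preferring patches with ice or routes.

    ice, routes: 2D arrays on the same grid. With probability `bias` the sampler
    keeps drawing until the patch contains an ice or route pixel (at most
    `max_tries` draws); otherwise it returns a uniformly random patch.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = ice.shape
    interesting = (ice > 0) | (routes > 0)
    want_interesting = rng.random() < bias
    for _ in range(max_tries):
        top = int(rng.integers(0, h - patch + 1))
        left = int(rng.integers(0, w - patch + 1))
        if not want_interesting:
            return top, left
        if interesting[top:top + patch, left:left + patch].any():
            return top, left
    return top, left  # fall back to the last candidate if nothing qualified
```

Keeping a fraction of uniformly random patches (1 - bias) lets the model still see open water, so it does not learn to predict ice everywhere.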

Lessons learned

  • The two most important parts of the project turned out to be data alignment and choosing the right metrics, with ice concentration recall mattering most.
  • Finer-grained evaluations should be run on individual channels and ports (for instance, the St. Marys River).
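
A recall metric restricted to a single waterway or port could look like the sketch below. It assumes concentrations are fractions in [0, 1] and that a pixel counts as ice at or above a threshold of 0.15; both the threshold and the function name ice_recall are illustrative assumptions, and the region mask would have to be built separately for each area of interest.

```python
import numpy as np


def ice_recall(pred, truth, region_mask, threshold=0.15):
    """Fraction of observed ice pixels in the region that the forecast also marks as ice."""
    region = region_mask.astype(bool)
    truth_ice = (truth >= threshold) & region
    pred_ice = (pred >= threshold) & region
    n_truth = truth_ice.sum()
    if n_truth == 0:
        return float("nan")  # recall is undefined when no ice was observed
    return float((truth_ice & pred_ice).sum() / n_truth)
```

Calling it once with a whole-lake mask and once with a mask for a single channel shows whether a good basin-wide score hides misses in the places that matter for navigation.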

What's next for GLIC?

  • Integrate ice thickness prediction.
  • Classify ice type (pack vs. fast ice).
  • Classify ice state (new, melting, thin, thick).
  • Improve forecast resolution and access frequency.
  • Invest more time in building a clean dataset to minimize resizing.
