The proposed approach is a multi-modal encoder-decoder architecture for 6-class crop segmentation across fragmented Indian smallholder farms, built on the Prithvi-EO-2.0 foundation model and the TerraTorch framework. The method will implement early SAR-optical fusion, combining temporal Sentinel-2 multispectral observations with Sentinel-1 VV/VH polarizations through channel concatenation. The architecture will use Prithvi-EO-2.0-300M as the backbone encoder, with pyramidal feature extraction via the SelectIndices and LearnedInterpolateToPyramidal necks, decoded through a UNetDecoder with skip connections. Training will combine Lovász-Softmax loss, which optimizes a differentiable surrogate of IoU, with cross-entropy for stability. Mixed-precision BF16 training and progressive backbone unfreezing will reduce computational overhead. Validation splits will be drawn at the district level across Uttar Pradesh (UP), Rajasthan, Odisha, and Bihar, so that performance is measured on districts not seen during training. The target is a micro IoU above 0.75 on the AgriFieldNet India dataset, with balanced per-class performance on the Gram, Maize, Mustard, Sugarcane, Wheat, and Other categories.
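
The gap between a micro IoU target and balanced per-class performance can be illustrated with a small, dependency-free sketch. It assumes one common definition of micro IoU (true positives, false positives, and false negatives summed over all classes before dividing); the function name `iou_scores` and the toy labels are hypothetical and are not taken from the project code.

```python
# Class order is an assumption made for this illustration only.
CLASSES = ["Gram", "Maize", "Mustard", "Sugarcane", "Wheat", "Other"]


def iou_scores(y_true, y_pred, num_classes):
    """Return (micro_iou, per_class_iou) for flat lists of integer labels.

    per_class_iou[c] is None when class c appears in neither list.
    """
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # pixel wrongly assigned to class p
            fn[t] += 1  # pixel of class t that was missed

    per_class = []
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        per_class.append(tp[c] / denom if denom else None)

    total = sum(tp) + sum(fp) + sum(fn)
    micro = sum(tp) / total if total else None
    return micro, per_class


# Toy example: a wheat-dominated tile where the model predicts Wheat everywhere.
y_true = [4, 4, 4, 4, 4, 4, 0, 1]
y_pred = [4, 4, 4, 4, 4, 4, 4, 4]
micro, per_class = iou_scores(y_true, y_pred, len(CLASSES))
for name, score in zip(CLASSES, per_class):
    print(name, score)
print("micro IoU:", micro)
```

In this toy case the micro IoU is dominated by the majority class while Gram and Maize score zero, which is why per-class scores are worth reporting alongside the headline number.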

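A district-level validation split can be sketched as follows: whole districts are held out within each state, so no district contributes tiles to both sets. This is a minimal sketch, not the project's actual split; the function `split_by_district`, the `(tile_id, state, district)` record layout, the 25% hold-out fraction, and the placeholder district names are all assumptions.

```python
import random
from collections import defaultdict


def split_by_district(samples, val_fraction=0.25, seed=0):
    """Split (tile_id, state, district) records so districts never straddle sets.

    At least one district per state is held out for validation. A state with a
    single district would therefore end up entirely in the validation set.
    """
    districts_by_state = defaultdict(set)
    for _, state, district in samples:
        districts_by_state[state].add(district)

    rng = random.Random(seed)
    val_keys = set()
    for state in sorted(districts_by_state):
        districts = sorted(districts_by_state[state])
        rng.shuffle(districts)
        n_val = max(1, round(len(districts) * val_fraction))
        val_keys.update((state, d) for d in districts[:n_val])

    train = [s for s in samples if (s[1], s[2]) not in val_keys]
    val = [s for s in samples if (s[1], s[2]) in val_keys]
    return train, val


# Placeholder data: 4 states x 4 hypothetical districts x 3 tiles each.
states = ["Uttar Pradesh", "Rajasthan", "Odisha", "Bihar"]
samples = [
    (f"{state}-d{d}-t{t}", state, f"district_{d}")
    for state in states
    for d in range(4)
    for t in range(3)
]
train, val = split_by_district(samples)
print(len(train), "training tiles,", len(val), "validation tiles")
```

Splitting by district rather than by tile avoids leakage between neighbouring fields, which would otherwise inflate validation scores.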
Built With

  • albumentations
  • bash
  • gdal/ogr
  • git
  • jupyter-notebooks
  • machine-learning
  • numpy
  • nvidia-apex
  • opencv
  • pandas
  • python
  • pytorch-2.0+
  • pytorch-lightning
  • qgis
  • rasterio
  • scikit-image
  • scikit-learn
  • terratorch
  • torchmetrics
  • transformers-(huggingface)
  • weights-&-biases
  • yaml