The proposed approach is a multi-modal encoder-decoder architecture for 6-class crop segmentation across fragmented Indian smallholder farms, built on the Prithvi-EO-2.0 foundation model and the TerraTorch framework. It will implement early SAR-optical fusion: temporal Sentinel-2 multispectral observations are concatenated channel-wise with Sentinel-1 VV/VH polarizations before encoding. Prithvi-EO-2.0-300M will serve as the backbone encoder, with pyramidal feature extraction via the SelectIndices and LearnedInterpolateToPyramidal necks, and a UNetDecoder with skip connections will produce the segmentation map. Training will combine Lovász-Softmax loss, which optimizes a surrogate of IoU directly, with cross-entropy for stability. BF16 mixed-precision training with progressive backbone unfreezing will reduce computational overhead. Validation splits drawn at the district level across Uttar Pradesh, Rajasthan, Odisha, and Bihar will test geographic generalization. The target is a micro IoU above 0.75 on the AgriFieldNet India dataset, with balanced per-class performance on Gram, Maize, Mustard, Sugarcane, Wheat, and Other.
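Three of the steps described above (channel-concatenation fusion, district-level splitting, and the micro IoU target metric) can be illustrated with a minimal NumPy sketch. This is not the project's code: the function names `early_fusion`, `district_split`, and `micro_iou`, the `(C, T, H, W)` array layout, the 6-band Sentinel-2 stack in the usage note, and the `ignore_index` convention are all assumptions made for illustration.

```python
import numpy as np


def early_fusion(s2, s1):
    """Concatenate optical and SAR stacks along the channel axis.

    s2: (C_s2, T, H, W) Sentinel-2 time series (layout is an assumption)
    s1: (2, T, H, W) Sentinel-1 VV/VH time series on the same grid
    """
    if s2.shape[1:] != s1.shape[1:]:
        raise ValueError("S1 and S2 must share the same (T, H, W) grid")
    return np.concatenate([s2, s1], axis=0)


def district_split(districts, val_fraction=0.25, seed=0):
    """Hold out whole districts so none appears in both train and val."""
    rng = np.random.default_rng(seed)
    unique = np.unique(districts)
    n_val = max(1, int(round(val_fraction * len(unique))))
    val_set = set(rng.choice(unique, size=n_val, replace=False).tolist())
    is_val = np.array([d in val_set for d in districts])
    return np.where(~is_val)[0], np.where(is_val)[0]


def micro_iou(pred, target, num_classes, ignore_index=-1):
    """Micro-averaged IoU: pool TP, FP, FN over all classes, then divide."""
    valid = target != ignore_index  # hypothetical label for unlabeled pixels
    pred, target = pred[valid], target[valid]
    tp = fp = fn = 0
    for c in range(num_classes):
        p, t = pred == c, target == c
        tp += np.sum(p & t)
        fp += np.sum(p & ~t)
        fn += np.sum(~p & t)
    denom = tp + fp + fn
    return tp / denom if denom else float("nan")
```

For example, fusing a hypothetical 6-band Sentinel-2 stack of shape `(6, 3, 224, 224)` with a VV/VH stack of shape `(2, 3, 224, 224)` gives an 8-channel input of shape `(8, 3, 224, 224)`. Splitting by district rather than by field keeps spatially correlated neighbors out of the validation set, which is what makes the score a test of geographic generalization.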
Built With
- albumentations
- bash
- gdal/ogr
- git
- jupyter-notebooks
- machine-learning
- numpy
- nvidia-apex
- opencv
- pandas
- python
- pytorch-2.0+
- pytorch-lightning
- qgis
- rasterio
- scikit-image
- scikit-learn
- terratorch
- torchmetrics
- transformers-(huggingface)
- weights-&-biases
- yaml