GeoHarvest

This project focuses on computing normalized vegetation, water, and soil metrics from multi-temporal satellite data to quantify and forecast farmland productivity.

Inspiration

We were inspired by the everyday resilience of farmers, people who wake up before sunrise, work through heat and rain, and still face so much uncertainty. Despite their relentless effort, small changes in weather or soil can decide their entire year’s outcome. It felt deeply unfair that those who feed us still have to rely on guesswork to understand their own land.

Our goal was to create something affordable yet powerful, a way to give every farmer a clear, data-backed view of their farmland without expensive equipment or ground sensors.
Manually inspecting each field using sensors is not only costly and time-consuming, but also impractical at scale. That’s where our inspiration for using satellite data came in. While satellites may not offer the pinpoint precision of in-field sensors, they can provide a strong baseline understanding that’s good enough to identify early signs of stress or productivity change across vast regions.

The best part, we discovered, is that satellites already capture an incredible amount of climate and environmental data. With modern AI-driven feature extraction techniques, these images can be transformed into meaningful metrics that reflect soil health, vegetation vigor, and water availability.

At its core, this project is about empathy and empowerment, helping farmers make decisions with confidence, protecting the land that sustains them, and giving them technological power to thrive in a changing climate.

If we were to sum it up, the idea of turning invisible satellite data into simple, actionable insights that genuinely help farmers is what sparked the Farmland Productivity Mapper.

What it does

The Farmland Productivity Mapper is an AI-powered platform that analyzes satellite imagery from sources such as Sentinel-2, Landsat, and Bhuvan, accessed through platforms such as Google Earth Engine, to compute a dynamic Productivity Index for agricultural plots. It is designed to detect changes in crop vigor, soil moisture, and vegetation patterns over time. In short, it translates multi-temporal spectral data into information that people can readily understand.

The system:

Computes the following:
- Normalized Indexes:
  - Vegetation Health Index
  - Water Availability Index
  - Soil Adjustment Index
  - Enhanced Vegetation Index
  - Productivity Index (0–100) reflecting the overall relative farmland performance.
- Soil metrics to assess vegetation health and water availability on farmland.
Tracks temporal trends in the data to detect degradation or improvement across growing seasons and years.
Correlates satellite indicators with weather, rainfall, and topography data to ensure environmental factors are taken into account.
Predicts future yield potential and provides early warnings on low productivity zones.
Provides alerts such as “Soil moisture deficit likely in next 10 days” or “Nitrogen deficiency indicated in southern section.”

How we will build it

System Architecture Overview

Data Collection:
- Pull multi-temporal satellite imagery from Google Earth Engine (GEE) using sources like Sentinel-2 (ESA), Landsat 8/9 (NASA), and ISRO Bhuvan to build a reliable and accurate dataset.
- Integrate meteorological data (rainfall, humidity, temperature) from IMD and NASA POWER.
- Collect topographic and soil data from the National Bureau of Soil Survey (NBSS) and SRTM elevation models to better account for terrain and soil conditions.
- Use historical yield and crop-type records to train correlation models for normalization and index metrics.
Pre-processing:
- Perform cloud masking and geometric correction using GEE.
- Generate composites for each season and align them across years.
- Normalize data to a consistent spectral resolution for reliable comparison (based on the principle of “good enough > perfect”).
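
As a rough illustration of the pre-processing steps above, the sketch below masks cloudy pixels and builds a per-pixel median composite for one season. It is an assumption-laden stand-in, not our GEE pipeline: it works on plain Python lists, assumes each scene comes with a Sentinel-2 L2A scene-classification (SCL) value per pixel, and the helper names `mask_clouds` and `seasonal_composite` are hypothetical.

```python
from statistics import median

# Sentinel-2 L2A scene-classification (SCL) codes treated as unusable here:
# 3 = cloud shadow, 8 = cloud (medium probability),
# 9 = cloud (high probability), 10 = thin cirrus.
CLOUDY_SCL = {3, 8, 9, 10}


def mask_clouds(values, scl):
    """Replace cloudy pixels with None, keeping the pixel order."""
    return [None if code in CLOUDY_SCL else v for v, code in zip(values, scl)]


def seasonal_composite(scenes):
    """Per-pixel median over the cloud-free observations of one season.

    `scenes` is a list of masked scenes (equal-length lists). A pixel that
    is cloudy in every scene stays None so later steps can treat it as a gap.
    """
    composite = []
    for pixel_series in zip(*scenes):
        clear = [v for v in pixel_series if v is not None]
        composite.append(median(clear) if clear else None)
    return composite


# Three acquisitions of the same three pixels (illustrative values only).
scenes = [
    mask_clouds([0.61, 0.40, 0.90], [4, 4, 9]),
    mask_clouds([0.65, 0.95, 0.88], [4, 8, 9]),
    mask_clouds([0.63, 0.44, 0.91], [4, 4, 10]),
]
composite = seasonal_composite(scenes)
```

The median is used here because it is less sensitive than the mean to a bright cloud edge that slips past the mask. The third pixel is cloudy in every scene, so it remains a gap rather than being filled with a misleading value.
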
Spectral Feature Extraction & Change Detection:
- Compute different indices: Vegetation Health Index (VHI), Water Availability Index (WAI), Soil Adjustment Index (SAI), and Enhanced Vegetation Index (EVI).
- Normalize each index across historical time windows (e.g., 5–10 years) to remove seasonal noise and regional bias.
- Use pixel-wise temporal differencing to detect vegetation shifts, soil dryness, and greenness loss.
- Extract features such as:
  - Average VHI, WAI, SAI and EVI deviation over time
  - Moisture retention gradient (this is not a key feature and is dependent on data availability)
  - Crop rotation impact coefficient
- Employ a temporal CNN-LSTM pipeline to analyze patterns across years, detect degradation or improvement trends, and establish a reliable historical baseline.
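
To make the index and normalization steps above more concrete, here is a minimal sketch. Exact formulas for VHI, WAI, and SAI are not fixed in this write-up, so the sketch assumes the widely used NDVI, NDMI, and SAVI as stand-ins, alongside the standard EVI formula. Band inputs are assumed to be surface reflectance in the 0–1 range, and `historical_zscore` and `temporal_difference` are hypothetical helper names. The functions operate on single pixel values so the example runs without dependencies; the same arithmetic applies element-wise to raster arrays.

```python
from statistics import mean, pstdev


def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 when both bands are zero."""
    total = a + b
    return 0.0 if total == 0 else (a - b) / total


def ndvi(nir, red):
    """Vegetation greenness (assumed stand-in for the vegetation index)."""
    return normalized_difference(nir, red)


def ndmi(nir, swir1):
    """Vegetation moisture (assumed stand-in for the water index)."""
    return normalized_difference(nir, swir1)


def savi(nir, red, soil_factor=0.5):
    """Soil-adjusted vegetation index (assumed stand-in for SAI)."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the standard coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def historical_zscore(current, history):
    """Deviation of the current value from same-season history, in std units."""
    sigma = pstdev(history)
    return 0.0 if sigma == 0 else (current - mean(history)) / sigma


def temporal_difference(series):
    """Change between consecutive observations of one pixel."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Illustrative reflectance values for one pixel.
pixel_ndvi = ndvi(nir=0.5, red=0.1)
pixel_evi = evi(nir=0.5, red=0.1, blue=0.05)
anomaly = historical_zscore(0.5, history=[0.6, 0.7, 0.8])
changes = temporal_difference([0.7, 0.6, 0.4])
```

Comparing each value against the same season in earlier years, rather than against the whole record, is what removes the seasonal cycle from the anomaly. A strongly negative `anomaly` or a run of negative `changes` would flag greenness loss.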

The end goal will be to aggregate these metrics into a composite Productivity Index (0–100) using a weighted average approach, tuned via regression on known yield or vegetation density data.
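
A minimal sketch of the weighted-average aggregation follows. It assumes each index has already been rescaled to a 0–100 sub-score, and the weights shown are placeholders chosen for illustration; in the plan above they would come from regression against known yield or vegetation density data. The function name `productivity_index` is hypothetical.

```python
def productivity_index(scores, weights):
    """Weighted average of 0-100 sub-scores, clipped to the 0-100 range.

    Only the indices present in `scores` are used, and the weights are
    renormalized, so a missing index does not drag the result towards zero.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights for the supplied indices must sum to > 0")
    value = sum(scores[name] * weights[name] for name in scores) / total_weight
    return max(0.0, min(100.0, value))


# Placeholder weights (assumed); real values would be fitted, not hand-picked.
weights = {"VHI": 0.4, "WAI": 0.3, "SAI": 0.1, "EVI": 0.2}
scores = {"VHI": 60.0, "WAI": 40.0, "SAI": 70.0, "EVI": 55.0}

index = productivity_index(scores, weights)
```

With these illustrative numbers the result is about 54 out of 100, since 0.4·60 + 0.3·40 + 0.1·70 + 0.2·55 = 54.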

Productivity Classification & Prediction:
- Train machine learning models (Random Forest / XGBoost / LSTM) on combined satellite and environmental features.
- Compute an overall Productivity Index (0–100) from the indices described under Spectral Feature Extraction & Change Detection. The weight given to each index will be determined experimentally to maximize accuracy. The index reflects how a plot performs relative to a hypothetical perfect farmland; to keep it grounded in real conditions rather than an idealized baseline, it will be normalized against historical averages, in line with the principle of “good enough > perfect.”
- Predict future productivity using trend extrapolation and climate-forecast data. We plan to use LSTM regression for this, with the aim of forecasting 2–3 growing seasons ahead.
- Generate regional-level alerts with advice on raising the Productivity Index, such as irrigation requirements or soil health risks.
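
The trend-extrapolation idea can be illustrated with a much simpler baseline than the planned LSTM: an ordinary least-squares line fitted to past seasonal Productivity Index values and extended forward. This is a swapped-in technique for illustration only, it ignores climate-forecast inputs, and the names `fit_linear_trend` and `forecast` as well as the alert threshold are assumptions.

```python
def fit_linear_trend(values):
    """Least-squares slope and intercept for values observed at t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two seasons to fit a trend")
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_tv = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    slope = s_tv / s_tt
    return slope, v_mean - slope * t_mean


def forecast(values, seasons_ahead=3):
    """Extend the fitted line forward, keeping results inside 0-100."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [
        max(0.0, min(100.0, intercept + slope * (n + k)))
        for k in range(seasons_ahead)
    ]


# Productivity Index for four past seasons (illustrative values).
history = [70.0, 68.0, 66.0, 64.0]
projected = forecast(history, seasons_ahead=3)

ALERT_THRESHOLD = 60.0  # assumed cut-off for a low-productivity warning
needs_alert = any(p < ALERT_THRESHOLD for p in projected)
```

A baseline like this is also useful later as a sanity check: a learned model that cannot beat a straight line on held-out seasons is probably not adding value yet.
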
Automated Insight Generation:
- The system integrates Large Language Models (LLMs) to generate human-readable reports summarizing technical metrics, anomalies, and recommendations.
- Reports provide contextual insights such as “Productivity index dropped by 12% due to rainfall shortage in the last 40 days. This farmland has a low productivity index of about 20%.”
- This enhances accessibility for non-technical users such as farmers, government officers, and agribusiness managers.
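
One way the hand-off to an LLM could look is sketched below: the computed metrics are packed into a short, constrained prompt so the model only rephrases numbers it is given. No specific LLM or API is assumed, and `build_report_prompt` and its parameters are hypothetical.

```python
def build_report_prompt(plot_id, index_now, index_prev, drivers):
    """Assemble a plain-text prompt from already-computed metrics."""
    change = index_now - index_prev
    lines = [
        "Write a short, plain-language farmland report for a non-technical reader.",
        f"Plot: {plot_id}",
        f"Productivity Index: {index_now:.0f}/100 (change: {change:+.0f} points)",
        "Likely drivers: " + "; ".join(drivers),
        "Use only the numbers given above and end with one practical recommendation.",
    ]
    return "\n".join(lines)


prompt = build_report_prompt(
    plot_id="pilot-plot-001",  # hypothetical identifier
    index_now=20,
    index_prev=32,
    drivers=["rainfall shortage over the last 40 days"],
)
```

Keeping the numbers in the prompt and telling the model not to add others is a simple guard against reports that state figures the pipeline never computed.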

Challenges we may run into

Data inconsistency:
Cloud cover, sensor resolution mismatch, and missing temporal data can cause gaps or distortions in analysis. Additionally, using datasets from other countries for temperature, soil, or crop patterns may yield inaccurate results for India due to vastly different climatic conditions, soil compositions, and agricultural practices.
Label scarcity:
There is a lack of consistent, high-quality ground-truth data on soil properties, crop diseases, and pest infestations — all of which directly affect productivity assessments. The absence of these labels can limit the accuracy of machine learning models in predicting true on-ground health conditions.
Local variability:
Crop cycles, irrigation methods, and soil fertility vary dramatically across India’s diverse agro-climatic zones. This variability makes it difficult to generalize models or train them effectively using a single national dataset, leading to potential inaccuracies in productivity mapping for certain regions.
Scalability and infrastructure constraints:
Processing multi-temporal, multispectral data for thousands of farms across India demands significant cloud storage, GPU resources, and parallel computation capabilities, which may increase cost and complexity for large-scale deployment.

How do we aim to overcome the challenges?

Data inconsistency mitigation:
We will use multi-source data fusion, combining imagery from Sentinel-2, Landsat, and Bhuvan so that gaps in one source caused by cloud cover or missing acquisitions can be filled from another.
Addressing label scarcity:
We will use unsupervised anomaly detection models to identify productivity shifts and vegetation anomalies even without labeled ground data. Over time, limited validated data from agricultural departments can refine and improve these models.
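
As one simple example of label-free anomaly detection, the sketch below flags observations in a plot's index time series using a robust (median and MAD based) modified z-score. This is only one candidate technique, not a committed design; the 3.5 cut-off is a commonly used rule of thumb, and `mad_anomalies` is a hypothetical name.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return positions whose modified z-score exceeds `threshold`.

    The modified z-score is 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation. It needs no labels and is not distorted by
    the very outliers it is trying to find.
    """
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    if mad == 0:
        return []  # series is (almost) constant; nothing to flag
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - centre) / mad) > threshold
    ]


# Vegetation index for one plot over seven seasons (illustrative values).
series = [0.62, 0.60, 0.64, 0.61, 0.63, 0.25, 0.62]
flagged = mad_anomalies(series)
```

Here only the sharp drop to 0.25 is flagged, while ordinary season-to-season variation is left alone. Flagged seasons could then be prioritized when validated ground data becomes available.
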
Managing local variability:
Our models will use region-specific calibration, fine-tuned separately for each agro-climatic zone in India based on local crop cycles, soil types, and rainfall data. This ensures accuracy and adaptability across diverse farming regions.
Scaling computation efficiently:
We will use Google Earth Engine’s distributed processing and lightweight ML architectures to handle multi-temporal data efficiently. Parallel computation and cloud batching will make large-scale analysis feasible.

Major Project Milestones

Area	Key Aspects
1. Dataset Assembly	Collect and combine multi-year satellite and climate data from different sources for 3 pilot districts. Fetch geospatial data using public datasets and APIs, and merge it with satellite imagery.
2. Index Computation	Implement index computation, normalized against historical averages.
3. Change Detection & Feature Extraction	Build a temporal CNN-LSTM architecture for pattern detection.
4. Productivity Index Model	Train and validate the Productivity Index model, composed of weighted indices and normalized against historical averages.
5. Insight Generation Module	Train LLMs on the custom-built dataset and integrate them for automated report generation.
6. Prototype Dashboard & Testing	Develop a dashboard for visualizing farmland clusters in the 3 pilot districts, and prepare the final presentation.

What we will learn

Implementing the principle of “good enough > perfect.”
Application of remote sensing and AI in agriculture and sustainability.
Applying feature engineering across multi-source datasets, combining spectral metrics, rainfall, temperature, and topographic data for improved model reliability.
Learning to integrate geospatial data with temporal modeling (using CNN-LSTM architectures) to detect long-term productivity trends and soil health shifts.
Designing an index normalization pipeline that compares real farmland conditions against historical averages instead of idealized baselines.
Working with multispectral imagery and time-series geospatial data.
Building explainable ML models for non-technical stakeholders.
Creating interpretable AI outputs (Explainable AI) using LLMs that translate technical metrics into an easy-to-understand report summary for farmers and the government.
Understanding real-world challenges in climate-smart agriculture, including how metric fusion and model calibration across agro-climatic zones affect accuracy and sustainability.

What's next for GeoHarvest

After Trackshift, we plan to:

Eventually evolve into a decision intelligence engine for land sustainability, allowing the government to better focus on specific farmlands and assist farmers. This will also assist in resource allocation and intervention strategies.
Expand coverage to entire agro-climatic zones of India using scalable Google Earth Engine APIs.
Partner with state agricultural departments and cooperatives for pilot deployment.