AI Krishi Sahayak

Inspiration

Maharashtra has 36 districts, each with unique agro-climatic conditions, rainfall variability, and cropping systems. Every year, farmers face delayed monsoons, mid-season droughts, pest outbreaks, and unstable market prices. Although district-wise Agriculture Contingency Plans exist (prepared by ICAR/CRIDA and state agriculture departments), they are lengthy PDF documents that rarely reach farmers in an accessible form.

We were inspired by a simple question:

What if a farmer could simply make a phone call and instantly receive district-specific, official, climate-resilient farming advice in their own language?

Most small and marginal farmers:

  • Prefer speaking over typing,
  • Use WhatsApp but avoid complex applications,
  • Need localized and timely guidance,
  • Struggle to interpret long technical documents.

AI Krishi Sahayak was built to bridge this gap by transforming static government documents into dynamic, voice-accessible agricultural intelligence.


What it does

AI Krishi Sahayak is a multilingual, voice-first AI advisor that delivers:

  • District-specific contingency guidance (e.g., what to do in case of a drought or flood),
  • Real-time weather context (7-day forecast),
  • Current mandi prices for key crops,
  • Relevant government subsidy information,
  • Concise, spoken responses in Marathi, Hindi, or English.

Farmers can interact through WhatsApp (text or voice notes) or by calling an IVR number. The system transcribes speech, retrieves relevant district-specific strategies, augments responses with live data, and provides grounded advice within seconds.
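To make the WhatsApp path concrete, here is a minimal, standard-library sketch of how an incoming message could be routed. It assumes Twilio's form-encoded messaging webhook (fields such as `From`, `Body`, `NumMedia`, `MediaUrl0`, `MediaContentType0`). The helper names `route_incoming` and `twiml_reply` are hypothetical and are not taken from the project's code. In this sketch, voice notes are flagged for transcription with Whisper, and text messages go straight to retrieval.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape


def route_incoming(form_body: str) -> dict:
    """Classify a form-encoded Twilio WhatsApp webhook payload as a text or voice query."""
    # parse_qs returns lists; Twilio sends each field once, so keep the first value.
    fields = {key: values[0] for key, values in parse_qs(form_body, keep_blank_values=True).items()}
    sender = fields.get("From", "")
    num_media = int(fields.get("NumMedia") or 0)
    media_type = fields.get("MediaContentType0", "")
    if num_media > 0 and media_type.startswith("audio/"):
        # Voice note: this URL would be downloaded and transcribed with Whisper.
        return {"kind": "voice", "sender": sender, "audio_url": fields.get("MediaUrl0", "")}
    # Plain text: goes straight to retrieval.
    return {"kind": "text", "sender": sender, "text": fields.get("Body", "").strip()}


def twiml_reply(message: str) -> str:
    """Wrap the generated advice in a TwiML <Message> reply, escaping XML special characters."""
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            f"<Response><Message>{escape(message)}</Message></Response>")
```

In a FastAPI endpoint, the raw form body could be read with `(await request.body()).decode()` and passed to `route_incoming`. The TwiML string would then be returned with an XML content type. This sketch does not cover the IVR path (Twilio voice calls).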


How we built it

We designed the system using a Retrieval-Augmented Generation (RAG) architecture to ensure all advice is grounded in official documents.

For a user query $q$, the process is as follows:

  1. Retrieval: We convert the query into an embedding and retrieve the top-$k$ most relevant sections from the district contingency plan using a FAISS vector search.

  2. Augmentation: These retrieved sections are combined with live data sources to create a rich, structured prompt:

    • 7-day weather forecast data (from a public API),
    • Current mandi price information (from a state APMC portal),
    • Subsidy eligibility rules (from government databases).
  3. Generation: The final structured prompt is sent to a large language model (Llama/Gemma), which generates a concise and grounded response, citing the source information.
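The first two steps above can be sketched with the standard library alone. This illustration relies on stated assumptions and is not the project's implementation:

- The chunk and query vectors are assumed to be precomputed (in the real stack, by Sentence Transformers).
- Brute-force cosine similarity stands in for FAISS. With L2-normalized vectors, a FAISS `IndexFlatIP` produces the same ranking.
- The prompt wording and the helpers `top_k` and `build_prompt` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], chunk_vecs: List[Sequence[float]], k: int = 3) -> List[int]:
    """Retrieval: indices of the k chunks most similar to the query, best first."""
    ranked = sorted(range(len(chunk_vecs)), key=lambda i: cosine(query_vec, chunk_vecs[i]), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: List[str], weather: str, mandi: str,
                 subsidy: str, language: str = "Marathi") -> str:
    """Augmentation: combine retrieved plan excerpts with live data in one grounded prompt."""
    context = "\n".join(f"[S{i + 1}] {text}" for i, text in enumerate(chunks))
    return (
        "You are an agricultural advisor for farmers in Maharashtra.\n"
        "Use ONLY the information below. If it does not answer the question, "
        "say that you cannot answer.\n"
        f"Reply briefly in {language} and cite the [S#] excerpts you used.\n\n"
        f"Contingency plan excerpts:\n{context}\n\n"
        f"7-day forecast: {weather}\n"
        f"Mandi prices: {mandi}\n"
        f"Subsidy rules: {subsidy}\n\n"
        f"Farmer's question: {query}\n"
    )
```

For example, `top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [0.7, 0.7]], k=2)` returns `[1, 2]`. The texts of those chunks would then go to `build_prompt`, and its output to the LLM in the Generation step.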

The system stack includes:

| Component | Technology used |
|---|---|
| Backend API | Python + FastAPI |
| Vector search | FAISS |
| Embeddings | Sentence Transformers |
| Speech-to-text | Whisper (OpenAI) |
| Messaging / voice | Twilio (WhatsApp & IVR) |

Challenges we ran into

  1. Data Extraction and Chunking: The contingency plans were unstructured PDFs mixing tables, headings, and dense paragraphs. Extracting the drought, pest, and crop-specific sections required custom parsing and semantic chunking so that each chunk carries enough self-contained context to be retrieved on its own.

  2. Preventing Hallucinations: To maintain farmer trust, the LLM had to use only the retrieved official content. We enforced this through prompt engineering, instructing the model to decline to answer whenever the information was not in the provided context.

  3. Multilingual Queries (Hinglish): Farmers often mix Marathi, Hindi, and English in a single sentence. We implemented language detection and localized prompting so that the model could understand these code-mixed queries and respond in the user's preferred language.

  4. Latency Constraints: Voice interactions must respond quickly, ideally in under 5 seconds, to keep the conversation natural. To meet this target, we optimized retrieval filtering and chose a more efficient LLM for inference.
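The chunking step from the first challenge can be sketched as a sliding window. The window sizes follow Technical Footnote 2 (~256 tokens, 20% overlap). This is an illustration under assumptions, not the project's actual parser:

- Whitespace-separated words stand in for model tokens.
- Chunking each plan section separately, so that a drought chunk never bleeds into a pest chunk, is an assumed approximation of the semantic chunking.
- `chunk_tokens` and `chunk_section` are hypothetical names.

```python
from typing import Dict, List, Sequence


def chunk_tokens(tokens: Sequence[str], size: int = 256, overlap_ratio: float = 0.2) -> List[Sequence[str]]:
    """Split tokens into windows of `size` that overlap by about `overlap_ratio`."""
    if size <= 0 or not 0 <= overlap_ratio < 1:
        raise ValueError("size must be positive and overlap_ratio must be in [0, 1)")
    stride = max(1, size - int(size * overlap_ratio))  # 256 - 51 = 205 with the defaults
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the section
    return chunks


def chunk_section(district: str, section: str, text: str,
                  size: int = 256, overlap_ratio: float = 0.2) -> List[Dict[str, str]]:
    """Chunk one plan section and tag each chunk with its district and section name."""
    words = text.split()  # whitespace words approximate model tokens
    return [
        {"district": district, "section": section, "text": " ".join(window)}
        for window in chunk_tokens(words, size, overlap_ratio)
    ]
```

With the defaults, a 500-token section yields three windows starting at tokens 0, 205, and 410, and consecutive windows share 51 tokens. The district and section tags could also serve as metadata for filtering retrieval.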


Accomplishments that we're proud of

  • Successfully built a working multilingual RAG pipeline grounded in official district documents.
  • Integrated real-time weather and mandi price data into AI responses, moving beyond static advice.
  • Enabled true accessibility through voice-based interaction via IVR and WhatsApp voice notes.
  • Designed a scalable architecture capable of covering all 36 districts of Maharashtra without a linear increase in complexity.

What we learned

  • Effective AI systems rely more on intelligent architecture and high-quality data than on massive, general-purpose models.
  • Retrieval quality (chunking strategy, embedding model choice) directly impacts the reliability and accuracy of the final response.
  • For our target users, accessibility (voice + WhatsApp) is far more important than a feature-rich mobile application or a complex user interface.
  • Grounding AI in authoritative local data substantially reduces the risk of misinformation, which is essential in a domain as critical as agriculture.

What's next for AI Krishi Sahayak

  • [x] Pilot in 5 districts
  • [ ] Expand coverage to all 36 districts of Maharashtra by ingesting each district's contingency plan.
  • [ ] Add image-based diagnostics – integrate computer vision to allow farmers to upload photos of diseased crops for instant detection and remedy suggestions.
  • [ ] Implement proactive alerts – use the weather API and pest models to send outbound alerts for predicted events such as a delayed monsoon or a pest outbreak.
  • [ ] Integrate yield prediction – combine historical rainfall data, real-time soil parameters, and crop stage to give farmers yield forecasts.
  • [ ] Partner with local KVKs (Krishi Vigyan Kendras) and agricultural officers for rigorous field validation and to ensure our advice remains practical and actionable.
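For the proactive-alert item, one possible starting point is a simple rule over the daily rainfall values in the 7-day forecast. This is a hypothetical sketch, with these assumptions:

- `longest_dry_spell` and `dry_spell_alert` are illustrative names.
- The rainfall list is already extracted from the weather API response.
- The 2.5 mm rainy-day cutoff and the 5-day alert threshold are placeholders that agronomists or KVK partners would need to set per district.

```python
from typing import Optional, Sequence


def longest_dry_spell(daily_rain_mm: Sequence[float], rainy_day_mm: float = 2.5) -> int:
    """Length of the longest run of consecutive days with rainfall below `rainy_day_mm`."""
    longest = current = 0
    for rain in daily_rain_mm:
        current = current + 1 if rain < rainy_day_mm else 0
        longest = max(longest, current)
    return longest


def dry_spell_alert(district: str, daily_rain_mm: Sequence[float],
                    min_dry_days: int = 5, rainy_day_mm: float = 2.5) -> Optional[str]:
    """Return an outbound alert message if the forecast contains a long dry spell, else None."""
    spell = longest_dry_spell(daily_rain_mm, rainy_day_mm)
    if spell >= min_dry_days:
        return (f"{district}: {spell} consecutive dry days forecast. "
                "Check the district contingency plan for dry-spell measures.")
    return None
```

The rule uses the longest consecutive run rather than the total number of dry days. This distinguishes a genuine dry spell from scattered dry days within the week.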

Technical Footnotes

[^1]: The embedding model used is paraphrase-multilingual-MiniLM-L12-v2, which supports 50+ languages including Marathi and Hindi.

[^2]: The FAISS index is built on chunks of ~256 tokens with 20% overlap (about 51 tokens) to preserve context across chunk boundaries.

[^3]: Live weather data is fetched from the OpenWeatherMap API; mandi prices come from the Agmarknet portal.


Built with ❤️ for the farmers of Maharashtra.
