Inspiration

Most families in my village in Kenya depend completely on small-scale farming.
Every season I hear the same problems: yellow leaves, stunted maize, unknown pests, no idea whether to spray or not.
Getting expert advice is hard: sometimes you wait weeks for an extension officer, and by then the crop is already badly damaged.

I wanted to give farmers something that can help right now, in the middle of the field, just by speaking in Swahili or English, without needing to type.

That is why I built Mkulima AI.

What it does

Mkulima AI is a voice-first crop advisor made for Kenyan smallholder farmers.

  • You speak naturally to the phone (Swahili or English)
  • You can show a photo of the affected plant/leaf/fruit
  • The AI immediately analyzes the photo + your description
  • It gives you spoken step-by-step advice
  • It can check current weather and use that information
  • It remembers previous conversations about the same farm

Hands free. Works on basic smartphones. No typing needed.

How we built it

Core technology: Google Gemini 3

Main pieces:

  • Core AI: Google Gemini 3 (Gemini Live API for real-time voice + multimodal vision)
  • Voice: Gemini Live API for bidirectional, interruptible voice streaming (speak while it listens, barge in, natural flow)
  • Image analysis: Send photos as multimodal input to Gemini 3
  • Tools: Built function calling for weather data
  • Memory: Long context to keep farm history across sessions
  • Frontend: React 19 + TypeScript + Tailwind CSS for a simple, mobile-first UI with large buttons and clear icons
  • Audio: Web Audio API + MediaRecorder for microphone input and playback
  • Deployment: Built to run in the browser

Everything runs in the browser. Built to be as lightweight as possible.
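To give a feel for how the weather tool plugs into the voice session, here is a rough sketch of the function-calling dispatch on the browser side. The names (`getWeather`, the `ToolCall` shape, the canned forecast) are illustrative, not the actual Gemini Live API types:

```typescript
// Hypothetical sketch of the weather tool wiring; the real app
// forwards these calls through the Gemini Live API session.
type ToolCall = { name: string; args: Record<string, unknown> };
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

const tools = new Map<string, ToolHandler>();

// Register the weather tool. In the real app this would query a
// weather service; here it returns a canned forecast.
tools.set("getWeather", async (args) => {
  const location = String(args.location ?? "unknown");
  return { location, forecast: "light rain", tempC: 22 };
});

// When the model emits a function call, look up the handler, run
// it, and return the result to feed back into the conversation.
async function dispatch(call: ToolCall): Promise<unknown> {
  const handler = tools.get(call.name);
  if (!handler) throw new Error(`Unknown tool: ${call.name}`);
  return handler(call.args);
}
```

The nice part of this pattern is that adding a new tool (say, market prices later) is just another `tools.set(...)` entry plus its declaration in the model config.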

Challenges we ran into

  • Very unstable WebSocket connections on slow rural mobile networks; we had to build reconnection plus a fallback to text
  • Gemini was sometimes too confident about plant diseases from bad photos or poor lighting; we had to enforce very careful, humble prompting
  • Agricultural terms in Swahili were sometimes confused; we had to give many clear examples in the system prompt
  • Making the voice feel really natural (not robotic) took a lot of small latency and interruption tweaks
  • A very tight timeline meant we had to ruthlessly cut features to keep a strong, clean demo
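The reconnection logic boiled down to capped exponential backoff with a text fallback once the network clearly cannot sustain live voice. A minimal sketch (the delay numbers and attempt limit are illustrative, not the exact values the app ships):

```typescript
// Capped exponential backoff for WebSocket reconnects on flaky
// rural networks. After MAX_ATTEMPTS failures we give up on the
// live voice channel and fall back to a plain text interface.
const BASE_MS = 500; // first retry delay
const CAP_MS = 8000; // never wait longer than this
const MAX_ATTEMPTS = 6;

// Delay before reconnect attempt `attempt` (0-based): 500, 1000, 2000, ...
function backoffDelay(attempt: number): number {
  return Math.min(CAP_MS, BASE_MS * 2 ** attempt);
}

// Decide what to do after a dropped connection.
function nextAction(
  attempt: number
): { mode: "retry"; delayMs: number } | { mode: "text-fallback" } {
  if (attempt >= MAX_ATTEMPTS) return { mode: "text-fallback" };
  return { mode: "retry", delayMs: backoffDelay(attempt) };
}
```

Keeping the policy in a pure function like `nextAction` made it easy to test without a real network.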

Accomplishments that we're proud of

  • First time I managed to get really natural-feeling voice-to-voice conversation with the Gemini Live API
  • The moment you interrupt the AI mid-sentence and it recovers correctly feels magical
  • Getting quite accurate plant problem detection plus reasonable local recommendations
  • Making something that could genuinely be useful in villages where internet is slow and people mostly speak Swahili
  • Finishing a complete working demo plus a good video in a very short time

What we learned

  • How extremely powerful (and sometimes dangerously over-confident) multimodal frontier models are
  • Importance of very strict, repetitive, culturally appropriate system prompting
  • How much low latency + proper interruption handling changes the whole user feeling
  • That a voice interface is dramatically more usable than text for rural / low-literacy users
  • How fast you can prototype serious agentic applications when you have good multimodal + voice + tool calling in one model

What's next for Mkulima AI

Short term plans:

  • Better handling of very poor quality / very dark photos
  • Add voice speed & accent fine-tuning
  • Simple offline caching of last advice + basic rules
  • More accurate Swahili agricultural vocabulary & local remedy database
  • Very simple way for farmers to share useful photos/diagnoses with each other (community learning)
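The offline caching item above could be as simple as keying the last advice per crop in browser storage. One possible shape (a sketch under our own assumptions, not finished design; the `KV` interface exists so the browser can pass `localStorage` while tests use an in-memory map):

```typescript
// Planned offline cache sketch: keep the last advice per crop so
// it can be replayed with no connection at all.
interface KV {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

function saveAdvice(store: KV, crop: string, advice: string): void {
  store.setItem(
    `advice:${crop}`,
    JSON.stringify({ advice, savedAt: Date.now() })
  );
}

function loadAdvice(store: KV, crop: string): string | null {
  const raw = store.getItem(`advice:${crop}`);
  return raw ? (JSON.parse(raw) as { advice: string }).advice : null;
}
```

In the browser this would be called as `saveAdvice(localStorage, "maize", adviceText)`.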

Longer term plan:

  • Partner with local cooperatives / extension services
  • Add market prices + best selling time suggestions
  • Voice reminders for planting / spraying / weeding
  • Possible integration with SMS fallback for zero internet areas

I really want this to become a practical tool many farmers in Kenya actually use.
