Farm Fusion
Intelligent Agricultural Recommendation System with Weather Monitoring
A platform that combines machine learning, real-time weather monitoring, and automated notifications to help farmers make data-driven decisions about crop selection and fertilizer usage.
This document walks through my entire thought process: why I chose one approach over another. If you are reading this, bear with me — I think you will find it interesting. Some parts of the implementation are still under development, but the full reasoning is here.
Why This Project Matters
The Problem
Farmers face critical decisions daily:
- What crop should I plant? (Wrong choice = entire season lost)
- What fertilizer do I need? (Wrong amount = money wasted or crops damaged)
- Will weather harm my crops? (Late warning = no time to protect)
The Solution
An intelligent system that:
- Recommends optimal crops based on soil conditions (99.32% accuracy)
- Suggests precise fertilizer types based on soil and crop data
- Monitors weather 24/7 and sends automated alerts before dangerous conditions
- Scales to handle thousands of farms with minimal latency
Real-World Impact
- Time Saved: Automated daily weather checks for all farms
- Cost Reduction: Precise fertilizer recommendations prevent waste
- Risk Mitigation: Early weather warnings protect crops
- Data-Driven: ML models trained on 2,200+ agricultural data points.
System Architecture
High-Level Overview

Architecture Principles
Clean Architecture
- Domain entities are independent of frameworks
- Business logic isolated from infrastructure
- Dependencies point inward (Dependency Inversion)
- Go API: Authentication, business logic, orchestration
- Python ML: Model inference (scikit-learn)
Event-Driven Architecture
- RabbitMQ decouples notification generation from email sending
- Async processing prevents API blocking
- Retry logic for failed emails
Domain Deep Dive
We start with the recommendation and notification domains; the other domains are covered afterward.
ML Recommendation Domain
Problem: Provide accurate crop and fertilizer recommendations using ML models
Architecture:
┌─────────────────────────────────────────────────────────┐
│ ML RECOMMENDATION SYSTEM │
└─────────────────────────────────────────────────────────┘
Go Backend Python ML Service
│ │
│ POST /predict/crop │
├─────────────────────────────▶│
│ {N, P, K, temp, humidity, │
│ ph, rainfall} │
│ │
│ ┌────┴────┐
│ │ Load │
│ │ Model │
│ └────┬────┘
│ │
│ ▼
│ ┌──────────────┐
│ │ Run Various │
│ │model & choose│
│ └──────┬───────┘
│ │
│ │
│ ┌────▼────┐
│ │ Get Top │
│ │ 3 Probs │
│ └────┬────┘
│ │
│ {crop: "rice", │
│ confidence: 0.99, │
│◀─────────────────────────────┤
│ alternatives: [...]} │
│ │
Model Training (Offline):
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ CSV Dataset │───▶│ Preprocess │───▶│ Train RF │───▶│ Save .pkl │
│ 2200 samples │ │ • Normalize │ │ • 100 trees │ │ • Model │
└──────────────┘ │ • Encode │ │ • Max depth │ │ • Encoders │
└──────────────┘ └──────────────┘ └──────────────┘
What We Did:
- Trained several candidate models on agricultural datasets and selected the best performer
- Created separate Python FastAPI service for ML inference
- Implemented HTTP client in Go to call ML service
- Added confidence thresholds and warnings
Why This Way:
- Why Python: the ML ecosystem (scikit-learn, pandas) lives in Python, while Go excels at concurrent web APIs
- Why Classifier (Not Regressor)?
Our task:
Input: soil nutrients (N, P, K), weather (temperature, humidity, rainfall), and pH. Output: a crop name (rice, wheat, maize, etc.) - discrete categories
Classification predicts discrete categories like "rice", "wheat", or "maize" (crop/fertilizer names). Regression predicts continuous numbers like 45.7 or 123.4. We are not predicting "how much" (a quantity); we are predicting "which one" (a category).
Why Random Forest (Not Decision Tree)?
Decision Tree Problems
Single Decision Tree:
[N > 50?]
/ \
[Yes] [No]
/ \
[P > 30?] [Humidity > 80?]
/ \ / \
Rice Wheat Maize Jute
Problems:
- Overfitting: memorizes the training data
- High variance: a small change in the data produces a completely different tree
- Unstable: sensitive to noise
- Lower accuracy: a single perspective
Random Forest Solution
Random Forest = Ensemble of Many Trees:
Tree 1: Focuses on N, P, K
Tree 2: Focuses on Temperature, Humidity
Tree 3: Focuses on pH, Rainfall
...
Tree 100: Different feature combinations
Final Prediction = Majority Vote
Tree 1: Rice (90%)
Tree 2: Rice (85%)
Tree 3: Wheat (60%)
Tree 4: Rice (95%)
...
Tree 100: Rice (88%)
Result: Rice (87 trees voted Rice)
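The majority-vote idea above can be sketched in plain Python (a toy illustration of the voting step, not the actual scikit-learn pipeline; the function name is hypothetical):

```python
from collections import Counter

def forest_predict(tree_votes):
    """Combine individual tree predictions by majority vote.

    tree_votes: list of crop labels, one per tree in the ensemble.
    Returns the winning label plus the top-3 labels with their
    vote-share "probabilities" (what the API reports as confidence).
    """
    counts = Counter(tree_votes)
    total = len(tree_votes)
    # Top 3 labels by vote share, mirroring the "Get Top 3 Probs" step
    top3 = [(label, count / total) for label, count in counts.most_common(3)]
    winner = top3[0][0]
    return winner, top3

# 100 simulated tree votes: 87 for rice, 10 for wheat, 3 for maize
votes = ["rice"] * 87 + ["wheat"] * 10 + ["maize"] * 3
crop, alternatives = forest_predict(votes)
print(crop)          # rice
print(alternatives)  # [('rice', 0.87), ('wheat', 0.1), ('maize', 0.03)]
```

In the real model, each tree's vote comes from its own bootstrap sample and feature subset; the vote share is what the `/predict/crop` response surfaces as confidence and alternatives.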
We tested other models as well; Random Forest gave the best results.
Model Performance:
- Crop Recommendation: 99.32% accuracy (2200 samples)
- Fertilizer Recommendation: ~95% accuracy (variable by soil type)
- Inference Time: <50ms per prediction
- Model Size: ~2MB total
Weather Notification Domain
Problem: Automatically alert farmers about dangerous weather conditions
Architecture:
┌─────────────────────────────────────────────────────────────────┐
│ WEATHER NOTIFICATION SYSTEM │
└─────────────────────────────────────────────────────────────────┘
┌────────────────────┐ ┌─────────────────────────────────────┐
│ CRON (5 AM) │ │ DATABASE QUERIES │
│ ─────────── │ │ ─────────────────────────── │
│ • Start daily │ │ • Fetch all farms │
│ scheduler │──▶│ • Get user emails per farm │
└────────────────────┘ │ • Get alert thresholds │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ FARM PROCESSING LOOP │
├─────────────────────────────────────────────────────────────────┤
│ 1. Check Redis Cache ──────┐ │
│ • HIT: Use cached │ ┌─────────────────────────────┐ │
│ • MISS: Call API │◀──│ OPENWEATHER API CALL │ │
│ │ │ ─────────────────────── │ │
│ 2. Detect Alerts: │ │ • Get 24-hour forecast │ │
│ • Temp < 15°C │ │ • Cache result (3hr TTL) │ │
│ • Temp > 35°C │ └─────────────────────────────┘ │
│ • Rainfall > 50mm │ │
│ • Humidity > 80% │ │
│ • Wind > 40 km/h │ │
│ │ │
│ 3. Generate Summary │ │
└────────────────────────────┼────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ RABBITMQ PUBLISHING │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ { │ │
│ │ "farm_id": "123", │ │
│ │ "user_email": "user@example.com", │ │
│ │ "alerts": ["Temp > 35°C"], │ │
│ │ "summary": "Sunny, high of 38°C" │ │
│ │ } │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ WORKER CONSUMER │
├─────────────────────────────────────────────────────────────────┤
│ 1. Receive message from queue │
│ │
│ 2. Generate HTML email: │
│ • Alert section (if any alerts) │
│ • Forecast summary │
│ │
│ 3. Send via SMTP │
│ │
│ 4. Log to notification_log table: │
│ • Success: email_sent = true │
│ • Failure: error_message │
│ │
│ 5. Acknowledge message │
└─────────────────────────────────────────────────────────────────┘
CACHING STRATEGY:
Redis Key: weather:forecast:{lat}_{lon}
TTL: 3 hours
Why: OpenWeather updates every 3 hours, saves API calls
Example:
Farm A (23.81, 90.41) ─┐
Farm B (23.82, 90.42) ─┼─▶ Same location key ─▶ 1 API call
Farm C (23.80, 90.40) ─┘
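The cache-key scheme above can be sketched as follows (an illustrative Python version; the production code is Go, and the coordinates are hypothetical examples):

```python
def location_key(lat: float, lon: float) -> str:
    """Round coordinates to 2 decimals (~1.1 km) so nearby farms
    share one weather cache entry."""
    return f"{lat:.2f}_{lon:.2f}"

def cache_key(lat: float, lon: float) -> str:
    # Redis key format from the caching strategy: weather:forecast:{lat}_{lon}
    return f"weather:forecast:{location_key(lat, lon)}"

# Farms A and C round to the same key -> one shared API call
print(cache_key(23.8103, 90.4134))  # weather:forecast:23.81_90.41
print(cache_key(23.8099, 90.4126))  # weather:forecast:23.81_90.41
print(cache_key(23.8156, 90.4189))  # weather:forecast:23.82_90.42
```

The first two farms hit the same Redis key, so only one OpenWeather call is made per 3-hour window for that area.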
What We Did:
- Implemented cron-based scheduler (runs at 5 AM daily)
- Used RabbitMQ for async email processing
- Cached weather data in Redis (3-hour TTL, matching OpenWeather's update frequency)
- Created alert detection logic with configurable thresholds
- Built HTML email templates with alert details
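The alert-detection step can be sketched like this (an illustrative Python version of the rule logic; the service itself is Go, and the thresholds are the defaults from the farm processing loop above):

```python
# Each rule is a (metric, operator, value) triple, matching the
# configurable thresholds stored per farm.
DEFAULT_RULES = [
    ("temp", "<", 15.0),      # frost risk
    ("temp", ">", 35.0),      # heat stress
    ("rainfall", ">", 50.0),  # flooding risk (mm)
    ("humidity", ">", 80.0),  # fungal disease risk (%)
    ("wind", ">", 40.0),      # wind damage (km/h)
]

def detect_alerts(forecast, rules=DEFAULT_RULES):
    """Return a human-readable alert for every rule the forecast trips."""
    ops = {"<": lambda a, b: a < b, ">": lambda a, b: a > b}
    alerts = []
    for metric, op, value in rules:
        reading = forecast.get(metric)
        if reading is not None and ops[op](reading, value):
            alerts.append(f"{metric} {op} {value} (forecast: {reading})")
    return alerts

forecast = {"temp": 38.0, "humidity": 65.0, "rainfall": 0.0, "wind": 12.0}
print(detect_alerts(forecast))  # ['temp > 35.0 (forecast: 38.0)']
```

The resulting alert list is what gets published to RabbitMQ alongside the forecast summary.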
Why Scheduled Notifications at 5 AM (Not Instant Messages)?
What We Did: Send weather notifications once daily at 5:00 AM
Alternative: Send instant notifications whenever weather changes
Why 5 AM Specifically?
User Behavior Analysis:
Farmer's Daily Schedule:
├─ 5:00 AM - Wake up, check phone
├─ 5:30 AM - Plan day based on weather
├─ 6:00 AM - Start farm work
├─ 12:00 PM - Lunch break
└─ 6:00 PM - End work, too late to react
Reasoning:
- Early enough to plan: Farmers can adjust their day before starting work
- Not too early: 5 AM is when most farmers wake up
- Actionable window: 1-2 hours to prepare equipment, protect crops
- Predictable: Users expect notification at same time daily
Why Not Other Times?
- Midnight: Users asleep, notification ignored
- 8 AM: Too late, already started work
- Evening (6 PM): Can't act on tomorrow's weather today
Why NOT Instant/Real-Time Notifications?
Technical Challenges
1. API Rate Limits
OpenWeather API Free Tier: 1,000 calls/day
Instant Approach (hourly checks):
- 100 farms × 24 checks/day = 2,400 calls/day
- More than double the 1,000 calls/day free-tier limit
- Cost: $50-100/month for a paid API tier
Scheduled Approach (5 AM):
- 100 farms × 1 check/day = 100 calls/day
- With caching: ~8 calls/day (nearby farms share cache)
- Cost: FREE (under 1,000 limit)
Savings: ~99.7% reduction in API calls
2. Weather Data Doesn't Change That Fast
OpenWeather Update Frequency: Every 3 hours
Checking every minute:
├─ 5:00 AM - Forecast: Rain at 2 PM
├─ 5:01 AM - Forecast: Rain at 2 PM (same)
├─ 5:02 AM - Forecast: Rain at 2 PM (same)
└─ ... (177 identical checks)
└─ 8:00 AM - Forecast: Rain at 2 PM (finally updated)
Result: 177 wasted API calls for same data
3. Email Fatigue
Instant Notifications:
├─ 5:00 AM - "Temperature dropping to 14°C at 2 PM"
├─ 6:00 AM - "Temperature dropping to 13°C at 2 PM" (updated forecast)
├─ 7:00 AM - "Temperature dropping to 14°C at 2 PM" (forecast changed back)
└─ 8:00 AM - "Temperature dropping to 13°C at 2 PM"
User Experience: 4 emails in 3 hours, all saying similar things
Result: User unsubscribes or ignores emails
4. Costs: The instant approach is far more expensive because of the redundant API calls.
5. Database Load
Instant Approach:
- Continuous polling: SELECT * FROM farms every minute
- 100 farms × 60 checks/hour = 6,000 queries/hour
- Database always busy
Scheduled Approach:
- One batch query: SELECT * FROM farms once/day
- 100 farms × 1 check/day = 100 queries/day
- Database mostly idle
Load Reduction: over 99.9% fewer queries
User Experience Challenges
1. Notification Overload
Problem: Weather forecasts change frequently
Example Day:
├─ 6:00 AM - "Rain expected at 3 PM"
├─ 9:00 AM - "Rain moved to 4 PM"
├─ 12:00 PM - "Rain now at 2 PM"
├─ 3:00 PM - "Rain cancelled"
└─ 6:00 PM - "Rain back on at 8 PM"
Result: 5 notifications, user confused and annoyed
2. Actionability
Instant notification at 1 PM: "Heavy rain in 30 minutes"
Farmer's situation:
- Already in the field
- Equipment not nearby
- Can't protect crops in 30 minutes
- Notification causes stress, not help
Better: Morning notification
- "Heavy rain expected at 1:30 PM"
- Farmer can plan: finish work by 1 PM, bring equipment
- Actionable and helpful
3. Sleep Disruption
Instant Approach:
- Weather changes at 2 AM
- Notification wakes farmer
- Can't do anything until morning anyway
- Lost sleep for no benefit
Scheduled Approach:
- All changes summarized in 5 AM email
- Farmer wakes naturally
- Gets complete picture
- Can act immediately
Business Logic Challenges
1. Alert Grouping
Instant Approach:
├─ Alert 1: "Temperature < 15°C at 10 AM"
├─ Alert 2: "Temperature < 15°C at 11 AM"
├─ Alert 3: "Temperature < 15°C at 12 PM"
└─ Alert 4: "Temperature < 15°C at 1 PM"
Problem: 4 separate notifications for same condition
Scheduled Approach:
└─ One alert: "Temperature < 15°C from 10 AM - 1 PM (4 hours)"
Result: Clear, concise, actionable
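The grouping of consecutive hourly hits into one window can be sketched as (illustrative Python; the hour values are hypothetical):

```python
def group_consecutive_hours(hours):
    """Collapse sorted hourly alert hits into (start, end) ranges,
    so four hourly alerts become one "10 AM - 1 PM" style window."""
    ranges = []
    for h in sorted(hours):
        if ranges and h == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], h)  # extend the current window
        else:
            ranges.append((h, h))            # start a new window
    return ranges

# Temperature < 15°C flagged at hours 10, 11, 12, 13 (10 AM - 1 PM)
print(group_consecutive_hours([10, 11, 12, 13]))  # [(10, 13)]
print(group_consecutive_hours([10, 11, 15]))      # [(10, 11), (15, 15)]
```

Each resulting range becomes a single line in the daily email instead of one notification per hour.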
Our Approach:
- Daily scheduled for routine planning
- Future: Add emergency alerts for severe weather
- Best of both worlds
2. Why Rounding Latitude/Longitude (Location Key)?
The Design Decision
What We Did: Round coordinates and create location keys for caching
// Example
Farm A: lat=23.8103, lon=90.4125 → location_key="23.81_90.41"
Farm B: lat=23.8156, lon=90.4189 → location_key="23.82_90.42"
Farm C: lat=23.8099, lon=90.4134 → location_key="23.81_90.41"
Result: Farm A and C share same weather cache
Alternative: Use exact coordinates for each farm
Why Round Coordinates?
Technical Challenges
1. API Cost Explosion
Without Rounding (Exact Coordinates):
100 farms with unique coordinates:
├─ Farm 1: 23.810345, 90.412567
├─ Farm 2: 23.810389, 90.412601
├─ Farm 3: 23.810412, 90.412634
└─ ... (all slightly different)
API Calls: 100 unique calls/day
Cost: Hits rate limits quickly
With Rounding (2 decimal places):
100 farms grouped by area:
├─ Location 23.81_90.41: 25 farms
├─ Location 23.82_90.42: 30 farms
├─ Location 23.83_90.43: 20 farms
└─ Location 23.84_90.44: 25 farms
API Calls: 4 unique calls/day
Cost: 96% reduction
2. Weather Doesn't Vary That Much Locally
Weather Forecast Resolution:
OpenWeather API Grid: ~10-15 km squares
├─ 23.81, 90.41 → Grid Cell A
├─ 23.8103, 90.4125 → Grid Cell A (same!)
└─ 23.8156, 90.4189 → Grid Cell A (same!)
Reality: API returns identical data for nearby coordinates
Our Rounding: ~1.1 km precision
├─ 0.01° latitude ≈ 1.11 km
└─ 0.01° longitude ≈ 1.11 km (at equator)
Result: Farms within 1 km share forecast (accurate enough)
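The stated precision follows directly from one degree of latitude spanning roughly 111 km:

```python
KM_PER_DEGREE_LAT = 111.32  # approximate; longitude spacing shrinks away from the equator

# Rounding to 2 decimal places groups coordinates into 0.01-degree cells
precision_km = 0.01 * KM_PER_DEGREE_LAT
print(round(precision_km, 2))  # 1.11
```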
3. Cache Efficiency
Without Rounding:
Redis Cache:
├─ weather:23.810345_90.412567 → Forecast A
├─ weather:23.810389_90.412601 → Forecast B (99% same as A)
├─ weather:23.810412_90.412634 → Forecast C (99% same as A)
└─ ... (100 nearly identical entries)
Cache Hit Rate: ~5% (each farm unique)
Memory Usage: High (duplicate data)
With Rounding:
Redis Cache:
├─ weather:23.81_90.41 → Forecast A (shared by 25 farms)
├─ weather:23.82_90.42 → Forecast B (shared by 30 farms)
└─ ... (4 entries total)
Cache Hit Rate: ~95% (farms share keys)
Memory Usage: Low (no duplication)
Performance: 20x faster (cache hits vs API calls)
1. Is 1 km Precision Enough?
Weather Variation at Different Scales:
├─ 100 km: Different weather systems
├─ 10 km: Slight variations (hills, water bodies)
├─ 1 km: Essentially identical
└─ 100 m: No measurable difference
Our Rounding: 1.1 km precision
Farm Sizes:
├─ Small farm: 1-5 hectares (~100 m × 100 m to ~220 m × 220 m)
├─ Medium farm: 10-50 hectares (~320 m × 320 m to ~700 m × 700 m)
└─ Large farm: 100+ hectares (1 km × 1 km and up)
Conclusion: 1 km precision is MORE than enough
2. Real-World Example
Two Farms:
├─ Farm A: 23.8103, 90.4125 (exact)
├─ Farm B: 23.8156, 90.4189 (exact)
└─ Distance: ~650 meters apart
Weather Difference:
├─ Temperature: ±0.1°C (negligible)
├─ Humidity: ±1% (negligible)
├─ Rainfall: Same (unless very localized storm)
└─ Wind: Same direction and speed
Conclusion: Sharing forecast is accurate
3. Growth Handling
System Growth:
100 farms → 10,000 farms
Without Rounding:
- 10,000 unique API calls
- Impossible (rate limits)
- Need expensive API tier
With Rounding:
- ~400 unique location keys (assuming distribution)
- Still under free tier
- Scales naturally
Conclusion: Design supports 100x growth
Future Enhancements:
- Add SMS notifications
- Support custom notification times per user
Authentication Domain
Problem: Secure user access with token-based authentication
Architecture:
┌─────────────────────────────────────────────────────────┐
│ AUTHENTICATION FLOW │
└─────────────────────────────────────────────────────────┘
POST /api/v1/auth/register
│
├─▶ Validate Input (email, password strength)
│
├─▶ Hash Password (bcrypt, cost=10)
│
├─▶ Store User in PostgreSQL
│
└─▶ Return User ID
POST /api/v1/auth/login
│
├─▶ Fetch User by Email
│
├─▶ Compare Password Hash
│
├─▶ Generate JWT Access Token (15 min expiry)
│
├─▶ Generate Refresh Token (7 days, stored in DB)
│
└─▶ Return Both Tokens
POST /api/v1/auth/refresh
│
├─▶ Validate Refresh Token from DB
│
├─▶ Check Expiry & Revocation
│
├─▶ Generate New Access Token
│
└─▶ Return New Token
What We Did:
- Implemented JWT-based authentication with short-lived access tokens
- Stored refresh tokens in PostgreSQL for revocation capability
- Used bcrypt for password hashing
- Created middleware to protect routes
Why This Way:
- JWT for stateless auth: No session storage needed, scales horizontally
- Refresh tokens in DB: Allows logout/revocation (pure JWT can't be revoked)
- Short access token expiry: Limits damage if token is stolen
- Bcrypt over SHA256: Designed for passwords, has built-in salt, adjustable cost
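To make the token flow concrete, here is a stdlib-only Python sketch of how an HS256 access token with a 15-minute expiry is built and verified (illustrative only; the actual backend is Go with a proper JWT library, and the secret here is a placeholder):

```python
import base64, hashlib, hmac, json, time

SECRET = b"demo-secret"  # hypothetical; the real key comes from config

def b64url(data: bytes) -> str:
    # JWT uses base64url without padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_access_token(user_id: str, ttl_seconds: int = 15 * 60) -> str:
    """Build a minimal HS256 JWT with a 15-minute expiry claim."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = b64url(json.dumps(
        {"sub": user_id, "exp": int(time.time()) + ttl_seconds}).encode())
    signing_input = f"{header}.{claims}".encode()
    sig = b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    return f"{header}.{claims}.{sig}"

def verify(token: str) -> bool:
    """Check the signature and exp claim; stateless, no DB lookup."""
    header, claims, sig = token.split(".")
    signing_input = f"{header}.{claims}".encode()
    expected = b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return False
    padded = claims + "=" * (-len(claims) % 4)  # restore base64 padding
    payload = json.loads(base64.urlsafe_b64decode(padded))
    return payload["exp"] > time.time()

token = make_access_token("user-123")
print(verify(token))        # True
print(verify(token + "x"))  # False (tampered signature)
```

Because verification needs only the shared secret, the API middleware can authenticate requests without touching the database; only refresh tokens require a DB check.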
Alternative Approaches:
- Session-based auth: Requires Redis/DB lookup on every request (slower)
- OAuth2: Overkill for this use case, adds complexity
- API Keys: Less secure, no expiration, harder to rotate
Future Enhancements:
- Add 2FA (TOTP)
- Implement rate limiting on login attempts
- Add password reset via email
- Support OAuth2 for social login
Farm Management Domain
Problem: Users need to manage multiple farms with GPS coordinates
Architecture:
┌─────────────────────────────────────────────────────────┐
│ FARM MANAGEMENT │
└─────────────────────────────────────────────────────────┘
User (1) ──────── (N) Farm
│ │
│ ├─ ID (UUID)
│ ├─ Name
│ ├─ Latitude
│ ├─ Longitude
│ ├─ Location Key (for weather API)
│ └─ Timestamps
│
└─────────────────▶ Weather Alerts (N)
│
├─ Metric (temp/rain/humidity/wind)
├─ Operator (<, >, =)
├─ Value (threshold)
└─ Is Enabled
API Flow:
POST /api/v1/farms
│
├─▶ Extract User ID from JWT
│
├─▶ Validate Coordinates (-90 to 90, -180 to 180)
│
├─▶ Generate Location Key (rounded lat_lon)
│
├─▶ Store in PostgreSQL
│
└─▶ Return Farm Object
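The coordinate-validation step can be sketched as (an illustrative Python version of the Go handler's check; the function name is hypothetical):

```python
def validate_coordinates(lat: float, lon: float) -> None:
    """Reject GPS coordinates outside the valid WGS84 ranges,
    mirroring the farm-creation validation step."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude {lat} out of range [-90, 90]")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude {lon} out of range [-180, 180]")

validate_coordinates(23.8103, 90.4134)  # OK: Dhaka-area farm
try:
    validate_coordinates(123.0, 90.4134)
except ValueError as e:
    print(e)  # latitude 123.0 out of range [-90, 90]
```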
What We Did:
- Created one-to-many relationship: User → Farms → Weather Alerts
- Used UUIDs for IDs (better for distributed systems)
- Added location_key for efficient weather API caching
- Implemented ownership verification (users can only access their farms)
Future Enhancements:
- Add farm boundaries (polygon coordinates)
- Support multiple crops per farm
- Add soil test history tracking
- Implement farm sharing (multiple users per farm)
Why RabbitMQ?
Requirements:
- Send emails asynchronously
- Retry failed emails
- Simple pub/sub
Why Redis for Caching?
Problem:
- OpenWeather API: 1000 calls/day free tier
- 100 farms × 24 checks/day = 2400 calls (over limit!)
Solution:
- Cache forecasts for 3 hours (weather update frequency)
- Nearby farms share same cache key
- Result: ~8 API calls/day for 100 farms
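The 3-hour forecast cache behaves like a simple TTL store. An in-memory Python sketch of the same pattern (Redis handles expiry natively in the real system; this class is purely illustrative):

```python
import time

class TTLCache:
    """Minimal in-memory stand-in for the Redis forecast cache."""
    def __init__(self):
        self._store = {}  # key -> (expires_at, value)

    def set(self, key, value, ttl_seconds=3 * 3600):  # 3-hour TTL
        self._store[key] = (time.time() + ttl_seconds, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None           # cache miss
        expires_at, value = entry
        if time.time() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

cache = TTLCache()
cache.set("weather:forecast:23.81_90.41", {"temp": 31.0})
print(cache.get("weather:forecast:23.81_90.41"))  # {'temp': 31.0}
print(cache.get("weather:forecast:23.82_90.42"))  # None (miss -> call API)
```

On a miss, the scheduler calls OpenWeather and stores the result; every other farm sharing that location key within the next 3 hours is served from cache.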
Prerequisites
# Required
- Go 1.25+
- Python 3.8+
- PostgreSQL 14+
- Redis 6+
- RabbitMQ 3.9+
Quick Setup
1. Clone & Configure
git clone https://github.com/yourusername/farm-fusion.git
cd farm-fusion
cp .env.example .env
# Edit .env with your credentials
2. Database Setup
createdb farm_fusion
for f in migrations/*.up.sql; do psql -d farm_fusion -f "$f"; done
3. Start ML Service
cd ml_service
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python train_models.py # First time only
uvicorn app.main:app --host 0.0.0.0 --port 8000
4. Start Go Backend
go mod download
go build -o bin/api cmd/api/main.go
./bin/api
5. Start Background Services
# Terminal 1
go build -o scheduler cmd/scheduler/main.go
./scheduler
# Terminal 2
go build -o worker cmd/worker/main.go
./worker
Mistakes & Lessons
Mistake 1: Over-engineering Early
- Initially wanted to use gRPC, microservices everywhere
- Learned: Start simple, add complexity when needed
Mistake 2: Not Planning Database Schema
- Had to add location_key column later for caching
- Learned: Think about access patterns upfront
- Migrations are painful, get it right first time
Mistake 3: Ignoring Error Handling
- Early code had generic error messages
- Learned: Specific errors help debugging
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Development Workflow
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
Md. Soyaib Rahman Zihad
- GitHub: Soyaib10
- LinkedIn: Md. Soyaib Rahman
- Email: soyaibzihad10@gmail.com