🌱 Inspiration

Robots don't fail because models are weak.
They fail because training data doesn't cover edge cases.

Most perception pipelines train on flat, single-view images.
That works in labs, but it breaks in the real world.

Rare objects blend into environments.
Shadows become part of objects.
Labels merge into boxes.

We built Harvest AI to turn world models into a data factory:

  • 🌍 Real locations
  • 🧱 Explorable 3D worlds
  • 📸 Consistent multi-view imagery
  • 🧠 Verified edge-case labels at scale

🧠 What It Does

Harvest AI generates edge-case training data from world models using a multi-stage pipeline:

🗺️ Location Capture via Google Maps

Users click anywhere on a photorealistic 3D Google Map. The system captures satellite imagery from four cardinal directions (0°, 90°, 180°, 270°) using the Google Maps Static API and resolves place names via the Geocoding API.
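The capture step reduces to URL construction against the public Maps Static and Geocoding endpoints. A minimal sketch (zoom, size, and the key are placeholders; the four azimuths are recorded as metadata alongside each capture rather than as request parameters):

```python
# Hedged sketch of the capture URLs; endpoint paths and parameter names
# follow the public Google Maps Platform documentation.
from urllib.parse import urlencode

STATIC_MAPS = "https://maps.googleapis.com/maps/api/staticmap"
GEOCODE = "https://maps.googleapis.com/maps/api/geocode/json"
AZIMUTHS = [0, 90, 180, 270]  # stored as per-capture metadata

def static_map_url(lat: float, lng: float, key: str, zoom: int = 18) -> str:
    """Top-down satellite tile centred on the clicked point."""
    return STATIC_MAPS + "?" + urlencode({
        "center": f"{lat},{lng}",
        "zoom": zoom,
        "size": "640x640",
        "maptype": "satellite",
        "key": key,
    })

def geocode_url(lat: float, lng: float, key: str) -> str:
    """Reverse-geocode the clicked point to a human-readable place name."""
    return GEOCODE + "?" + urlencode({"latlng": f"{lat},{lng}", "key": key})
```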

🌍 World Model Generation

Captured images and azimuth metadata are uploaded to World Labs, which generates an explorable 3D world from the multi-view input.

🔄 Multi-View Extraction

The world's panoramic render is projected into multiple perspective views using yaw and pitch sweeps, producing near-360° coverage.

🎯 Object Detection with Judge Verification

A reference object image is matched against every extracted view using GPT-5.2 Vision routed through the Keywords AI gateway.
Each bounding box is verified by a second GPT-5.2 judge call that confirms, corrects, or removes detections, with up to two correction iterations per box.

🧩 Optional Product Placement

A product image can be composited into the scene using Gemini for AI-aware placement with proper lighting and perspective, or a deterministic ground-plane fallback.
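The deterministic fallback can be as simple as an alpha-composite onto the lower band of the frame. A minimal Pillow sketch (function and parameter names are ours, not the project's):

```python
# Illustrative ground-plane fallback: scale the product cutout and paste
# it, with its alpha mask, near the bottom of the scene.
from PIL import Image

def place_on_ground(scene: Image.Image, product: Image.Image,
                    scale: float = 0.2, x_frac: float = 0.5) -> Image.Image:
    scene = scene.convert("RGBA")
    w = max(1, int(scene.width * scale))
    h = max(1, int(product.height * w / product.width))
    product = product.convert("RGBA").resize((w, h))
    # Anchor the product's bottom edge near the frame bottom, which
    # approximates the ground plane in a perspective view.
    x = int(scene.width * x_frac - w / 2)
    y = int(scene.height * 0.9 - h)
    scene.paste(product, (x, y), product)
    return scene.convert("RGB")
```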


⚙️ How It Works

πŸ—ΊοΈ Google Maps β†’ Satellite Capture

The frontend renders Google Maps 3D using the gmp-map-3d web component. On click, the app fetches four directional satellite images via the Static Maps API and resolves the location name via the Geocoding API.

🌍 World Labs Generation

Each satellite image is uploaded via signed URLs to the World Labs API. Explicit azimuth angles preserve view consistency. The backend polls the World Labs operation endpoint until the world is ready.
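The polling half of this step is generic. A sketch with the World Labs request injected as a callable, since the operation endpoint's exact shape is not reproduced here (state names are illustrative):

```python
# Generic poll-until-ready loop; `fetch_status` wraps the actual call to
# the World Labs operation endpoint and returns "pending"/"done"/"failed".
import time
from typing import Callable

def poll_until_ready(fetch_status: Callable[[], str],
                     interval_s: float = 5.0,
                     timeout_s: float = 600.0) -> bool:
    """Return True when the world is ready, False on failure or timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = fetch_status()
        if state == "done":
            return True
        if state == "failed":
            return False
        time.sleep(interval_s)
    return False
```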

πŸ“ Panorama β†’ Perspective Views

The panoramic render is downloaded and projected into configurable perspective views using yaw and pitch sweep parameters.
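Under the hood this is an equirectangular-to-pinhole resampling: cast a ray for each output pixel, rotate it by the view's yaw and pitch, and look up the panorama at that ray's longitude/latitude. A compact NumPy sketch of the idea (nearest-neighbour sampling; the axis conventions are ours):

```python
# Illustrative panorama -> perspective projection for one yaw/pitch view.
import numpy as np

def pano_to_perspective(pano: np.ndarray, yaw_deg: float, pitch_deg: float,
                        fov_deg: float = 90.0, out_hw=(256, 256)) -> np.ndarray:
    H, W = out_hw
    f = 0.5 * W / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels
    # Camera-space ray per output pixel (x right, y down, z forward).
    xs = np.arange(W) - W / 2 + 0.5
    ys = np.arange(H) - H / 2 + 0.5
    x, y = np.meshgrid(xs, ys)
    z = np.full_like(x, f)
    v = np.stack([x, y, z], axis=-1)
    v /= np.linalg.norm(v, axis=-1, keepdims=True)
    # Rotate rays by pitch (about x), then yaw (about y).
    p, q = np.radians(pitch_deg), np.radians(yaw_deg)
    Rx = np.array([[1, 0, 0], [0, np.cos(p), -np.sin(p)], [0, np.sin(p), np.cos(p)]])
    Ry = np.array([[np.cos(q), 0, np.sin(q)], [0, 1, 0], [-np.sin(q), 0, np.cos(q)]])
    v = v @ (Ry @ Rx).T
    # Ray direction -> panorama longitude/latitude -> pixel coordinates.
    lon = np.arctan2(v[..., 0], v[..., 2])      # [-pi, pi]
    lat = np.arcsin(np.clip(v[..., 1], -1, 1))  # [-pi/2, pi/2]
    ph, pw = pano.shape[:2]
    col = ((lon / (2 * np.pi) + 0.5) * pw).astype(int) % pw
    row = np.clip(((lat / np.pi + 0.5) * ph).astype(int), 0, ph - 1)
    return pano[row, col]
```

Sweeping `yaw_deg` in steps (and a couple of pitch bands) is what yields the near-360° set of views described above.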

🔍 Keywords AI Gateway + Detection Pipeline

All GPT-5.2 Vision calls are routed through the Keywords AI gateway, providing centralized logging, token tracking, latency metrics, and workflow tracing. Detection prompts return bounding boxes as structured JSON.
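Because the gateway speaks the OpenAI chat format, a detection call reduces to building a multimodal message with the view image inlined as a base64 data URL. A sketch (the base URL is assumed from Keywords AI's published OpenAI-compatible endpoint; model name and prompt are configuration):

```python
# Message builder for vision calls sent through an OpenAI-compatible
# gateway; any such client can send this payload with
# response_format={"type": "json_object"} to get boxes back as JSON.
import base64

GATEWAY_BASE_URL = "https://api.keywordsai.co/api"  # assumed; verify in docs

def vision_messages(prompt: str, image_bytes: bytes,
                    mime: str = "image/png") -> list:
    """Pair the detection prompt with the view image as a data URL."""
    data_url = f"data:{mime};base64," + base64.b64encode(image_bytes).decode()
    return [{"role": "user", "content": [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": data_url}},
    ]}]
```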

🧑‍⚖️ Judge Iteration System

Each bounding box is verified by a judge agent (GPT-5.2 via Keywords AI). The judge receives:

  • Reference object image
  • Cropped bounding-box region
  • Full scene with the box drawn

The judge returns:

  • CORRECT → keep
  • INCORRECT → corrected coordinates for re-judging
  • NOT_FOUND → remove false positive

The loop runs up to two correction iterations per detection.
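The control flow above can be isolated from the LLM calls by injecting the judge as a callable. A sketch using the verdict names from this section (what happens when the iteration budget runs out is our assumption):

```python
# Judge-loop control flow; `judge` wraps one GPT-5.2 judge call and
# returns (verdict, corrected_box_or_None).
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # x1, y1, x2, y2 pixel coordinates

def judge_box(box: Box,
              judge: Callable[[Box], Tuple[str, Optional[Box]]],
              max_iters: int = 2) -> Optional[Box]:
    """Return the accepted box, or None when the detection is rejected."""
    for _ in range(max_iters):
        verdict, corrected = judge(box)
        if verdict == "CORRECT":
            return box                 # keep
        if verdict == "NOT_FOUND":
            return None                # drop false positive
        box = corrected                # INCORRECT: re-crop and re-judge
    return None  # budget spent without agreement (our choice, not the source's)
```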

📡 Real-Time Streaming

The entire pipeline streams progress to the frontend via Server-Sent Events (SSE). A Gateway Log UI panel shows every LLM call in real time:

  • Call type (DETECT / JUDGE)
  • Model
  • Latency
  • Token counts
  • Color-coded judge verdicts
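The framing itself is simple: each gateway event becomes one JSON `data:` frame, yielded from a generator that the backend wraps in FastAPI's StreamingResponse with media_type="text/event-stream". A sketch with illustrative field names:

```python
# SSE framing for pipeline progress events (field names are illustrative).
import json

def sse_frame(event: dict) -> str:
    """Serialise one event as a Server-Sent Events data frame.

    A blank line terminates each frame, per the SSE wire format.
    """
    return f"data: {json.dumps(event)}\n\n"

def pipeline_events():
    # In the real pipeline these come from live DETECT/JUDGE gateway calls.
    yield sse_frame({"type": "DETECT", "model": "gpt-5.2", "latency_ms": 812})
    yield sse_frame({"type": "JUDGE", "verdict": "CORRECT", "tokens": 164})
```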

🗄️ Supabase Storage

Generated worlds, extracted images, and metadata are stored in Supabase. Images are organized by world ID, and world records persist in Postgres for reuse across sessions.
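A sketch of the layout convention (bucket name and path scheme are ours): keying every artefact by world ID is what makes views and metadata re-fetchable across sessions, e.g. via supabase-py's `client.storage.from_("worlds").upload(path, data)`.

```python
# Illustrative storage-key builder; categories are examples, not the
# project's actual folder names.
def storage_path(world_id: str, kind: str, name: str) -> str:
    """Build a bucket key like 'worlds/<id>/views/view_012.png'."""
    assert kind in {"views", "panorama", "meta"}
    return f"worlds/{world_id}/{kind}/{name}"
```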

🎨 Lovable

Used for rapid frontend scaffolding and UI prototyping.


🚧 Challenges We Ran Into

  • Maintaining view consistency across multi-image world generation and panorama-to-perspective extraction
  • Handling base64-encoded images reliably through the Keywords AI gateway
  • Building a judge iteration loop that re-crops and re-judges without compounding errors
  • Real-time SSE streaming for long-running pipelines with dozens of LLM calls
  • Dependency conflicts between keywordsai-tracing and OpenTelemetry on Python 3.9

πŸ† Accomplishments We’re Proud Of

  • End-to-end pipeline: real-world location → verified training dataset
  • Judge system that catches and corrects bad bounding boxes without retraining
  • Full observability of every LLM call via Keywords AI
  • Real-time Gateway Log UI with token usage, latency, and verdicts
  • All outputs stored and reusable via Supabase
  • Inline prompt fallback for immediate usability without managed prompts

🧰 Built With

  • World Labs API
  • Google Maps Platform (Maps JavaScript API, Static Maps API, Geocoding API)
  • Keywords AI (Gateway, Tracing, Prompt Management)
  • OpenAI GPT-5.2 Vision
  • Google Gemini
  • Supabase (Postgres + Storage)
  • Lovable
  • React, Vite, Tailwind CSS v4
  • FastAPI, Python
  • Server-Sent Events (SSE)

📚 What We Learned

Edge cases are a data problem, not a model problem.
Multi-view context is the missing layer between simulation and reality.

The detect → judge verification pattern generalizes beyond robotics.
Any vision pipeline benefits from a second-pass verifier.

Centralized LLM gateways are essential. Without tracing and logging, debugging multi-agent pipelines is guesswork.


🚀 What's Next

  • True 3D mesh ingestion
  • Physics-aware product placement for robotic manipulation tasks
  • Continuous video-based multi-view synthesis
  • Deeper integration with robotics simulators and perception training pipelines
  • Keywords AI managed prompts for A/B testing detection and judge logic
