Inspiration

E-commerce sites can see exactly when and where a cart is abandoned; physical retail is blind to that moment. We wanted to close this "Conversion Gap" by building an Empathic Analyst: an AI that understands hesitation and intent in the physical store without becoming a surveillance tool.

What it does

Gemini Retail Agent turns passive in-store cameras into active sales assistants:

Zero-Config Starter Kit: Store owners upload photos of their shelves. Autonomous agents segment the individual products, fetch their metadata, and build a Digital Twin of the store in minutes.

Live Behavioral Analysis: Using the Gemini Multimodal Live API, the agent detects complex behaviors such as "Price Hesitation" or "Ingredient Checking" in real time.

Instant Action: When it detects high hesitation, the agent triggers a dynamic discount on a nearby screen to help close the sale.
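The "Instant Action" step needs a rule for when hesitation is high enough to act on. Below is a minimal sketch of one possible rule, assuming the live analysis emits a per-shopper hesitation score between 0 and 1 with a timestamp. The class name and the threshold, dwell, and cooldown values are hypothetical and chosen only for illustration: an offer fires once the score has stayed above the threshold for a few seconds, and at most once per cooldown period for the same shopper.

```python
from dataclasses import dataclass, field


@dataclass
class HesitationTrigger:
    """Hypothetical rule that fires a discount once hesitation stays high.

    All parameter values are illustrative assumptions, not tuned settings.
    """
    threshold: float = 0.7    # score at or above which a shopper counts as hesitating
    min_dwell_s: float = 5.0  # hesitation must persist this long before firing
    cooldown_s: float = 60.0  # minimum gap between offers for the same shopper
    _since: dict = field(default_factory=dict, init=False, repr=False)       # id -> time score first crossed threshold
    _last_fired: dict = field(default_factory=dict, init=False, repr=False)  # id -> time of last offer

    def update(self, track_id: str, score: float, t: float) -> bool:
        """Feed one observation; return True if a discount should be shown now."""
        if score < self.threshold:
            # Hesitation dropped below the threshold: reset this shopper's dwell timer.
            self._since.pop(track_id, None)
            return False
        start = self._since.setdefault(track_id, t)
        if t - start < self.min_dwell_s:
            return False  # not hesitating long enough yet
        last = self._last_fired.get(track_id)
        if last is not None and t - last < self.cooldown_s:
            return False  # an offer was shown recently; don't spam the screen
        self._last_fired[track_id] = t
        return True
```

Requiring a sustained score rather than a single high reading filters out one-off spikes from noisy frames, and the cooldown keeps the screen from showing a fresh offer on every frame.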

How we built it

Core AI: Gemini 3 Pro (via the Multimodal Live API over WebSockets) for low-latency reasoning, and Gemini 1.5 Pro for inventory generation.

Architecture: A serverless, event-driven design with a React 19 frontend.

Context Engineering: We inject the store's "Digital Twin" into the 1M-token context window, allowing the model to recognize products by context, not just pixels.
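One way to picture "injecting the Digital Twin" is as plain text placed ahead of the live stream. This is a minimal sketch, assuming a hypothetical `Product` record with SKU, name, brand, price, and shelf fields; the actual Digital Twin schema, and how the string is attached to a Gemini request (for example, as a system instruction), are not described in this write-up and are left out. Products are serialized as one compact JSON object per line and sorted by shelf, so items that sit together in the store also sit together in the prompt.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Product:
    # Hypothetical Digital Twin record; field names are assumptions.
    sku: str
    name: str
    brand: str
    price: float
    shelf: str  # e.g. "aisle-3/shelf-2"


def build_context(products: list[Product], store_name: str) -> str:
    """Serialize the Digital Twin into a text block for the model's context."""
    lines = [f"STORE INVENTORY for {store_name} (one product per line):"]
    # Sort by shelf, then SKU, so neighbouring products stay adjacent in the prompt.
    for p in sorted(products, key=lambda x: (x.shelf, x.sku)):
        # Compact separators keep the token count down for large inventories.
        lines.append(json.dumps(asdict(p), separators=(",", ":"), sort_keys=True))
    return "\n".join(lines)
```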

Privacy: Processing runs client-side, using "Soft Biometrics" (clothing, gait) and ephemeral IDs. No PII is ever stored.
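Ephemeral IDs can be kept deliberately simple: random tokens held only in memory. The sketch below assumes a hypothetical local tracker that assigns an integer handle to each person it follows (for example, from clothing and gait cues); the hypothetical `EphemeralIds` class maps each handle to a random ID and forgets it after a period without sightings. Because the ID comes from `secrets.token_hex` rather than being derived from appearance, nothing about the person can be recovered from the ID itself.

```python
import secrets


class EphemeralIds:
    """Hypothetical in-memory registry of short-lived, random shopper IDs.

    IDs are never written to storage and are dropped after `ttl_s`
    seconds without a sighting.
    """

    def __init__(self, ttl_s: float = 300.0) -> None:
        self.ttl_s = ttl_s
        # tracker handle -> (ephemeral id, time last seen)
        self._ids: dict[int, tuple[str, float]] = {}

    def get(self, track: int, now: float) -> str:
        """Return the ephemeral ID for a tracker handle, minting a new one if needed."""
        self.expire(now)
        entry = self._ids.get(track)
        eid = entry[0] if entry else "anon-" + secrets.token_hex(8)
        self._ids[track] = (eid, now)  # refresh the last-seen time
        return eid

    def expire(self, now: float) -> None:
        """Forget every ID that has not been seen within the TTL."""
        stale = [k for k, (_, seen) in self._ids.items() if now - seen > self.ttl_s]
        for k in stale:
            del self._ids[k]

    def __len__(self) -> int:
        return len(self._ids)
```

A shopper who leaves and returns after the TTL gets a new, unrelated ID, which limits how long any behavioral trail can be linked together.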

Challenges we ran into

Latency vs. Context: Balancing a massive 1M-token context window, where longer prompts take longer to process, against the need for sub-second responses.

Strict JSON Output: Getting the LLM to emit clean, parseable JSON from chaotic video footage required extensive "Negative Constraint" prompting, i.e. explicitly telling the model what not to output.
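Prompting alone rarely guarantees parseable output, so a code-side check is a useful backstop. This minimal sketch assumes a hypothetical response shape with `track_id`, `behavior`, and `confidence` fields and a hypothetical label set. It strips a Markdown code fence if the model added one, parses the JSON, and returns `None` for anything that does not match, so one bad response is dropped instead of breaking the event pipeline.

```python
import json
from typing import Optional

FENCE = "`" * 3  # Markdown code-fence marker
# Hypothetical label set; "Price Hesitation" and "Ingredient Checking" come from the write-up.
ALLOWED_BEHAVIORS = {"price_hesitation", "ingredient_checking", "browsing"}


def parse_observation(raw: str) -> Optional[dict]:
    """Parse one model response into a validated observation, or return None."""
    text = raw.strip()
    # Models sometimes wrap JSON in a Markdown fence despite instructions.
    if text.startswith(FENCE):
        text = text.strip("`")
        if text.startswith("json"):
            text = text[len("json"):]
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    if obj.get("behavior") not in ALLOWED_BEHAVIORS:
        return None
    conf = obj.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(conf, (int, float)) or isinstance(conf, bool) or not 0.0 <= conf <= 1.0:
        return None
    if not isinstance(obj.get("track_id"), str):
        return None
    return obj
```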

Defining Privacy: Designing an architecture that analyzes behavior without crossing into biometric surveillance (GDPR/BIPA compliance).

Accomplishments that we're proud of

Solving the Cold Start: The "Auto-Deploy" agent works, turning a shelf photo into a product database automatically.

Ethical AI: Proving we can extract deep sales insights without storing a single face.

True Multimodal Reasoning: The agent doesn't just see a customer holding a bottle; it understands they are reading the label.

What we learned

Context is King: Giving Gemini the inventory data dramatically improved object recognition in low-res video.

WebSockets > Polling: The Live API's persistent WebSocket connection, rather than repeated polling, is essential for the "feeling" of real-time interaction.

Agentic Workflows: Specialized agents (the Architect and the Observer) outperform a single generic prompt.

What's next for Gemini Retail Agent

POS Integration: Closing the loop by matching visual predictions against actual receipts, so the system can improve itself.
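Matching visual predictions against receipts needs a rule for when a prediction counts as confirmed. A minimal sketch, assuming a hypothetical `Event` record (SKU plus a timestamp on a clock shared by cameras and the POS) and an illustrative 15-minute window: a predicted purchase is confirmed if a receipt line for the same SKU appears within the window after it, and each receipt line can confirm at most one prediction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    sku: str
    t: float  # seconds on a clock shared by cameras and POS (assumption)


def match_precision(predictions: list[Event], receipts: list[Event],
                    window_s: float = 900.0) -> float:
    """Fraction of predicted purchases confirmed by a receipt within `window_s`.

    Greedy and one-to-one: each receipt line can confirm at most one prediction.
    """
    unused = sorted(receipts, key=lambda r: r.t)  # copy; the caller's list is untouched
    hits = 0
    for p in sorted(predictions, key=lambda e: e.t):
        for i, r in enumerate(unused):
            if r.sku == p.sku and p.t <= r.t <= p.t + window_s:
                hits += 1
                del unused[i]  # safe: we break out of the loop immediately
                break
    return hits / len(predictions) if predictions else 0.0
```

The returned value is the precision of the purchase predictions; tracking recall as well would mean also counting receipt lines that no prediction anticipated.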

"Zeitgeist" Injection: Feeding real-time social media trends (e.g., viral recipes) into the context to explain sudden demand spikes.

Hyper-Local Search: Correlating store traffic with local Google Search queries (e.g., "buy Sriracha near me").
