Inspiration
E-commerce knows exactly why a cart is abandoned; physical retail is blind. We wanted to close this "Conversion Gap" by building an Empathic Analyst: an AI that understands hesitation and intent in the real world without becoming a surveillance tool.
What it does
Gemini Retail Agent transforms passive cameras into active sales assistants:
Zero-Config Starter Kit: Store owners upload shelf photos. Autonomous agents segment products, fetch metadata, and build a Digital Twin in minutes.
Live Behavioral Analysis: Using the Gemini Multimodal Live API, the agent detects complex behaviors like "Price Hesitation" or "Ingredient Checking" in real time.
Instant Action: When the agent detects high hesitation, it triggers a dynamic discount on a nearby screen to close the sale.
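The "Instant Action" step can be pictured as a small decision rule sitting between the behavior detector and the in-store screens. The sketch below is illustrative only: the event fields, the 0.8 threshold, and the 10% discount are assumptions, not values from the project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BehaviorEvent:
    """One behavior observation; all field names here are hypothetical."""
    shopper_id: str    # ephemeral ID, not tied to a real identity
    product_sku: str
    behavior: str      # e.g. "price_hesitation" or "ingredient_checking"
    confidence: float  # model-reported score in [0, 1]


@dataclass(frozen=True)
class DiscountOffer:
    product_sku: str
    percent_off: int


# Assumed tuning values, not taken from the project.
HESITATION_THRESHOLD = 0.8
DEFAULT_PERCENT_OFF = 10


def decide_offer(event: BehaviorEvent) -> Optional[DiscountOffer]:
    """Return a discount only for confident price hesitation, else None."""
    if event.behavior != "price_hesitation":
        return None
    if event.confidence < HESITATION_THRESHOLD:
        return None
    return DiscountOffer(event.product_sku, DEFAULT_PERCENT_OFF)
```

Keeping the rule as a pure function makes it easy to tune the threshold against real sales data without touching the video pipeline.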
How we built it
Core AI: Gemini 3 Pro (via Multimodal Live API/WebSockets) for low-latency reasoning and Gemini 1.5 Pro for inventory generation.
Architecture: Serverless, Event-Driven setup with a React 19 frontend.
Context Engineering: We inject the store's "Digital Twin" into the 1M token window, allowing the model to recognize products by context, not just pixels.
Privacy: Client-side processing using "Soft Biometrics" (clothing/gait) and ephemeral IDs. No PII is ever stored.
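One way to picture the ephemeral IDs mentioned above is a keyed hash whose key lives only in memory and is rotated regularly. This is a minimal sketch under that assumption, not the project's actual scheme; the class name and the descriptor string format are hypothetical.

```python
import hashlib
import hmac
import secrets


class EphemeralIdIssuer:
    """Maps a soft-biometric descriptor to a short-lived pseudonymous ID."""

    def __init__(self) -> None:
        # Random key held only in memory; it is never written to disk.
        self._key = secrets.token_bytes(32)

    def rotate(self) -> None:
        """Replace the key so earlier IDs can no longer be reproduced or linked."""
        self._key = secrets.token_bytes(32)

    def issue(self, descriptor: str) -> str:
        """Return a 16-hex-character ID that is stable until the next rotation."""
        digest = hmac.new(self._key, descriptor.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:16]
```

Within one key period the same descriptor maps to the same ID, so a visit can be followed across frames. After `rotate()`, the old IDs cannot be regenerated from the same descriptor.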
Challenges we ran into
Latency vs. Context: Balancing the massive 1M token context window with the need for sub-second responses.
Strict JSON Output: Forcing the LLM to output clean, parseable JSON from chaotic video footage required extensive "Negative Constraint" prompting.
Defining Privacy: Designing an architecture that analyzes behavior without crossing into biometric surveillance (GDPR/BIPA compliance).
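Prompting alone rarely guarantees parseable output, so a defensive parser on the receiving side is a common complement to the "Negative Constraint" prompting described above. The sketch below assumes a hypothetical three-field schema (`behavior`, `product_sku`, `confidence`); the project's real schema is not given here.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical schema for one behavior event.
REQUIRED_KEYS = {"behavior", "product_sku", "confidence"}
FENCE = "`" * 3  # Markdown code fence that models sometimes wrap around JSON


def parse_model_reply(reply: str) -> Optional[Dict[str, Any]]:
    """Return the validated event dict, or None if the reply is unusable."""
    text = reply.strip()
    if text.startswith(FENCE):
        # Drop the surrounding backticks and an optional "json" language tag.
        text = text.strip("`")
        if text.startswith("json"):
            text = text[len("json"):]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    confidence = data["confidence"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return None
    if not 0.0 <= confidence <= 1.0:
        return None
    return data
```

Returning `None` instead of raising lets the caller simply skip a bad frame and wait for the next one, which suits a live video stream.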
Accomplishments that we're proud of
Solving the Cold Start: The "Auto-Deploy" agent works, turning a shelf photo into a product database in minutes.
Ethical AI: Proving we can extract deep sales insights without storing a single face.
True Multimodal Reasoning: The agent doesn't just see a customer holding a bottle; it understands they are reading the label.
What we learned
Context is King: Giving Gemini the inventory data dramatically improved object recognition in low-res video.
WebSockets > Polling: The Live API is essential for the "feeling" of real-time interaction.
Agentic Workflows: Specialized agents (The Architect vs. The Observer) outperform a single generic prompt.
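To make "Context is King" concrete, here is a minimal sketch of how inventory records might be rendered into text for the model's context. The function name, record fields, and character budget are assumptions. A character count is only a rough stand-in for a token budget.

```python
import json
from typing import Dict, List


def build_store_context(products: List[Dict[str, str]], max_chars: int = 20000) -> str:
    """Render inventory records as compact text, one JSON object per line."""
    header = "Store inventory (one JSON object per line):"
    lines = [header]
    used = len(header)
    for product in products:
        # Compact separators and sorted keys keep the output small and stable.
        line = json.dumps(product, sort_keys=True, separators=(",", ":"))
        if used + 1 + len(line) > max_chars:
            break  # stop before exceeding the assumed budget
        lines.append(line)
        used += 1 + len(line)  # +1 for the joining newline
    return "\n".join(lines)
```

For example, `build_store_context([{"sku": "SKU-1", "name": "Sriracha", "shelf": "A3"}])` yields the header followed by one compact line for that product, ready to prepend to the live session's instructions.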
What's next for Gemini Retail Agent
POS Integration: Closing the loop by matching visual predictions with actual receipts for self-improvement.
"Zeitgeist" Injection: Feeding real-time social media trends (e.g., viral recipes) into the context to explain sudden demand spikes.
Hyper-Local Search: Correlating store traffic with local Google Search queries (e.g., "buy Sriracha near me").