RetailForge — Devpost Submission

Your AI Personal Shopper — that actually shops. A production-grade multi-agent retail intelligence system: a department-store storefront with a concierge that searches, recommends, builds discounted kits, places orders, and issues refunds — with safe, gated access to live data.

Live demo: https://retailforge-frontend-awghszkm2a-uc.a.run.app


Inspiration

Retail "AI assistants" today mostly talk. You ask for a waterproof jacket, you get a paragraph back — and then you still do all the work: hunt through the catalog, compare prices, check whether it's in stock, find a promo code, and check out yourself.

We wanted to build a concierge that takes real, correct actions on a live store: find products by meaning (not keywords), assemble a multi-item kit for an activity, apply the best valid discount, place the order, and handle returns — the things a great in-store personal shopper actually does.

The catch: the moment you let an LLM touch your production database, you're one hallucination away from a destructive write, and you have no clean way to audit what the model did. So the second half of the inspiration was just as important as the first: how do you give agents real power over data without giving them the keys to the kingdom?

What it does

RetailForge is a full department-store storefront with an embedded Personal Shopper. A shopper chats once; behind the scenes, a team of specialist agents does the work end to end:

  • Find — natural-language semantic search over the catalog using vector embeddings.
  • Recommend — "customers also bought", trending items, and history-aware picks.
  • Support — order status, cancellations, and refunds.
  • Build a kit — assemble a multi-category bundle for an activity (e.g. a weekend camping trip), check live inventory, and auto-apply the best valid promotion.
  • Check out — actually place the order and decrement inventory.

Every answer comes back as generative UI — product grids, a kit-builder card, an order confirmation — with live "Add to Bag" actions wired to the same state as the storefront.

How we built it

Frontend. Next.js 15 + React 19 + Tailwind, with CopilotKit driving the concierge over the AG-UI protocol (server-sent events). A separate read-only API serves the catalog, product detail, reviews, and order history.

Agents. A root RetailForgeConcierge built on Google ADK classifies shopper intent and hands off (transfer_to_agent) to one of four specialists: ProductAdvisor, RecommendationAgent, CustomerSupport, and BillingAgent. The LLM does the strategy; deterministic money math (pricing, discount validation, inventory) stays in native Python — not left to the model.
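
To make the "deterministic money math" point concrete, here is a minimal sketch of promo validation and best-discount selection in plain Python. The `Promo` fields, the validity rules (expiry date and minimum subtotal), and the function names are assumptions for illustration, not RetailForge's actual schema or code. The point is only that the model never computes the number: it picks a strategy, and code like this decides what the discount is.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, ROUND_HALF_UP
from typing import Iterable, Optional, Tuple

CENT = Decimal("0.01")


@dataclass(frozen=True)
class Promo:
    # Hypothetical promo shape; the real schema is not shown in this write-up.
    code: str
    percent_off: Decimal   # e.g. Decimal("15") means 15% off
    min_subtotal: Decimal  # promo applies only at or above this subtotal
    expires: date          # last day the promo is valid


def discount_for(promo: Promo, subtotal: Decimal, today: date) -> Decimal:
    """Return the discount amount, or 0.00 if the promo is not valid today."""
    if today > promo.expires or subtotal < promo.min_subtotal:
        return Decimal("0.00")
    raw = subtotal * promo.percent_off / Decimal("100")
    # Decimal plus explicit rounding avoids binary floating-point drift.
    return raw.quantize(CENT, rounding=ROUND_HALF_UP)


def best_promo(
    promos: Iterable[Promo], subtotal: Decimal, today: date
) -> Tuple[Optional[Promo], Decimal]:
    """Pick the valid promo with the largest discount; (None, 0.00) if none apply."""
    best, best_discount = None, Decimal("0.00")
    for promo in promos:
        amount = discount_for(promo, subtotal, today)
        if amount > best_discount:
            best, best_discount = promo, amount
    return best, best_discount
```

An agent tool would call `best_promo` with the kit subtotal and return both the chosen code and the exact amount, so the LLM only has to relay the result.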

The key design decision — MCP Toolbox as a safety boundary. Agents never open a database connection. Every data operation is a declared tool in tools.yaml (24 tools, grouped into four toolsets, one per agent), served by Google's MCP Toolbox for Databases running as a distroless binary on Cloud Run. An agent can only call what's declared — no ad-hoc queries, no DROP. Every write is a named, logged, auditable tool call.
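
The principle behind that boundary can be sketched in a few lines of Python. This is not the MCP Toolbox API or the contents of tools.yaml; it is a stand-in showing the two properties that matter: an agent can only invoke tools declared for its toolset, and every attempt (allowed or not) is recorded. The class name, the tool names, and the in-memory audit log are all hypothetical.

```python
from typing import Any, Callable, Dict, List


class ToolNotDeclared(PermissionError):
    """Raised when an agent asks for a tool outside its declared toolset."""


class ToolGateway:
    """Illustrative allowlist: agent name -> {tool name -> callable}."""

    def __init__(self, toolsets: Dict[str, Dict[str, Callable[..., Any]]]):
        self._toolsets = toolsets
        self.audit_log: List[dict] = []

    def call(self, agent: str, tool: str, **params: Any) -> Any:
        allowed = self._toolsets.get(agent, {})
        declared = tool in allowed
        # Log before executing, so rejected attempts are auditable too.
        self.audit_log.append(
            {"agent": agent, "tool": tool, "params": params, "allowed": declared}
        )
        if not declared:
            raise ToolNotDeclared(f"{agent} has no declared tool named {tool!r}")
        return allowed[tool](**params)
```

With a gateway like this, a hypothetical `CustomerSupport` agent that only has `get_order_status` declared cannot reach anything else: a request for an undeclared tool fails with `ToolNotDeclared` and still leaves an audit entry.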

Data & AI. MongoDB Atlas with Atlas Vector Search. Products are embedded at seed time with gemini-embedding-001 (3072-d, cosine); queries are embedded at run time and matched with $vectorSearch plus metadata filters (category, brand, price). Agents reason with Gemini 2.5 Flash.
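
As a rough illustration of the query side, the sketch below builds a `$vectorSearch` aggregation pipeline with optional metadata filters. The index name, the `embedding` path, the projected fields, and the default `limit` and `numCandidates` values are assumptions, not values taken from the project. Note that any field used in `filter` has to be indexed as a filter field in the Atlas Vector Search index.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    *,
    category: Optional[str] = None,
    brand: Optional[str] = None,
    max_price: Optional[float] = None,
    limit: int = 8,
    num_candidates: int = 100,
    index_name: str = "product_vector_index",  # hypothetical index name
    path: str = "embedding",                   # hypothetical vector field
) -> List[Dict[str, Any]]:
    """Return an aggregation pipeline for a filtered semantic product search."""
    if num_candidates < limit:
        raise ValueError("numCandidates must be at least as large as limit")

    # Only add a clause for each filter the shopper actually specified.
    clauses: List[Dict[str, Any]] = []
    if category is not None:
        clauses.append({"category": {"$eq": category}})
    if brand is not None:
        clauses.append({"brand": {"$eq": brand}})
    if max_price is not None:
        clauses.append({"price": {"$lte": max_price}})

    stage: Dict[str, Any] = {
        "index": index_name,
        "path": path,
        "queryVector": list(query_vector),
        "numCandidates": num_candidates,
        "limit": limit,
    }
    if clauses:
        stage["filter"] = clauses[0] if len(clauses) == 1 else {"$and": clauses}

    return [
        {"$vectorSearch": stage},
        # Expose the similarity score next to a few display fields.
        {"$project": {"_id": 0, "name": 1, "price": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]
```

The resulting list can be passed to a collection's `aggregate` call, with `query_vector` being the run-time embedding of the shopper's request.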

Infrastructure & CI/CD. Four Cloud Run services (frontend, backend, read-api, toolbox) provisioned by Terraform, with secrets in Secret Manager and images in Artifact Registry. GitHub Actions runs the pipeline — test (ruff + pytest on mongomock) → Cloud Build (three images, commit-SHA tagged) → terraform apply — authenticating to GCP with Workload Identity Federation, so there are no service-account JSON keys in the repo.

Challenges we ran into

  • Keeping agents away from the database. Solved cleanly by routing 100% of data access through the MCP Toolbox tool layer instead of giving agents a driver.
  • Vector search relevance. Getting good results meant tuning embedding task-types (document vs. query), metadata filters, and top-k.
  • Distroless toolbox on Cloud Run. The Toolbox image has no shell, so we invoke the binary directly on port 8080 — and had to make the Toolbox service publicly invokable before the backend boots and tries to load its tools.
  • Multi-service boot ordering under Terraform. Services have to come up in dependency order: the Toolbox, for example, must be serving before the backend starts and loads its tools.
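
Boot-ordering problems like the last two can also be softened on the client side. The sketch below is not something this write-up says the project does; it is one complementary approach, assuming the backend loads its tools through some callable that raises `ConnectionError` while the tool server is still starting. The `load_with_retry` name and its parameters are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def load_with_retry(
    loader: Callable[[], T],
    *,
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call loader() until it succeeds, backing off exponentially between tries."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return loader()
        except ConnectionError as exc:
            last_error = exc
            # Wait 0.5s, 1s, 2s, ... but do not sleep after the final failure.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise RuntimeError(
        f"tool server not reachable after {attempts} attempts"
    ) from last_error
```

Injecting `sleep` keeps the helper easy to test without real delays. Retrying does not replace correct ordering in Terraform, but it makes a late-starting dependency a slow boot instead of a failed one.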

Accomplishments that we're proud of

  • A concierge that performs real transactions — orders, inventory writes, refunds — not a scripted demo.
  • A genuinely safe agent-to-data architecture: every action is auditable by construction.
  • A polished, production-grade retail UX with generative UI, not just a chat box.
  • One-command, keyless deploys: terraform apply brings up the whole stack; CI/CD ships it on every push with no secrets in the repo.

What we learned

  • Let the LLM strategize, but keep correctness-critical math in deterministic code.
  • A declared-tool layer (MCP Toolbox) is the cleanest way to make LLM data access both powerful and safe — it doubles as the API contract between model and system of record.
  • Generative UI beats plain chat: putting the action where the answer is changes the whole experience.
  • Keyless WIF plus Terraform makes cloud deploys repeatable, with no long-lived service-account keys to store or rotate.

What's next

  • Tighten IAM to authenticated service-to-service calls, and route egress through a VPC connector and Cloud NAT so Atlas sees static IPs.
  • Personalization driven by real session and purchase history.
  • Payment + fulfillment integration, and A/B testing of agent strategies.
  • Full observability: trace every agent hand-off and tool call.

Built With

  • ag-ui
  • copilotkit
  • google-adk
  • mcp-toolbox
  • next.js