CartPilot

Architecture Diagram
landing page
product discovery
add to cart
order confirmation
order success

Inspiration

Cart Pilot was born from a simple frustration: watching someone struggle to find a product online, clicking through menus and filters, muttering "I just want to tell someone what I'm looking for." That moment sparked my vision—what if shopping could be as natural as conversation? I didn't want another chatbot; I wanted a system where specialized AI agents work together, each an expert in their domain, orchestrated by a master coordinator. The goal was to showcase the power of Google's Agent Development Kit (ADK) and the Agent-to-Agent (A2A) Protocol while solving a real problem: making e-commerce feel as natural as talking to a helpful salesperson.

What it does

Cart Pilot transforms the traditional e-commerce experience into a natural conversation. Instead of navigating complex interfaces, users simply tell the AI what they want—"Find me running shoes" or "Add the blue ones to my cart"—and intelligent agents orchestrate the entire shopping journey from product discovery to order placement. The system uses a hierarchical agent architecture: a Shopping Agent that routes requests, a Product Discovery Agent for semantic and visual search, a Cart Agent for cart management, a Checkout Agent for orders, a Payment Agent for secure payments, and a Customer Service Agent for support. All agents share state seamlessly through state["current_results"], state["cart"], and other keys, enabling seamless handoffs without explicit communication.

How we built it

My journey began with a monolithic agent that tried to do everything and predictably struggled, teaching me that agents, like humans, work better when they specialize. The breakthrough came with Google ADK's hierarchical agent pattern, allowing each agent to be an expert in its domain. I chose Google Cloud Platform for its seamless AI integration—Cloud Run for serverless containers hosting my FastAPI backend and Next.js frontend, Cloud SQL with pgvector for semantic search, Vertex AI for Gemini models and multimodal embeddings, Secret Manager for security, and Artifact Registry for deployments. When I discovered the Agent-to-Agent (A2A) Protocol, I was initially skeptical, but implementing it transformed everything: standardized communication via JSON-RPC 2.0, streaming support for real-time updates, structured artifacts for complex data, and session continuity through contextId made agent communication feel natural. I migrated from in-memory session storage to database-backed sessions using DatabaseSessionService, ensuring sessions persist across restarts and making the system production-ready.

Challenges we ran into

My biggest challenges included agent coordination—solving how Payment Agent completion triggers automatic Checkout Agent transfer without losing context. State synchronization was another hurdle, requiring careful design to prevent race conditions when multiple agents accessed shared state simultaneously; I solved this by establishing clear ownership of state keys like current_results (set by Product Discovery, read by Cart) and cart (set by Cart, read by Checkout). Streaming performance issues emerged as I experienced laggy responses, which I fixed by implementing incremental updates where text streams character by character and artifacts update immediately when received. Visual search optimization required caching embeddings and using pgvector for fast similarity search to avoid expensive processing on every upload. Finally, frontend complexity became a problem when my Chatbox component grew to 1,300+ lines; I refactored it using Next.js best practices, extracting custom hooks and focused components, reducing it to 201 lines.

Accomplishments that we're proud of

The magical moments came when I saw my first complete multi-agent flow work end-to-end—a user finding products, adding to cart, and completing checkout with seamless agent handoffs. When database-backed sessions persisted across restarts, I knew I had something production-ready. The visual search breakthrough was particularly exciting, successfully matching uploaded images to products using multimodal embeddings. I'm proud of proving that conversational commerce powered by multi-agent AI is not just possible, but practical, scalable, and ready for production. The refactoring of my frontend from a monolithic component to a well-organized, maintainable architecture was also a significant achievement.

What we learned

I learned that hierarchical delegation works—orchestrator agents should delegate, not execute, and sub-agents should be domain experts with clear boundaries. Shared state enables coordination, with centralized session state simplifying agent handoffs when state keys are well-documented and ownership patterns are clear. Protocols matter—A2A Protocol made everything easier by abstracting complexity and enabling innovation. Google Cloud Run enables rapid iteration by letting me focus on application logic instead of infrastructure management. Streaming UX requires careful design, with incremental updates feeling faster than batch updates. Most importantly, I learned that ADK multi-agent systems are powerful when designed correctly, that starting simple and adding complexity gradually works, and that conversational interfaces feel natural—proving that the future of e-commerce is conversational.

What's next for CartPilot

Cart Pilot is just the beginning. I see potential for enhanced visual search with better image matching and style recommendations, a voice interface for natural voice interactions, multi-language support to shop in any language, advanced ML-based personalization for recommendations, and social features like sharing carts, wishlists, and recommendations.

Built With

a2a
adk
cloudrun
cloudsql
fastapi
nextjs
vertex-ai

Updates

james mwai started this project — Nov 10, 2025 02:22 PM EST

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.