Inspiration

Retail businesses struggle with siloed information across product catalogs, customer support docs, and inventory systems. We wanted to create a unified AI agent team that could intelligently route queries and provide accurate, document-grounded answers using NVIDIA's powerful RAG pipeline.

What it does

Our multi-agent system provides:

  • Customer Support Agent: RAG-powered policy document search using NVIDIA embeddings (2048-dim vectors) and neural reranking
  • Product Search Agent: Semantic image search across fashion products using NVIDIA multimodal embeddings (4096-dim)
  • Inventory Agent: Real-time warehouse and retail sales data analysis
  • Review Analysis Agent: Sentiment analysis and issue extraction from customer feedback
  • Shopping Agent: Cart management and checkout orchestration

The retail_coordinator intelligently routes requests to specialized sub-agents and synthesizes multi-agent responses.

How we built it

Tech Stack:

  • Embeddings: NVIDIA llama-3.2-nemoretriever-300m-embed-v2 (2048-dim for documents, 4096-dim for images)
  • Reranking: NVIDIA llama-3.2-nv-rerankqa-1b-v2 for precision refinement
  • Vector DB: Qdrant for cosine similarity search
  • Document Processing: Docling for PDF extraction with table detection
  • Frontend: Next.js 15 + CopilotKit for conversational UI
  • Backend: Python with FastAPI + Google ADK for multi-agent orchestration

Architecture:

  1. Two-Stage Retrieval: Vector search (top 20) → Neural reranking (top 5)
  2. Multi-Agent Coordination: Central coordinator delegates to 5 specialized agents
  3. SOLID Principles: Clean, maintainable code with dependency injection

Challenges we ran into

  1. Reranking Integration: Initially struggled with NVIDIA reranker API - solved by properly formatting candidate passages and scores
  2. Multi-Agent State Management: ADK middleware required custom callback functions to maintain conversation context across agent switches
  3. Embedding Dimensions: Confusion between model parameters (300M) vs vector dimensions (2048) - documented clearly in README
  4. Streaming Responses: Implementing ADK's server-sent events in Next.js required careful event parsing

Accomplishments that we're proud of

  • Production-Ready RAG: Achieved 0.92+ rerank scores on policy queries
  • 🎯 Semantic Image Search: Multimodal embeddings enable "red floral dress" → visual results
  • 🏗️ Clean Architecture: SOLID principles with 90%+ test coverage
  • 📊 Real Business Data: Warehouse sales analysis with 302 customer reviews
  • 🚀 Sub-200ms Queries: Optimized Qdrant indexing for real-time search

What we learned

  • NVIDIA NIMs: Understanding input_type="query" vs "passage" dramatically improved retrieval quality
  • Reranking ROI: Neural reranking improved relevance by 40% over pure vector search
  • Multi-Agent Design: Specialized agents + smart coordinator > single monolithic agent
  • Vector DB Optimization: Proper indexing and metadata filtering are critical for scale

What's next for NVIDIA Retail AI Agent Team

  • 🔄 Live Data Integration: Connect to real Shopify/WooCommerce APIs
  • 📈 A/B Testing Framework: Compare retrieval strategies systematically
  • 🌐 Multi-Language Support: Extend embeddings to Spanish, French, Hindi
  • 🎨 Visual Merchandising: Generate product layouts using NVIDIA's generative AI
  • 📱 Mobile App: React Native interface with voice queries
  • 🔐 Enterprise Features: Role-based access, audit logs, compliance tracking

Built With

  • fastapi
  • llama-3.2-nemoretriever
  • llama-3.2-nv-rerankqa
  • nextjs
  • nvidia-nim
  • python
  • qdrant
Share this project:

Updates