Inspiration
Retail businesses struggle with siloed information across product catalogs, customer support docs, and inventory systems. We wanted to create a unified AI agent team that could intelligently route queries and provide accurate, document-grounded answers using NVIDIA's powerful RAG pipeline.
What it does
Our multi-agent system provides:
- Customer Support Agent: RAG-powered policy document search using NVIDIA embeddings (2048-dim vectors) and neural reranking
- Product Search Agent: Semantic image search across fashion products using NVIDIA multimodal embeddings (4096-dim)
- Inventory Agent: Real-time warehouse and retail sales data analysis
- Review Analysis Agent: Sentiment analysis and issue extraction from customer feedback
- Shopping Agent: Cart management and checkout orchestration
The retail_coordinator intelligently routes requests to specialized sub-agents and synthesizes multi-agent responses.
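The routing idea can be illustrated with a minimal sketch. Everything here is hypothetical: the keyword table, the agent names, and the fallback are invented for illustration, and the real retail_coordinator presumably lets the LLM pick sub-agents through Google ADK rather than matching keywords.

```python
from typing import Callable, Dict, List

# Hypothetical keyword table, one entry per specialized agent.
ROUTES: Dict[str, List[str]] = {
    "customer_support": ["return", "refund", "policy", "warranty"],
    "product_search": ["dress", "shirt", "shoes", "looking for"],
    "inventory": ["stock", "warehouse", "inventory"],
    "review_analysis": ["review", "feedback", "complaint"],
    "shopping": ["cart", "checkout", "buy"],
}


def route(query: str) -> List[str]:
    """Return every agent whose keywords appear in the query."""
    text = query.lower()
    matched = [
        name for name, words in ROUTES.items()
        if any(word in text for word in words)
    ]
    # Falling back to customer support is an assumption, not the project's rule.
    return matched or ["customer_support"]


def coordinate(query: str, agents: Dict[str, Callable[[str], str]]) -> str:
    """Call each matched agent and join the answers into one response."""
    return "\n".join(f"[{name}] {agents[name](query)}" for name in route(query))


if __name__ == "__main__":
    # Stub agents that only echo their own name.
    stubs = {name: (lambda q, n=name: f"{n} handled: {q}") for name in ROUTES}
    print(coordinate("Is the red dress in stock?", stubs))
```

A query that touches two domains, such as a product that must also be in stock, is sent to both agents and their answers are merged, which mirrors the "synthesizes multi-agent responses" step.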
How we built it
Tech Stack:
- Embeddings: NVIDIA llama-3.2-nemoretriever-300m-embed-v2 (2048-dim for documents, 4096-dim for images)
- Reranking: NVIDIA llama-3.2-nv-rerankqa-1b-v2 for precision refinement
- Vector DB: Qdrant for cosine similarity search
- Document Processing: Docling for PDF extraction with table detection
- Frontend: Next.js 15 + CopilotKit for conversational UI
- Backend: Python with FastAPI + Google ADK for multi-agent orchestration
Architecture:
- Two-Stage Retrieval: Vector search (top 20) → Neural reranking (top 5)
- Multi-Agent Coordination: Central coordinator delegates to 5 specialized agents
- SOLID Principles: Clean, maintainable code with dependency injection
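The two-stage retrieval and dependency-injection points above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: TwoStageRetriever, toy_embed, and toy_rerank are hypothetical names, an in-memory list stands in for Qdrant, and the injected callables stand in for the NVIDIA embedding and reranking models.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, returning 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TwoStageRetriever:
    """Vector recall followed by reranking; both models are injected."""

    def __init__(
        self,
        embed: Callable[[str], Vector],
        rerank: Callable[[str, List[str]], List[float]],
        corpus: List[str],
        recall_k: int = 20,
        final_k: int = 5,
    ) -> None:
        self.embed = embed
        self.rerank = rerank
        self.recall_k = recall_k
        self.final_k = final_k
        # In-memory stand-in for a vector database collection.
        self.index = [(passage, embed(passage)) for passage in corpus]

    def search(self, query: str) -> List[Tuple[str, float]]:
        q = self.embed(query)
        # Stage 1: keep the recall_k passages closest to the query vector.
        recalled = sorted(
            self.index, key=lambda item: cosine(q, item[1]), reverse=True
        )[: self.recall_k]
        passages = [passage for passage, _ in recalled]
        # Stage 2: rescore the candidates and keep the final_k best.
        scores = self.rerank(query, passages)
        ranked = sorted(zip(passages, scores), key=lambda p: p[1], reverse=True)
        return ranked[: self.final_k]


def toy_embed(text: str) -> Vector:
    """Toy embedding: letter counts (a placeholder for a real model)."""
    return [float(text.lower().count(c)) for c in "abcdefghijklmnopqrstuvwxyz"]


def toy_rerank(query: str, passages: List[str]) -> List[float]:
    """Toy reranker: number of words shared with the query."""
    words = set(query.lower().split())
    return [float(len(words & set(p.lower().split()))) for p in passages]
```

Because the embedder and reranker are constructor arguments, tests can pass cheap stand-ins like the toy functions here, while production code would pass clients for the real models.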
Challenges we ran into
- Reranking Integration: Initially struggled with the NVIDIA reranker API; solved by properly formatting candidate passages and scores
- Multi-Agent State Management: ADK middleware required custom callback functions to maintain conversation context across agent switches
- Embedding Dimensions: We initially confused the model's parameter count (300M) with its output vector dimension (2048); the distinction is now documented clearly in the README
- Streaming Responses: Implementing ADK's server-sent events in Next.js required careful event parsing
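To show what "careful event parsing" involves, here is a minimal sketch of a server-sent events parser. It is written in Python for consistency with the backend, whereas the project parsed events in Next.js. The function name parse_sse is hypothetical, and treating each data payload as JSON is an assumption about the event shape, which the write-up does not specify.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per server-sent event found in an iterable of lines."""
    event_type = "message"  # default event type in the SSE format
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                payload = json.loads("\n".join(data_lines))  # JSON is assumed
                yield {"event": event_type, "data": payload}
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            # Split on the first colon only, since the value may contain colons.
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "data":
                data_lines.append(value)
            elif field == "event":
                event_type = value


if __name__ == "__main__":
    stream = ['data: {"text": "Hel"}\n', "\n", 'data: {"text": "lo"}\n', "\n"]
    print("".join(event["data"]["text"] for event in parse_sse(stream)))
```

The details that tend to cause trouble are the ones handled explicitly here: events end at a blank line rather than at each data line, comment lines must be skipped, and only the first colon separates the field name from its value.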
Accomplishments that we're proud of
- ✨ Production-Ready RAG: Achieved 0.92+ rerank scores on policy queries
- 🎯 Semantic Image Search: Multimodal embeddings enable "red floral dress" → visual results
- 🏗️ Clean Architecture: SOLID principles with 90%+ test coverage
- 📊 Real Business Data: Warehouse sales analysis alongside 302 customer reviews
- 🚀 Sub-200ms Queries: Optimized Qdrant indexing for real-time search
What we learned
- NVIDIA NIMs: Understanding input_type="query" vs "passage" dramatically improved retrieval quality
- Reranking ROI: Neural reranking improved relevance by 40% over pure vector search
- Multi-Agent Design: Specialized agents + smart coordinator > single monolithic agent
- Vector DB Optimization: Proper indexing and metadata filtering are critical for scale
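The input_type lesson above can be made concrete with a small sketch that builds request bodies without calling any service. The helper name build_embedding_payload, the "nvidia/" model prefix, and the "model" and "input" field names (modeled on the common OpenAI-style embeddings request) are assumptions; only the input_type values "query" and "passage" come from the write-up.

```python
from typing import Any, Dict, List

# Model name from the tech stack; the "nvidia/" prefix is an assumption.
EMBED_MODEL = "nvidia/llama-3.2-nemoretriever-300m-embed-v2"


def build_embedding_payload(texts: List[str], input_type: str) -> Dict[str, Any]:
    """Build an embeddings request body (hypothetical helper).

    Use "passage" when indexing documents and "query" when embedding a
    user question, so both sides of the search are encoded consistently.
    """
    if input_type not in ("query", "passage"):
        raise ValueError(f"input_type must be 'query' or 'passage', got {input_type!r}")
    return {"model": EMBED_MODEL, "input": list(texts), "input_type": input_type}


if __name__ == "__main__":
    index_body = build_embedding_payload(["Returns accepted within 30 days."], "passage")
    search_body = build_embedding_payload(["What is the return window?"], "query")
    print(index_body["input_type"], search_body["input_type"])
```

Validating the value up front keeps a typo such as "document" from silently producing embeddings in the wrong mode.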
What's next for NVIDIA Retail AI Agent Team
- 🔄 Live Data Integration: Connect to real Shopify/WooCommerce APIs
- 📈 A/B Testing Framework: Compare retrieval strategies systematically
- 🌐 Multi-Language Support: Extend embeddings to Spanish, French, Hindi
- 🎨 Visual Merchandising: Generate product layouts using NVIDIA's generative AI
- 📱 Mobile App: React Native interface with voice queries
- 🔐 Enterprise Features: Role-based access, audit logs, compliance tracking
Built With
- fastapi
- llama-3.2-nemoretriever
- llama-3.2-nv-rerankqa
- nextjs
- nvidia-nim
- python
- qdrant
