NVIDIA Retail AI Agent Team

Multi-Agent Architecture Visualization
CoverPage
Document RAG Pipeline Visualization
Image Search Results Showcase
Document RAG Pipeline - Detailed Flow
Professional layered architecture
Deployment

Inspiration

Retail businesses struggle with siloed information across product catalogs, customer support docs, and inventory systems. We wanted to create a unified AI agent team that could intelligently route queries and provide accurate, document-grounded answers using NVIDIA's powerful RAG pipeline.

What it does

Our multi-agent system provides:

Customer Support Agent: RAG-powered policy document search using NVIDIA embeddings (2048-dim vectors) and neural reranking
Product Search Agent: Semantic image search across fashion products using NVIDIA multimodal embeddings (4096-dim)
Inventory Agent: Real-time warehouse and retail sales data analysis
Review Analysis Agent: Sentiment analysis and issue extraction from customer feedback
Shopping Agent: Cart management and checkout orchestration

The retail_coordinator intelligently routes requests to specialized sub-agents and synthesizes multi-agent responses.

How we built it

Tech Stack:

Embeddings: NVIDIA llama-3.2-nemoretriever-300m-embed-v2 (2048-dim for documents, 4096-dim for images)
Reranking: NVIDIA llama-3.2-nv-rerankqa-1b-v2 for precision refinement
Vector DB: Qdrant for cosine similarity search
Document Processing: Docling for PDF extraction with table detection
Frontend: Next.js 15 + CopilotKit for conversational UI
Backend: Python with FastAPI + Google ADK for multi-agent orchestration

Architecture:

Two-Stage Retrieval: Vector search (top 20) → Neural reranking (top 5)
Multi-Agent Coordination: Central coordinator delegates to 5 specialized agents
SOLID Principles: Clean, maintainable code with dependency injection

Challenges we ran into

Reranking Integration: Initially struggled with NVIDIA reranker API - solved by properly formatting candidate passages and scores
Multi-Agent State Management: ADK middleware required custom callback functions to maintain conversation context across agent switches
Embedding Dimensions: Confusion between model parameters (300M) vs vector dimensions (2048) - documented clearly in README
Streaming Responses: Implementing ADK's server-sent events in Next.js required careful event parsing

Accomplishments that we're proud of

✨ Production-Ready RAG: Achieved 0.92+ rerank scores on policy queries
🎯 Semantic Image Search: Multimodal embeddings enable "red floral dress" → visual results
🏗️ Clean Architecture: SOLID principles with 90%+ test coverage
📊 Real Business Data: Warehouse sales analysis with 302 customer reviews
🚀 Sub-200ms Queries: Optimized Qdrant indexing for real-time search

What we learned

NVIDIA NIMs: Understanding input_type="query" vs "passage" dramatically improved retrieval quality
Reranking ROI: Neural reranking improved relevance by 40% over pure vector search
Multi-Agent Design: Specialized agents + smart coordinator > single monolithic agent
Vector DB Optimization: Proper indexing and metadata filtering are critical for scale

What's next for NVIDIA Retail AI Agent Team

🔄 Live Data Integration: Connect to real Shopify/WooCommerce APIs
📈 A/B Testing Framework: Compare retrieval strategies systematically
🌐 Multi-Language Support: Extend embeddings to Spanish, French, Hindi
🎨 Visual Merchandising: Generate product layouts using NVIDIA's generative AI
📱 Mobile App: React Native interface with voice queries
🔐 Enterprise Features: Role-based access, audit logs, compliance tracking

Built With

fastapi
llama-3.2-nemoretriever
llama-3.2-nv-rerankqa
nextjs
nvidia-nim
python
qdrant

Updates

Yash Kavaiya started this project — Nov 01, 2025 12:15 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.