VisionCart: AI Voice Shopping Assistant
Inspiration
Online shopping today is highly visual and manual. Users must type queries, browse product listings, apply filters, and compare items on their own. The process is even harder for users with accessibility needs, for people who are multitasking, and for anyone who prefers natural conversation to a traditional search interface.
VisionCart was inspired by the idea of making shopping conversational and intelligent. Instead of typing keywords, users simply speak, and the system understands their intent, retrieves relevant products, and guides them through the shopping process.
The goal was to build a real-time voice-powered shopping assistant that behaves more like a helpful store associate than a search engine.
⸻
What It Does
VisionCart allows users to:
• Speak natural shopping queries
• Receive AI-powered product recommendations
• Ask follow-up questions conversationally
• Interrupt the assistant naturally while it speaks
• Experience a human-like voice interaction
For example:
User: "Find me running shoes under $100."
Assistant: "I found three popular options: Nike Revolution, Adidas Duramo, and Puma Velocity. Would you like lightweight, cushioned, or trail running shoes?"
How We Built It
VisionCart combines voice processing, AI reasoning, and product retrieval into a single pipeline.
- Voice Input
User speech is captured using the browser microphone and converted to text using speech recognition.
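Browser speech recognition such as the Web Speech API's `SpeechRecognition` interface can report results incrementally: interim guesses that may still change, and final results that will not. A minimal sketch, assuming VisionCart uses that API with interim results enabled, of how the results could be split into confirmed and tentative text. The `splitTranscript` helper and the structural types are hypothetical; they only mirror the indexed shape of the API's result list so the logic can run outside a browser.

```typescript
// Structural stand-ins (hypothetical) for the Web Speech API result types.
// A real SpeechRecognitionResultList has the same array-like shape.
interface RecognitionAlternative {
  readonly transcript: string;
}

interface RecognitionResult {
  readonly isFinal: boolean;
  readonly [index: number]: RecognitionAlternative;
}

interface TranscriptParts {
  finalText: string; // text the recognizer will no longer revise
  interimText: string; // tentative text that may still change
}

// Walk every result in the session and separate confirmed text from
// in-progress guesses, reading only the top-ranked alternative.
function splitTranscript(results: ArrayLike<RecognitionResult>): TranscriptParts {
  const finalParts: string[] = [];
  const interimParts: string[] = [];
  for (const result of Array.from(results)) {
    const top = result[0];
    if (top === undefined) continue; // no alternatives to read
    const text = top.transcript.trim();
    if (text === "") continue;
    (result.isFinal ? finalParts : interimParts).push(text);
  }
  return { finalText: finalParts.join(" "), interimText: interimParts.join(" ") };
}
```

In a browser this would typically be called from the recognizer's `onresult` handler with `event.results`; only `finalText` needs to be sent on for intent understanding, while `interimText` can be shown as live feedback.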
- Intent Understanding
The spoken query is sent to an AI model that interprets the shopping intent.
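One common way to make this step dependable is to ask the model for a small JSON object and validate it before using it. The sketch below assumes that approach; the `ShoppingIntent` fields and the `parseIntent` function are hypothetical, and the prompt that requests the JSON is not shown.

```typescript
// Hypothetical structured intent; the field names are illustrative only.
interface ShoppingIntent {
  category: string; // e.g. "running shoes"
  maxPrice: number | null; // price ceiling in dollars, if the user gave one
  attributes: string[]; // e.g. ["lightweight", "trail"]
}

// Validate the model's raw reply instead of trusting it: models can return
// malformed JSON or fields of the wrong type.
function parseIntent(raw: string): ShoppingIntent | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;

  const { category, maxPrice, attributes } = data as Record<string, unknown>;
  if (typeof category !== "string" || category.trim() === "") return null;

  // Keep the price only if it is a positive finite number.
  const price =
    typeof maxPrice === "number" && Number.isFinite(maxPrice) && maxPrice > 0 ? maxPrice : null;

  // Keep only string attributes; ignore anything else the model produced.
  const attrs = Array.isArray(attributes)
    ? attributes.filter((a): a is string => typeof a === "string")
    : [];

  return { category: category.trim(), maxPrice: price, attributes: attrs };
}
```

Returning `null` on bad output gives the caller a clear signal to re-prompt the model or ask the user to rephrase.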
- Product Retrieval
The backend queries product sources (such as retailer APIs) to fetch relevant items.
The system performs dual retrieval:
1. Keyword product search
2. AI reasoning over product attributes
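This hybrid approach improves recommendation quality: keyword search finds products that match the words in the query, while reasoning over attributes can apply requirements a keyword match misses, such as a price limit or a preference for lightweight shoes.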
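A minimal sketch of one way the two retrieval passes could be combined, assuming an in-memory catalog. The AI reasoning step is replaced here by a plain rule-based attribute check so the example stays self-contained; `Product`, `Query`, `hybridSearch`, and the scoring weights are all hypothetical.

```typescript
interface Product {
  name: string;
  price: number;
  tags: string[]; // attribute labels, e.g. ["lightweight", "trail"]
}

interface Query {
  keywords: string[];
  maxPrice: number | null;
  wantedAttributes: string[];
}

// Pass 1: keyword search. Score = number of query keywords found in the
// product name or tags (case-insensitive substring match).
function keywordScore(p: Product, keywords: string[]): number {
  const haystack = [p.name, ...p.tags].join(" ").toLowerCase();
  return keywords.filter((k) => haystack.includes(k.toLowerCase())).length;
}

// Pass 2: attribute check, a rule-based stand-in for AI reasoning.
// Returns null when a hard constraint (the price ceiling) fails.
function attributeScore(p: Product, q: Query): number | null {
  if (q.maxPrice !== null && p.price > q.maxPrice) return null;
  const tags = new Set(p.tags.map((t) => t.toLowerCase()));
  return q.wantedAttributes.filter((a) => tags.has(a.toLowerCase())).length;
}

function hybridSearch(catalog: Product[], q: Query, limit = 3): Product[] {
  const scored: { product: Product; score: number }[] = [];
  for (const product of catalog) {
    const kw = keywordScore(product, q.keywords);
    if (kw === 0) continue; // keyword pass found no match
    const attr = attributeScore(product, q);
    if (attr === null) continue; // over budget
    scored.push({ product, score: kw + 2 * attr }); // weight of 2 is arbitrary
  }
  scored.sort((a, b) => b.score - a.score); // highest score first
  return scored.slice(0, limit).map((s) => s.product);
}
```

Keyword matching narrows the catalog first; the attribute pass then drops anything over budget and ranks the rest by matched attributes. An AI model in the second pass can judge attributes that a tag lookup cannot, which is the advantage the hybrid approach aims for.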
- Conversational AI
The AI model generates a natural response describing the results.
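A common way to keep a generated reply tied to real results is to place the retrieved products in the prompt and instruct the model to mention only those. A sketch under that assumption: the role/content message shape follows a widely used chat-model convention, and `buildResponsePrompt` and the instruction wording are hypothetical.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface ProductSummary {
  name: string;
  price: number;
}

// Build the messages for the response model, grounding it in the
// retrieved products so it describes real results rather than invented ones.
function buildResponsePrompt(userQuery: string, products: ProductSummary[]): ChatMessage[] {
  const listing =
    products.length === 0
      ? "No matching products were found."
      : products.map((p, i) => `${i + 1}. ${p.name} ($${p.price.toFixed(2)})`).join("\n");
  return [
    {
      role: "system",
      content:
        "You are a voice shopping assistant. Answer in two or three short spoken sentences. " +
        "Only mention products from the list provided. If the request is ambiguous, " +
        "end with one short clarifying question.",
    },
    { role: "user", content: `Request: ${userQuery}\nProducts:\n${listing}` },
  ];
}
```

Asking for short answers matters more for voice than for text, since listeners cannot skim a long spoken reply.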
- Voice Response
The AI response is converted back to speech using text-to-speech synthesis, allowing the assistant to speak naturally.
Challenges We Ran Into
- Voice Latency
Real-time voice systems must respond quickly. Early versions had delays because speech recognition, AI reasoning, and product retrieval all added latency.
We optimized the pipeline by:
• Streaming responses
• Reducing API round trips
• Using faster AI models
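Streaming helps most when speech can begin before the whole reply has been generated. A minimal sketch, assuming the AI model streams its reply as text fragments: buffer the fragments and hand each complete sentence to text-to-speech as soon as it closes. `SentenceChunker` is hypothetical, and its sentence rule (`.`, `!`, or `?` followed by whitespace) is a simple heuristic that will mis-split abbreviations such as "Dr.".

```typescript
// Buffer streamed text and release complete sentences for text-to-speech.
class SentenceChunker {
  private buffer = "";

  // Add a streamed fragment; return any sentences it completed.
  push(fragment: string): string[] {
    this.buffer += fragment;
    const sentences: string[] = [];
    const boundary = /[.!?]\s+/g; // sentence end: punctuation then whitespace
    let start = 0;
    while (boundary.exec(this.buffer) !== null) {
      const end = boundary.lastIndex; // index just past the whitespace
      const sentence = this.buffer.slice(start, end).trim();
      if (sentence !== "") sentences.push(sentence);
      start = end;
    }
    // Keep the unfinished tail (including a trailing "." with nothing after it).
    this.buffer = this.buffer.slice(start);
    return sentences;
  }

  // Call when the stream ends to release any remaining text.
  flush(): string | null {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest === "" ? null : rest;
  }
}
```

Because a period only counts as a boundary once whitespace follows it, a price split across fragments (`"$99."` then `"99"`) is not cut in half.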
- Handling Interruptions
Humans often interrupt voice assistants mid-sentence. Handling this required careful control of:
• audio playback
• microphone listening states
• conversation context
We implemented interruption handling so users can speak at any time, even while the assistant is still talking.
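The three concerns listed above can be tied together in a small state machine. A minimal sketch, assuming the microphone stays open while the assistant speaks so that any detected user speech counts as an interruption (often called barge-in); `VoiceTurnManager` and `AudioControls` are hypothetical names. A turn counter marks replies to superseded requests as stale so they are never spoken.

```typescript
type State = "listening" | "thinking" | "speaking";

// Side effects are injected so the logic can be tested without audio.
interface AudioControls {
  stopPlayback(): void; // e.g. cancel text-to-speech output
  startPlayback(text: string): void;
}

class VoiceTurnManager {
  state: State = "listening";
  private turn = 0; // incremented on every new user utterance
  private audio: AudioControls;

  constructor(audio: AudioControls) {
    this.audio = audio;
  }

  // User started speaking; if the assistant is talking, this is a barge-in.
  onUserSpeechStart(): void {
    if (this.state === "speaking") this.audio.stopPlayback();
    this.turn++; // any reply still in flight now belongs to a stale turn
    this.state = "listening";
  }

  // User finished an utterance; returns the turn id to tag the request with.
  onUserSpeechEnd(): number {
    this.state = "thinking";
    return this.turn;
  }

  // A reply arrived; drop it if the user has spoken again since the request.
  onResponseReady(turnId: number, text: string): boolean {
    if (turnId !== this.turn || this.state !== "thinking") return false;
    this.state = "speaking";
    this.audio.startPlayback(text);
    return true;
  }

  onPlaybackDone(): void {
    if (this.state === "speaking") this.state = "listening";
  }
}
```

Keeping the microphone open during playback means the assistant's own voice can be picked up, so echo cancellation or headphones are usually needed to avoid false barge-ins.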
- Intent Ambiguity
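User queries are often vague. A request such as "find me running shoes" does not say whether the user wants lightweight, cushioned, or trail shoes, so the assistant narrows the search with a follow-up question, as in the example dialogue above.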
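One simple policy for deciding when to ask a follow-up question: ask only when the request leaves a choice open that the results actually differ on. A sketch under that assumption; `clarifyingQuestion`, `Candidate`, and the use of product tags as the choices are hypothetical.

```typescript
interface Candidate {
  name: string;
  tags: string[]; // attribute labels, e.g. ["trail"]
}

// Return a follow-up question, or null if the request is specific enough.
function clarifyingQuestion(
  category: string,
  requestedAttributes: string[],
  candidates: Candidate[],
): string | null {
  if (category.trim() === "") return "What kind of product are you looking for?";
  // The user already narrowed the request, or there is nothing to choose between.
  if (requestedAttributes.length > 0 || candidates.length < 2) return null;

  // Collect distinct tags across candidates, in order of first appearance.
  const options: string[] = [];
  for (const c of candidates) {
    for (const tag of c.tags) {
      if (!options.includes(tag)) options.push(tag);
    }
  }
  if (options.length < 2) return null; // results do not differ on any attribute

  const shown = options.slice(0, 3); // offer at most three spoken choices
  const last = shown.pop();
  const choices = shown.length === 1 ? `${shown[0]} or ${last}` : `${shown.join(", ")}, or ${last}`;
  return `Would you like ${choices} ${category}?`;
}
```

Given the category "running shoes", no requested attributes, and three candidates tagged `lightweight`, `cushioned`, and `trail`, this returns "Would you like lightweight, cushioned, or trail running shoes?".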
- API Rate Limits
External AI APIs have request limits, so we had to design fallback strategies and optimize requests.
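The specific fallback strategies are not described; a common pattern is to retry a rate-limited call with exponential backoff and then switch to another model or provider. A minimal sketch under that assumption: `callWithFallback`, `RateLimitError`, and the delay values are hypothetical, and `sleep` is injected so tests need not actually wait.

```typescript
// Hypothetical error type; the API client is assumed to throw this when the
// provider answers with HTTP 429 (Too Many Requests).
class RateLimitError extends Error {}

type Sleep = (ms: number) => Promise<void>;

// Try each provider in order (e.g. primary model, then a fallback model).
// Rate-limited calls are retried with exponential backoff; any other error
// propagates immediately.
async function callWithFallback<T>(
  providers: Array<() => Promise<T>>,
  sleep: Sleep,
  maxRetries = 2,
  baseDelayMs = 250,
): Promise<T> {
  let lastError: unknown = new Error("no providers configured");
  for (const call of providers) {
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return await call();
      } catch (err) {
        if (!(err instanceof RateLimitError)) throw err;
        lastError = err;
        // Wait 250 ms, 500 ms, ... before the next attempt on this provider.
        if (attempt < maxRetries) await sleep(baseDelayMs * 2 ** attempt);
      }
    }
    // This provider stayed rate-limited; fall through to the next one.
  }
  throw lastError;
}
```

In production, `sleep` would typically be `(ms) => new Promise((r) => setTimeout(r, ms))`, and honoring a provider's `Retry-After` header, when one is sent, is usually better than a fixed schedule.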
What We Learned
Building VisionCart taught us several important lessons:
• Voice UX design is very different from text interfaces
• Latency optimization is critical in conversational systems
• AI assistants must handle ambiguity gracefully
• Streaming responses significantly improve user experience
• Combining retrieval systems with AI reasoning leads to better recommendations
Future Improvements
Planned enhancements include:
• personalized shopping recommendations
• multi-store product comparison
• visual product cards alongside voice responses
• multilingual voice support
• real-time price tracking
Ultimately, VisionCart aims to make shopping faster, more natural, and more accessible through conversational AI.