VisionCart

VisionCart: AI Voice Shopping Assistant

Inspiration

Online shopping today is highly visual and manual. Users must type queries, browse product listings, apply filters, and compare items themselves. This process becomes even harder for users with accessibility needs, multitasking users, or those who prefer natural conversation over traditional search interfaces.

VisionCart was inspired by the idea of making shopping conversational and intelligent. Instead of typing keywords, users can simply speak naturally and the system understands their intent, retrieves relevant products, and guides them through the shopping process.

The goal was to build a real-time voice-powered shopping assistant that behaves more like a helpful store associate than a search engine.

⸻

What It Does

VisionCart allows users to:
• Speak natural shopping queries
• Receive AI-powered product recommendations
• Ask follow-up questions conversationally
• Interrupt the assistant naturally while it speaks
• Experience a human-like voice interaction

Example:

User: "Find me running shoes under $100."

Assistant: "I found three popular options: Nike Revolution, Adidas Duramo, and Puma Velocity. Would you like lightweight, cushioned, or trail running shoes?"

How We Built It

VisionCart combines voice processing, AI reasoning, and product retrieval into a single pipeline.

Voice Input

User speech is captured using the browser microphone and converted to text using speech recognition.

Intent Understanding

The spoken query is sent to an AI model that interprets the shopping intent.

Product Retrieval

The backend queries product sources (such as retailer APIs) to fetch relevant items.

The system performs dual retrieval:
1. Keyword product search
2. AI reasoning over product attributes

This hybrid approach improves recommendation quality.
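
The two-stage idea can be illustrated with a minimal Python sketch. It is not VisionCart's actual code: the `Product` type, the `CATALOG` data, and every function name are hypothetical, and a plain attribute-overlap score stands in for the AI reasoning step.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    name: str
    price: float
    attributes: set[str] = field(default_factory=set)


def keyword_search(catalog, keywords):
    """Stage 1: keep products whose name contains every keyword."""
    wanted = [k.lower() for k in keywords]
    return [p for p in catalog if all(k in p.name.lower() for k in wanted)]


def rerank_by_attributes(candidates, max_price, preferred):
    """Stage 2: drop items over budget, then rank by attribute overlap.

    Assumption: this simple score stands in for AI reasoning over attributes.
    """
    affordable = [p for p in candidates if p.price <= max_price]
    # More matching attributes first; cheaper items break ties.
    return sorted(affordable, key=lambda p: (-len(p.attributes & preferred), p.price))


def hybrid_retrieve(catalog, keywords, max_price, preferred):
    return rerank_by_attributes(keyword_search(catalog, keywords), max_price, preferred)


# Hypothetical catalog used only for illustration.
CATALOG = [
    Product("Trail Running Shoes A", 120.0, {"trail"}),
    Product("Road Running Shoes B", 80.0, {"lightweight", "cushioned"}),
    Product("Road Running Shoes C", 65.0, {"cushioned"}),
    Product("Leather Boots D", 90.0, {"waterproof"}),
]

results = hybrid_retrieve(CATALOG, ["running", "shoes"], 100.0, {"lightweight"})
print([p.name for p in results])
```

Running the keyword stage first keeps the candidate set small, so the more expensive reasoning stage only has to consider a handful of items.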

Conversational AI

The AI model generates a natural response describing the results.

Voice Response

The AI response is converted back to speech using text-to-speech synthesis, allowing the assistant to speak naturally.

Challenges We Ran Into

Voice Latency

Real-time voice systems must respond quickly. Early versions had delays because speech recognition, AI reasoning, and product retrieval all added latency.

We optimized the pipeline by:
• Streaming responses
• Reducing API round trips
• Using faster AI models
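
As one illustration of why streaming helps, the sketch below forwards each complete sentence to speech synthesis as soon as it arrives instead of waiting for the full reply. This is an assumed design, not the project's confirmed implementation, and `sentences_from_stream` is a hypothetical helper.

```python
from typing import Iterable, Iterator


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield each complete sentence as soon as its end punctuation arrives."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            # Find the earliest sentence terminator currently in the buffer.
            ends = [i for i in (buffer.find(c) for c in ".?!") if i != -1]
            if not ends:
                break
            cut = min(ends) + 1
            sentence, buffer = buffer[:cut].strip(), buffer[cut:]
            if sentence:
                yield sentence
    if buffer.strip():
        yield buffer.strip()  # flush any trailing partial sentence


# Simulated model output arriving in small pieces.
stream = ["I found three ", "popular options. Would you", " like lightweight or ", "cushioned shoes?"]
for sentence in sentences_from_stream(stream):
    print(sentence)  # in a real pipeline each sentence would go to text-to-speech
```

The splitter is deliberately naive: it would also break on decimal points such as "$99.99", so a production version would need a smarter sentence boundary rule.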

Handling Interruptions

Humans often interrupt voice assistants mid-sentence. Handling this required careful control of:
• audio playback
• microphone listening states
• conversation context

We implemented interruption handling so users can speak anytime.
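
A small state machine is one way to coordinate playback, listening, and context when the user barges in. The sketch below is an assumption about how this could work, not the project's actual code; `Conversation`, its methods, and the turn counter are all hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()


class Conversation:
    def __init__(self):
        self.state = State.LISTENING
        self.playback_active = False
        self.turn = 0
        self.history = []  # conversation context kept across interruptions

    def on_user_speech(self, text):
        """Handle new user speech; returns the id of the turn it starts."""
        if self.state is State.SPEAKING:
            # Barge-in: stop audio playback right away.
            self.playback_active = False
            self.history.append(("event", "assistant interrupted"))
        self.history.append(("user", text))
        self.turn += 1
        self.state = State.THINKING
        return self.turn

    def on_reply_ready(self, turn, text):
        # Ignore replies that belong to a turn the user has already moved past.
        if turn != self.turn or self.state is not State.THINKING:
            return
        self.history.append(("assistant", text))
        self.playback_active = True
        self.state = State.SPEAKING

    def on_playback_finished(self):
        if self.state is State.SPEAKING:
            self.playback_active = False
            self.state = State.LISTENING


conv = Conversation()
t1 = conv.on_user_speech("Find me running shoes under $100.")
conv.on_reply_ready(t1, "I found three popular options.")
t2 = conv.on_user_speech("Only lightweight ones.")  # interrupts playback
conv.on_reply_ready(t1, "A late reply to the first question.")  # dropped as stale
conv.on_reply_ready(t2, "Here are the lightweight options.")
conv.on_playback_finished()
print(conv.state.name, len(conv.history))
```

Tagging each reply with a turn id means a slow answer to an earlier question cannot start playing after the user has already asked something new.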

Intent Ambiguity

User queries are often vague.
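
The write-up does not say how vague queries are resolved. One common approach, consistent with the follow-up question in the example conversation above, is to track which details of the request are still missing and ask about the first one. The sketch below assumes that approach; `ShoppingIntent`, the slot names, and the question wording are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShoppingIntent:
    category: Optional[str] = None
    max_price: Optional[float] = None
    style: Optional[str] = None


# Checked in order; the first unfilled slot decides the follow-up question.
QUESTIONS = {
    "category": "What kind of product are you looking for?",
    "max_price": "Do you have a budget in mind?",
    "style": "Which style do you prefer?",
}


def clarifying_question(intent: ShoppingIntent) -> Optional[str]:
    """Return a follow-up question for the first missing slot, or None."""
    for slot, question in QUESTIONS.items():
        if getattr(intent, slot) is None:
            return question
    return None


# "Find me running shoes under $100." leaves the style unspecified.
intent = ShoppingIntent(category="running shoes", max_price=100.0)
print(clarifying_question(intent))
```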

API Rate Limits

External AI APIs have request limits, so we had to design fallback strategies and optimize requests.
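
A fallback strategy of this kind might look like the sketch below: retry a rate-limited provider with exponential backoff, then move on to the next provider. The `RateLimitError` type, the provider callables, and the retry settings are assumptions for illustration, not details from the project.

```python
import time


class RateLimitError(Exception):
    """Raised by a provider when its request limit is hit (hypothetical)."""


def call_with_fallback(providers, prompt, retries=2, base_delay=0.5, sleep=time.sleep):
    """Try each provider in order, backing off between rate-limited attempts."""
    for provider in providers:
        for attempt in range(retries + 1):
            try:
                return provider(prompt)
            except RateLimitError:
                if attempt < retries:
                    sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise RateLimitError("all providers are rate limited")


# Stand-in providers; real ones would call external AI APIs.
def primary(prompt):
    raise RateLimitError("429 Too Many Requests")


def backup(prompt):
    return f"backup answer to: {prompt}"


delays = []  # record the waits instead of actually sleeping
answer = call_with_fallback([primary, backup], "running shoes", sleep=delays.append)
print(answer, delays)
```

Passing `sleep` as a parameter keeps the example fast to run and makes the backoff schedule easy to check.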

What We Learned

Building VisionCart taught us several important lessons:
• Voice UX design is very different from text interfaces
• Latency optimization is critical in conversational systems
• AI assistants must handle ambiguity gracefully
• Streaming responses significantly improve user experience
• Combining retrieval systems with AI reasoning leads to better recommendations

Future Improvements

Planned enhancements include:
• personalized shopping recommendations
• multi-store product comparison
• visual product cards alongside voice responses
• multilingual voice support
• real-time price tracking

Ultimately, VisionCart aims to make shopping faster, more natural, and more accessible through conversational AI.

Updates

Harshini Raja started this project — Mar 16, 2026 08:03 PM EDT
