Inspiration

ShopNow Voice Agent was inspired by a common but costly problem in e-commerce customer support: support teams spend a large share of their time handling repetitive Tier-1 queries that matter to customers but are operationally routine for the business. In the project brief, ShopNow is a fictional mid-sized D2C brand that processes more than 40,000 orders every month across India and relies on 35 human agents available only from 9 AM to 9 PM. Despite that, it faces an average wait time of 8 minutes, a first-contact resolution rate of 52%, and a CSAT score of 3.1/5.

That context made the problem feel very real. Most of the incoming issues are not unusual or deeply complex. Customers are typically asking where their order is, whether they can return an item, why a refund has not arrived, what happened to a payment, or why a delivery is delayed. These are exactly the kinds of issues that consume a support team’s time even though many of them can be resolved using existing order records and support policies.

The inspiration behind ShopNow was to build something more capable than a simple FAQ bot. We wanted to create a voice-based customer support agent that could understand real customer requests, work in a multilingual Indian support context, use structured business data, pull policy information when needed, and gracefully hand off to a human when the conversation became emotional or complex. The goal was not just automation for its own sake, but better support availability, lower wait times, and a more scalable support workflow.

Problem Statement

ShopNow Voice Agent solves the problem of overloaded and inefficient e-commerce customer support for repetitive Tier-1 issues. In a high-volume support environment, human agents end up spending too much time on routine interactions such as:

  • order status checks
  • return and refund requests
  • payment problems
  • delivery complaints
  • product-related questions

This creates several operational challenges:

  • long support wait times
  • limited service outside business hours
  • inconsistent response quality
  • reduced agent bandwidth for complex issues
  • poor customer experience during peak load
  • weak support coverage for multilingual users

The challenge is especially important in the ShopNow brief because the support operation must serve a large and diverse customer base across India. A text-only or English-only system would not be enough. The agent needs to listen, respond naturally, stay grounded in order and policy data, and know when to transfer the case to a human agent.

Solution Overview

ShopNow Voice Agent addresses this problem by acting as a real-time AI-powered voice support representative named Priya. It handles the core first-line support workflow from greeting to resolution.

When a customer starts a call, the system:

  1. creates a session for the conversation
  2. receives the customer’s spoken query
  3. converts speech into text
  4. classifies the customer’s intent
  5. extracts useful entities such as order ID or issue type
  6. scores sentiment to understand frustration or satisfaction
  7. retrieves structured order details from the database
  8. retrieves policy knowledge from support documents using RAG
  9. generates a short, empathetic, context-aware response
  10. converts the response back into speech
  11. continues this loop across multiple turns until the issue is resolved or escalated
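The per-turn loop above can be sketched end to end. Every stage function here is a trivial stub standing in for the real STT, LLM, retrieval, and TTS services, and the wording and intent labels are illustrative, not the project's actual code:

```python
def speech_to_text(audio: bytes) -> str:
    # Stub: pretend the audio payload is already text; the real system calls an STT service.
    return audio.decode("utf-8")

def classify_intent(text: str) -> str:
    # Stub for the LLM intent classifier.
    return "order_status" if "order" in text.lower() else "product_query"

def extract_entities(text: str) -> dict:
    # Pull out an order ID like "ORD123", stripping trailing punctuation.
    ids = [w.strip("?.,!") for w in text.split() if w.strip("?.,!").startswith("ORD")]
    return {"order_id": ids[0] if ids else None}

def score_sentiment(text: str) -> str:
    return "negative" if "angry" in text.lower() else "neutral"

def generate_response(text: str, intent: str, entities: dict, session: dict) -> str:
    # Stub for the grounded LLM response (steps 7-9 collapsed).
    return f"Let me check {entities['order_id'] or 'that'} for you."

def text_to_speech(reply: str) -> bytes:
    return reply.encode("utf-8")  # stub for the TTS call

def handle_turn(session: dict, audio: bytes) -> bytes:
    text = speech_to_text(audio)                                # step 3
    intent = classify_intent(text)                              # step 4
    entities = extract_entities(text)                           # step 5
    session["sentiment_history"].append(score_sentiment(text))  # step 6
    reply = generate_response(text, intent, entities, session)  # steps 7-9
    session["history"].append((text, reply))                    # step 11 (memory)
    return text_to_speech(reply)                                # step 10

session = {"history": [], "sentiment_history": []}
audio_out = handle_turn(session, b"Where is my order ORD123?")
```

Each turn mutates the session, which is what lets step 11 carry context across the call.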

If the customer becomes angry, repeatedly negative, or explicitly asks for a human, the system escalates the interaction and generates a structured handoff brief for a support agent. In this way, the solution does not try to replace humans completely. Instead, it automates the repetitive support layer and reserves human time for situations where empathy, judgment, or manual intervention matter most.

Key Features

The main capabilities of ShopNow Voice Agent are:

  • Real-time voice conversation over WebSocket
  • Speech-to-text processing for customer audio
  • Text-to-speech generation for AI responses
  • Intent classification across five support categories:
    • order status
    • return and refund
    • payment issue
    • delivery complaint
    • product query
  • Entity extraction for operational details such as order IDs and complaint types
  • Sentiment analysis to detect positive, neutral, negative, and angry customer turns
  • Session memory for multi-turn support conversations
  • Database-backed responses using real order and payment context
  • Retrieval-augmented generation over internal policy documents
  • Human escalation logic based on user request and emotional signals
  • Structured escalation brief generation for human support teams
  • Call transcript logging and AI-generated call summaries
  • Dashboard and reporting views for support operations

Technologies Used

The project is built on a practical Python-based AI application stack.

  • FastAPI powers the backend API and real-time WebSocket services.
  • Streamlit powers the dashboard, escalation view, reporting page, and testing interface.
  • OpenAI powers intent detection, sentiment analysis, response generation, embeddings, and call summarization.
  • Sarvam AI powers speech-to-text and text-to-speech for the voice layer.
  • LangChain and FAISS power the retrieval pipeline over policy and FAQ documents.
  • SQLite and SQLAlchemy store and manage order records, call logs, escalation data, and reporting data.
  • Pandas and Plotly are used for analytics and visualization in the operations dashboard.
  • Loguru is used for logging and traceability.

The repository does not currently use an Airia-native SDK or Airia platform-specific runtime features. The solution is implemented as a custom AI voice agent stack built from FastAPI, Streamlit, OpenAI, Sarvam AI, FAISS, and SQLite.

Target Users

ShopNow Voice Agent is designed to benefit several user groups:

  • E-commerce customers, who want quick and convenient help for routine support issues
  • Customer support agents, who need relief from repetitive Tier-1 call volume
  • Support managers and operations teams, who need visibility into call outcomes, sentiment, escalations, and workload patterns
  • E-commerce and D2C businesses, especially those serving multilingual customers and high order volumes across India

Customers benefit from faster, more available support. Human agents benefit from receiving fewer repetitive queries and better context during escalations. Businesses benefit from improved scalability, efficiency, and consistency in customer support operations.

How we built it

We built ShopNow Voice Agent as a modular end-to-end support system rather than a single prompt or chatbot wrapper. The design combines voice interaction, language intelligence, structured business data, retrieval, escalation logic, and analytics.

The backend is implemented in FastAPI. It initializes the app, sets up routes, loads the retrieval index, and exposes APIs for starting calls, handling turns, logging sessions, generating reports, and supporting a real-time voice conversation over WebSocket. This gave us a flexible way to support both standard API flows and low-latency conversational sessions.

For the real-time voice path, we used a WebSocket route that accepts streamed audio from the client, buffers it, processes it as customer turns, and returns spoken responses. This allowed the interaction to feel like a live support call instead of a stop-and-start text workflow.
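The buffering part of that loop can be shown with a small asyncio sketch. Here an empty chunk stands in for silence detection as the end-of-turn signal, which is a simplifying assumption:

```python
import asyncio

async def turn_loop(chunks, process_turn):
    # Accumulate streamed audio chunks until an end-of-turn marker
    # (an empty chunk, standing in for silence detection), then
    # hand the buffered turn to the processing pipeline.
    buffer = bytearray()
    async for chunk in chunks:
        if chunk:
            buffer.extend(chunk)        # still mid-utterance: keep buffering
        elif buffer:
            await process_turn(bytes(buffer))  # turn complete: process it
            buffer.clear()

async def demo():
    async def chunks():
        for c in [b"where is ", b"my order", b"", b"thanks", b""]:
            yield c
    turns = []
    async def process_turn(audio: bytes):
        turns.append(audio)
    await turn_loop(chunks(), process_turn)
    return turns

turns = asyncio.run(demo())
```

The same structure applies inside a FastAPI WebSocket handler, with the chunks coming from `websocket.receive_bytes()`.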

We integrated Sarvam AI for the speech layer. Sarvam handles:

  • speech-to-text for customer utterances
  • text-to-speech for the agent’s spoken responses

This is especially relevant for an India-focused customer support use case because multilingual voice handling is a core part of the user experience.

For intelligence and orchestration, we used OpenAI across several parts of the workflow:

  • intent classification
  • entity extraction
  • sentiment analysis
  • response generation
  • transcript summarization
  • embedding generation

The intent layer is built around business-relevant support categories instead of open-ended labels. That allows the system to route requests into handlers that retrieve the right operational context. For example, if the user asks about a refund, the system can look up refund-related details from the order record instead of generating a generic answer.
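As a rough illustration of that routing, here is a keyword-based stand-in for the LLM classifier feeding intent-specific handlers. The keyword rules and handler responses are illustrative only; the real system uses an OpenAI call for classification:

```python
INTENT_KEYWORDS = {
    "order_status":       ["where is my order", "track", "status"],
    "return_refund":      ["return", "refund"],
    "payment_issue":      ["payment", "charged", "debited"],
    "delivery_complaint": ["late", "delayed", "not delivered"],
}

def classify(text: str) -> str:
    # Keyword stand-in for the LLM intent classifier.
    t = text.lower()
    for intent, keys in INTENT_KEYWORDS.items():
        if any(k in t for k in keys):
            return intent
    return "product_query"  # fallback category

HANDLERS = {
    # Each intent routes to a handler that fetches the right operational context.
    "return_refund": lambda order_id: f"Refund for {order_id} is being processed.",
    "order_status":  lambda order_id: f"{order_id} is out for delivery.",
}

intent = classify("Why has my refund not arrived?")
reply = HANDLERS[intent]("ORD1001")
```

The point of the structure is that a refund question reaches a refund handler with refund data, rather than a generic prompt.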

To reduce hallucination risk and improve consistency, we added a retrieval-augmented generation layer using LangChain and FAISS. We stored policy and FAQ documents in the repository and indexed them so the system can pull in relevant knowledge for questions about:

  • cancellation rules
  • shipping policies
  • return windows
  • refund process
  • payment issues
  • product authenticity and warranty
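
The retrieval step can be illustrated with a toy version in which real embeddings and FAISS are replaced by bag-of-words vectors and brute-force cosine similarity. The document texts are illustrative, not the project's actual policy files:

```python
import math
from collections import Counter

DOCS = [
    "Orders can be cancelled within 24 hours of placement.",
    "Returns are accepted within 7 days of delivery.",
    "Refunds are credited to the original payment method in 5-7 business days.",
]

def vec(text: str) -> Counter:
    # Bag-of-words stand-in for an embedding.
    return Counter(text.lower().strip(".").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str) -> str:
    # Brute-force nearest neighbour; FAISS does this at scale over real embeddings.
    return max(DOCS, key=lambda d: cosine(vec(query), vec(d)))

best = retrieve("when are refunds credited")
```

The retrieved snippet is injected into the response prompt so the model quotes policy rather than inventing it.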

We also created a database-backed order context layer using SQLite and SQLAlchemy. A seeded dataset provides realistic order, payment, seller, and refund status information. Dedicated handlers fetch structured details for:

  • order status
  • return and refund
  • payment issue
  • delivery complaint
  • product query

That data is passed into the response generation flow so the model can answer from actual support context instead of guesswork.

For state management, we built an in-memory session system that tracks:

  • call ID
  • phone number
  • language
  • current intent
  • conversation history
  • sentiment history
  • order context
  • escalation status

This supports multi-turn conversations and keeps the interaction coherent across turns.

On top of that, we added an escalation module that watches for emotional and operational signals. When the customer asks for a real person or becomes sufficiently frustrated, the system generates a human handoff brief with issue context, tone recommendation, and conversation history. That bridges the AI workflow with human support operations.
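The escalation check can be sketched as follows; the trigger phrases, the two-turn threshold, and the brief's fields are illustrative assumptions, not the project's exact rules:

```python
HUMAN_PHRASES = ("human", "real person", "representative")
NEGATIVE = {"negative", "angry"}

def should_escalate(text: str, sentiment_history: list[str]) -> bool:
    # Escalate on an explicit request for a person...
    if any(p in text.lower() for p in HUMAN_PHRASES):
        return True
    # ...or after two consecutive negative/angry turns (assumed threshold).
    return len(sentiment_history) >= 2 and all(
        s in NEGATIVE for s in sentiment_history[-2:]
    )

def handoff_brief(session: dict) -> dict:
    # Structured brief handed to the human agent picking up the call.
    upset = bool(session["sentiment_history"]) and session["sentiment_history"][-1] in NEGATIVE
    return {
        "issue": session.get("current_intent"),
        "tone_recommendation": "apologetic and calm" if upset else "friendly",
        "history": session["history"],
    }
```

Keeping the decision rule explicit, rather than leaving it to the model, makes the handoff behavior predictable and testable.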

Finally, we built an operator-facing frontend using Streamlit. It includes:

  • a live dashboard
  • an escalation view
  • a daily report page
  • a test interface for interacting with the agent

This helped turn the project into a usable support operations prototype rather than only a backend demonstration.

Challenges we ran into

One major challenge was building a system that feels like a support agent rather than a language model demo. Customer support requires more than fluent answers. It demands factual grounding, operational accuracy, short and clear responses, good turn-taking, and safe escalation behavior. That made the project more complex than simply calling an API and displaying text.

Another challenge was coordinating the real-time voice loop. Voice interfaces introduce different constraints than chat. Audio must be buffered correctly, the system must know when a user turn is complete, and the response has to return fast enough to feel conversational. Designing that live loop while keeping the rest of the reasoning pipeline intact was a meaningful challenge.
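A simplified form of that end-of-turn decision looks like this; real systems use voice activity detection on the audio stream, and the threshold and frame count here are arbitrary illustrative values:

```python
def end_of_turn(frame_energies: list[float],
                silence_threshold: float = 0.1,
                silent_frames_needed: int = 3) -> bool:
    # A turn is considered complete once the last N frames are all
    # below the energy threshold, i.e. the caller has gone quiet.
    if len(frame_energies) < silent_frames_needed:
        return False
    return all(e < silence_threshold for e in frame_energies[-silent_frames_needed:])
```

Tuning those two parameters is the latency trade-off in miniature: too few silent frames and the agent interrupts, too many and the pause before each reply feels sluggish.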

Multilingual handling was also difficult. In the intended use case, customers may switch between English, Hindi, and Hinglish. Even structured information such as order IDs may be spoken in inconsistent ways. Designing the intent classification and entity extraction so that the system could normalize and use those details correctly was a challenge.

We also had to solve the problem of grounding. Customer support answers cannot rely only on LLM fluency. They need facts. That pushed us to combine three different context sources:

  • current conversation history
  • structured order data from the database
  • relevant support policy documents via retrieval

Balancing these inputs into a clean response pipeline was an important engineering challenge.
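The assembly of those three sources into one grounded prompt can be sketched like this; the section labels and formatting are illustrative, not the project's actual prompt:

```python
def build_prompt(history: list[tuple[str, str]], order: dict,
                 policy: str, question: str) -> str:
    # Combine conversation history, structured order data, and retrieved
    # policy text into a single grounded prompt for response generation.
    turns = "\n".join(f"Customer: {q}\nPriya: {a}" for q, a in history)
    order_lines = "\n".join(f"{k}: {v}" for k, v in order.items())
    return (
        "You are Priya, a concise, empathetic support agent.\n"
        f"--- Conversation so far ---\n{turns}\n"
        f"--- Order record ---\n{order_lines}\n"
        f"--- Relevant policy ---\n{policy}\n"
        f"Customer: {question}\nPriya:"
    )

prompt = build_prompt(
    history=[("where is my order", "it shows as shipped")],
    order={"order_id": "ORD1001", "status": "shipped"},
    policy="Delivery takes 3-5 business days after dispatch.",
    question="when will it arrive?",
)
```

Keeping each source in a clearly delimited section makes it easier to audit what the model was actually given when a response goes wrong.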

Escalation behavior was another challenge. We did not want the agent to sound capable on easy questions but become harmful on sensitive ones. The system needed a reliable way to identify when the right move was not another AI reply, but a handoff to a human agent. That required us to think carefully about sentiment, frustration, explicit human requests, and how to summarize a live issue for the next support person.

Finally, one practical challenge was building both the customer-facing and operations-facing sides of the system. It is one thing to build a conversation flow. It is another to include reporting, logging, dashboards, and escalation lookup so the solution can also support the team running it.

Accomplishments that we're proud of

We are proud that ShopNow Voice Agent feels like an actual support workflow rather than a standalone chatbot. It includes live voice interaction, structured intent routing, factual business context, policy retrieval, sentiment-aware escalation, transcript logging, summarization, and analytics. That combination gives the project real operational shape.

We are especially proud of how the project grounds its responses. The agent does not rely on generic language generation alone. It pulls in:

  • real order data from the database
  • policy data from retrieval documents
  • customer conversation history from the current session

That makes the system far more useful for customer support than a generic conversational assistant.

Another accomplishment is the escalation design. The project does not treat automation as an all-or-nothing proposition. It accepts that some support interactions should move to a human, and it prepares a structured handoff brief to make that transition smoother and more informed. That makes the workflow more practical and customer-friendly.

We are also proud of the operations layer. The dashboard, recent call reporting, intent distribution, language breakdown, and escalation lookup features show that we built the system with support teams in mind, not just demo viewers.

From a product perspective, we are proud that the project aligns tightly with the challenge described in the brief. It directly addresses long wait times, repetitive support workload, inconsistent support quality, and limited service availability. That makes the solution feel grounded in a real business problem with clear measurable value.

What we learned

We learned that the most useful AI systems in customer support are not the ones that try to sound the smartest. They are the ones that stay grounded, stay concise, and know when to involve a human. Building ShopNow reinforced how important it is to design AI as part of a larger support workflow rather than as a standalone response engine.

One major lesson was the value of structured context. A language model becomes much more reliable when it is paired with actual business data and policy knowledge. In this project, the combination of order handlers and RAG was essential in making the responses operationally useful.

We also learned that voice changes the design discipline. Because the interaction is spoken, the system must respond with shorter, clearer, and more direct language than a text chatbot might use. That pushed us to think more carefully about pacing, tone, and turn length.

Another important lesson was that sentiment awareness is a real product feature, not just an extra ML component. In customer support, emotional state affects whether the conversation should continue, change tone, or escalate. Handling that well is part of building trust.

We learned that multilingual customer support is not solved by just saying a model supports multiple languages. Real support conversations contain mixed phrasing, inconsistent naming, and varying ways of expressing the same issue. That means the surrounding system design matters as much as the model itself.

We also learned that support products serve two audiences at the same time:

  • the customer who wants their issue resolved quickly
  • the support organization that needs visibility, logs, and control

That is why building the reporting and escalation layers felt just as important as building the voice conversation itself.

What's next for ShopNow

The next step for ShopNow is to move from a strong MVP toward a more production-ready support platform.

One major direction is deeper multilingual support. The system is already designed with an India-focused support context in mind, but future versions could improve language detection, expand language coverage, and strengthen normalization for mixed-language speech and spoken identifiers like order IDs.

Another important next step is stronger persistence and reliability. The current in-memory session approach works well for a prototype, but a production version would benefit from durable session storage, better fault handling, improved retry behavior, and more robust monitoring.

We also want to add richer platform integrations. The agent could become far more useful if it connects directly with:

  • CRM systems
  • ticketing and helpdesk platforms
  • order management systems
  • payment systems
  • messaging channels such as WhatsApp

That would allow the system to not only answer questions, but also trigger real support actions such as raising a case, initiating a return flow, or syncing a customer update to a support agent.

The escalation workflow can also be improved. Future versions could:

  • route customers to specialized support queues
  • assign escalations by language or issue type
  • include richer summaries and action recommendations
  • track escalation outcome quality

On the reporting side, we would like to add:

  • time-based support trends
  • unresolved issue clustering
  • SLA tracking
  • post-call satisfaction capture
  • more detailed operational performance metrics

The long-term vision for ShopNow is an AI-first support layer for e-commerce that is always available, multilingual by design, grounded in business data, and tightly integrated with the human support team. Rather than replacing human support, it would make human support more scalable, more informed, and more effective.
