Inspiration
I wanted to fix the friction in e-commerce support. Traditional IVR menus and slow live chats are frustrating, so I built an interface that actually listens. The goal was for XAYA to feel like a high-bandwidth conversation that just happens to be with an AI.
What it does
XAYA handles real-time voice requests for the AYA Store. It manages order tracking, shipping address updates, and return requests. It is more than a demo: it executes tool calls against a DynamoDB backend to update order states and create support tickets directly from voice commands.
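To make the tool-call flow concrete, here is a minimal sketch of how a tool request from the voice model could be routed to a handler. Everything in it is an assumption for illustration: the tool names, the argument shapes, and the in-memory dictionaries that stand in for the DynamoDB tables are hypothetical, not XAYA's actual schema.

```python
import uuid
from typing import Any, Callable, Dict

# In-memory stand-ins for the DynamoDB tables (hypothetical data and schema).
ORDERS: Dict[str, Dict[str, Any]] = {
    "A100": {"status": "SHIPPED", "address": "1 Old Road"},
}
TICKETS: Dict[str, Dict[str, Any]] = {}


def track_order(order_id: str) -> Dict[str, Any]:
    """Return the current status of an order."""
    order = ORDERS.get(order_id)
    if order is None:
        return {"ok": False, "error": "order not found"}
    return {"ok": True, "status": order["status"]}


def update_address(order_id: str, address: str) -> Dict[str, Any]:
    """Overwrite the shipping address of an existing order."""
    order = ORDERS.get(order_id)
    if order is None:
        return {"ok": False, "error": "order not found"}
    order["address"] = address
    return {"ok": True, "address": address}


def create_return_ticket(order_id: str, reason: str) -> Dict[str, Any]:
    """Create a support ticket for a return request."""
    if order_id not in ORDERS:
        return {"ok": False, "error": "order not found"}
    ticket_id = uuid.uuid4().hex[:8]
    TICKETS[ticket_id] = {"order_id": order_id, "reason": reason}
    return {"ok": True, "ticket_id": ticket_id}


# Registry mapping tool names (as the model would emit them) to handlers.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "track_order": track_order,
    "update_address": update_address,
    "create_return_ticket": create_return_ticket,
}


def dispatch(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Route one tool call to its handler and always return a result dict."""
    handler = TOOLS.get(tool_name)
    if handler is None:
        return {"ok": False, "error": f"unknown tool: {tool_name}"}
    try:
        return handler(**arguments)
    except TypeError as exc:
        # Missing or unexpected arguments from the model.
        return {"ok": False, "error": f"bad arguments: {exc}"}
```

Returning a result dictionary for every outcome, including unknown tools and malformed arguments, gives the voice model something it can speak back to the user instead of an exception.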
How I built it
The core is a dual-model architecture on Amazon Bedrock:
- Nova 2 Sonic handles the low-latency, bidirectional WebSocket stream for real-time speech-to-speech.
- Nova 2 Lite acts as a logic guardrail, analyzing conversation history to authorize state-changing operations (like cancellations) before execution.
The backend is FastAPI, and the frontend is React, using a custom Web Audio API pipeline to stream raw 16-bit PCM audio chunks and keep end-to-end latency low.
Challenges I ran into
Managing barge-in (stopping the AI's audio output and yielding the turn when a user interrupts) was the most difficult part of the WebSocket orchestration. I also had to implement a custom LPCM player and a script processor to handle raw byte streaming, as standard browser audio elements can't play live PCM buffers without significant lag.
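One common way to handle barge-in on the server side is to tag outgoing audio with a turn counter and flush the playback buffer when an interruption is detected. The sketch below illustrates that idea only; the `PlaybackBuffer` class and its method names are hypothetical and are not taken from XAYA's code.

```python
from collections import deque
from typing import Deque, Optional


class PlaybackBuffer:
    """Holds assistant audio chunks waiting to be sent to the client."""

    def __init__(self) -> None:
        self._chunks: Deque[bytes] = deque()
        self.turn = 0  # incremented every time the user interrupts

    def push(self, chunk: bytes, turn: int) -> bool:
        """Queue a chunk unless it belongs to a turn that was interrupted."""
        if turn != self.turn:
            return False  # stale audio from a cancelled response
        self._chunks.append(chunk)
        return True

    def pop(self) -> Optional[bytes]:
        """Return the next chunk to send, or None if nothing is queued."""
        return self._chunks.popleft() if self._chunks else None

    def interrupt(self) -> int:
        """Drop all queued audio and start a new turn; returns chunks dropped."""
        dropped = len(self._chunks)
        self._chunks.clear()
        self.turn += 1
        return dropped
```

The turn counter matters because audio for the interrupted response may still arrive from the model after the flush; comparing turn numbers lets the buffer reject those late chunks instead of playing them over the user's new request.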
Accomplishments that I'm proud of
The system achieves sub-second latency, making the voice interaction feel responsive. I’m also proud of the Reasoning Log, which exposes the AI's internal decision-making and guardrail logic to the user in real time, providing transparency for every automated action.
What I learned
I gained deep experience with Voice Activity Detection (VAD) and the intricacies of bidirectional streaming. I also learned that using specialized models for specific tasks, such as Sonic for speed and Lite for reasoning, is significantly more performant than relying on a single general-purpose LLM.
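For readers new to VAD, the simplest form is an energy threshold over short audio frames. The sketch below shows that basic technique on signed 16-bit PCM; it is not a description of how Nova 2 Sonic or XAYA detects speech, and the threshold value is an arbitrary assumption that would need tuning per microphone.

```python
import array
import math
import sys


def frame_rms(pcm: bytes) -> float:
    """RMS amplitude of a frame of signed 16-bit little-endian PCM.

    The frame length must be a multiple of 2 bytes.
    """
    samples = array.array("h")
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # the input is little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(pcm: bytes, threshold: float = 500.0) -> bool:
    """Classify a frame as speech when its RMS reaches the threshold."""
    return frame_rms(pcm) >= threshold
```

Production systems usually add smoothing across several frames so that a single loud click or a brief pause mid-sentence does not flip the decision.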
What's next for XAYA
I’m planning to move the architecture to a mobile-native environment for better microphone access and to implement proactive, event-driven notifications, where XAYA initiates a voice session when a shipment status changes in DynamoDB.
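As a rough sketch of the notification idea, a function subscribed to DynamoDB Streams could filter for status changes before triggering anything. The example assumes the stream is configured to include both old and new images, and the attribute names `order_id` and `status` are hypothetical. How a voice session would then be started is left out because that part is still a plan.

```python
from typing import Any, Dict, List, Optional


def shipment_change(record: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Extract a status change from one DynamoDB Streams record, if any."""
    if record.get("eventName") != "MODIFY":
        return None
    data = record.get("dynamodb", {})
    old_image = data.get("OldImage", {})
    new_image = data.get("NewImage", {})
    # Stream images use typed attribute values, e.g. {"S": "SHIPPED"}.
    old = old_image.get("status", {}).get("S")
    new = new_image.get("status", {}).get("S")
    order_id = new_image.get("order_id", {}).get("S")
    if order_id is None or new is None or old == new:
        return None
    return {"order_id": order_id, "old_status": old or "UNKNOWN", "new_status": new}


def handler(event: Dict[str, Any], context: Any = None) -> List[Dict[str, str]]:
    """Collect every shipment status change found in a batch of records."""
    changes = [shipment_change(r) for r in event.get("Records", [])]
    return [c for c in changes if c is not None]
```

Filtering on an actual change in `status` avoids notifying the customer when an unrelated field, such as the shipping address, is updated.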
Built With
- amazon-web-services
- bedrock
- docker
- fastapi
- javascript
- langchain
- nova
- python
- react
- vite