Inspiration

I wanted to fix the friction in e-commerce support. Traditional IVR menus and slow live chats are frustrating, so I built an interface that actually listens. The goal was for XAYA to feel like a high-bandwidth conversation that just happens to be with an AI.

What it does

XAYA handles real-time voice requests for the AYA Store. It manages order tracking, shipping address updates, and return requests. It’s not just a demo—it executes functional tool calls against a DynamoDB backend to update order states and create support tickets instantly via voice commands.
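A minimal sketch of how voice-triggered tool calls could be routed to handlers. This is not the project's actual code: the in-memory `ORDERS` and `TICKETS` stores stand in for the DynamoDB tables, and the tool names, field names, and `dispatch` helper are all hypothetical.

```python
from typing import Any, Callable, Dict, List

# In-memory stand-ins for the DynamoDB tables (hypothetical schema).
ORDERS: Dict[str, Dict[str, Any]] = {
    "A100": {"status": "SHIPPED", "address": "1 Old Road"},
}
TICKETS: List[Dict[str, Any]] = []


def track_order(order_id: str) -> Dict[str, Any]:
    """Read-only tool: report the current status of an order."""
    order = ORDERS.get(order_id)
    if order is None:
        return {"ok": False, "error": "order not found"}
    return {"ok": True, "status": order["status"]}


def update_address(order_id: str, new_address: str) -> Dict[str, Any]:
    """State-changing tool: overwrite the shipping address."""
    order = ORDERS.get(order_id)
    if order is None:
        return {"ok": False, "error": "order not found"}
    order["address"] = new_address
    return {"ok": True, "address": new_address}


def create_return_ticket(order_id: str, reason: str) -> Dict[str, Any]:
    """State-changing tool: open a support ticket for a return."""
    if order_id not in ORDERS:
        return {"ok": False, "error": "order not found"}
    ticket = {"ticket_id": f"T{len(TICKETS) + 1}", "order_id": order_id, "reason": reason}
    TICKETS.append(ticket)
    return {"ok": True, "ticket_id": ticket["ticket_id"]}


# Registry mapping the tool name the model emits to its handler.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "track_order": track_order,
    "update_address": update_address,
    "create_return_ticket": create_return_ticket,
}


def dispatch(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Look up a tool by name and call it with the model-supplied arguments."""
    handler = TOOLS.get(tool_name)
    if handler is None:
        return {"ok": False, "error": f"unknown tool: {tool_name}"}
    try:
        return handler(**arguments)
    except TypeError as exc:  # missing or unexpected arguments
        return {"ok": False, "error": f"bad arguments: {exc}"}
```

Returning a small result dictionary from every tool, including on failure, gives the voice model something concrete to say back to the customer instead of raising an exception mid-conversation.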

How I built it

The core is a dual-model architecture on Amazon Bedrock:

Nova 2 Sonic handles the low-latency, bidirectional WebSocket stream for real-time speech-to-speech.

Nova 2 Lite acts as a logic guardrail, analyzing conversation history to authorize state-changing operations (like cancellations) before execution.

The backend is FastAPI, and the frontend is React using a custom Web Audio API pipeline to stream raw 16-bit PCM audio chunks for near-zero latency.

Challenges I ran into

Managing barge-in (automatic fallback when a user interrupts the AI) was the most difficult part of the WebSocket orchestration. I also had to implement a custom LPCM player and a script processor to handle raw byte streaming, as standard browser audio elements can't handle live PCM buffers without significant lag.
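The raw 16-bit PCM handling mentioned above can be illustrated with a short sketch. In the project this conversion runs in the browser; the Python version below only shows the same arithmetic, and the function names, the little-endian byte order, and the chunk size are assumptions rather than details taken from the project.

```python
import struct
from typing import Iterator, List


def float_to_pcm16(samples: List[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian signed 16-bit PCM."""
    ints = []
    for sample in samples:
        clamped = max(-1.0, min(1.0, sample))  # avoid integer overflow on loud input
        ints.append(int(clamped * 32767))
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm16_to_float(data: bytes) -> List[float]:
    """Decode little-endian signed 16-bit PCM back to floats for playback."""
    count = len(data) // 2  # two bytes per sample; a trailing odd byte is ignored
    ints = struct.unpack(f"<{count}h", data[: count * 2])
    return [value / 32768.0 for value in ints]


def chunk_pcm(data: bytes, frames_per_chunk: int) -> Iterator[bytes]:
    """Split a mono PCM byte string into chunks sized for streaming."""
    step = frames_per_chunk * 2  # bytes per chunk
    for start in range(0, len(data), step):
        yield data[start:start + step]
```

Smaller chunks reduce the delay before the first audio reaches the server, at the cost of more messages per second, so the chunk size is a tuning parameter rather than a fixed value.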

Accomplishments that I'm proud of

The system achieves sub-second latency, making the voice interaction feel truly reactive. I’m also proud of the Reasoning Log: it exposes the AI's internal decision-making and guardrail logic to the user in real time, providing transparency for every automated action.

What I learned

I gained deep experience with Voice Activity Detection (VAD) and the intricacies of bidirectional streaming. I also learned that using specialized models for specific tasks—like Sonic for speed and Lite for reasoning—is significantly more performant than relying on a single general-purpose LLM.

What's next for XAYA

I’m planning to move the architecture to a mobile-native environment for better microphone access and implement proactive event-driven notifications, where XAYA can initiate a voice session if a shipment status changes in DynamoDB.
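A rough sketch of how the planned shipment-status trigger might detect changes, assuming DynamoDB Streams is enabled on the orders table with the `NEW_AND_OLD_IMAGES` view type. None of this exists in the project yet; the attribute names `order_id` and `status` and the function name are hypothetical.

```python
from typing import Any, Dict, List


def shipment_status_changes(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Extract orders whose status changed from a DynamoDB Streams event."""
    changes = []
    for record in event.get("Records", []):
        if record.get("eventName") != "MODIFY":
            continue  # inserts and deletes are not status changes
        images = record.get("dynamodb", {})
        old_image = images.get("OldImage", {})
        new_image = images.get("NewImage", {})
        # Stream images use typed attribute values, e.g. {"S": "SHIPPED"}.
        old_status = old_image.get("status", {}).get("S")
        new_status = new_image.get("status", {}).get("S")
        if new_status is not None and new_status != old_status:
            changes.append({
                "order_id": new_image.get("order_id", {}).get("S"),
                "old_status": old_status,
                "new_status": new_status,
            })
    return changes
```

Each returned entry could then be handed to whatever component starts the outbound voice session; that part is left out here because its design has not been decided.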
