## Inspiration
Drive-thrus are broken. Long lines, misheard orders, stressed cashiers, and customers who just want to say "Big Mac, no pickles" and move on. We wanted to see if a conversational AI agent could replicate and improve the drive-thru experience entirely through voice.
## What It Does
VoiceThru AI is a voice-first ordering system for McDonald's. Customers speak naturally to an AI agent, which understands their order, customizes items (add cheese, remove pickles, upsize the drink), handles back-and-forth conversation, and places the order — all without touching a screen.
- Speak naturally: "I'll have a Big Mac, no onions, with a large Coke"
- Real customizations: toppings, sauces, sizes all handled by voice
- Bilingual: agent auto-detects French and switches the entire UI to French in real time
- Live order dashboard: staff see incoming orders instantly with status tracking
- Flying food animation + confetti: because demo wow factor matters
## How We Built It
- Next.js 16 (App Router) for the full-stack framework
- ElevenLabs Realtime Agents SDK (`@elevenlabs/react`) over WebRTC for the voice agent
- 6 custom client-side tools registered in ElevenLabs: display products, show customizations, apply cart changes, get cart, place order, and switch language
- Firebase Firestore for real-time order sync between customer and kitchen dashboard
- Auth0 to protect the staff dashboard
- Tailwind CSS v4 + shadcn/ui for the UI
The agent is given the full menu and store context as dynamic variables at session start.
All cart mutations happen through client tools: the agent calls `get_cart` before any
update to ensure it has accurate `line_item_id`s, preventing stale-state bugs.
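
The get-then-update pattern can be sketched roughly as below. This is a minimal illustration, not our actual tool code: only `get_cart` and `line_item_id` come from the description above, while `createCartTools`, `apply_cart_changes`, the `CartChange` shape, and the `li_` id format are hypothetical names chosen for the example. The idea is that an update carrying an unknown id is rejected with a message telling the agent to re-read the cart.

```typescript
// Hypothetical cart model: only line_item_id and get_cart are named in the
// write-up; every other field and function name here is an assumption.
type LineItem = { line_item_id: string; product_id: string; quantity: number; removed: string[] };

type CartChange =
  | { op: "add"; product_id: string; quantity: number; removed?: string[] }
  | { op: "update"; line_item_id: string; quantity: number }
  | { op: "remove"; line_item_id: string };

type ToolResult = { ok: boolean; error?: string; cart: LineItem[] };

function createCartTools() {
  let cart: LineItem[] = [];
  let nextId = 1;

  // Returns a copy so callers cannot mutate the cart in place.
  const get_cart = (): LineItem[] => cart.map((i) => ({ ...i, removed: [...i.removed] }));

  const apply_cart_changes = (changes: CartChange[]): ToolResult => {
    // Reject the whole batch if any change targets an id that is not in the cart.
    for (const c of changes) {
      if (c.op === "add") continue;
      const id = c.line_item_id;
      if (!cart.some((i) => i.line_item_id === id)) {
        return { ok: false, error: `Unknown line_item_id "${id}"; call get_cart and retry.`, cart: get_cart() };
      }
    }
    // All ids are valid, so apply the changes in order.
    for (const c of changes) {
      if (c.op === "add") {
        cart.push({ line_item_id: `li_${nextId++}`, product_id: c.product_id, quantity: c.quantity, removed: c.removed ?? [] });
      } else if (c.op === "update") {
        const { line_item_id, quantity } = c;
        cart = cart.map((i) => (i.line_item_id === line_item_id ? { ...i, quantity } : i));
      } else {
        const id = c.line_item_id;
        cart = cart.filter((i) => i.line_item_id !== id);
      }
    }
    return { ok: true, cart: get_cart() };
  };

  return { get_cart, apply_cart_changes };
}
```

Returning the current cart alongside the error gives the agent fresh ids in the same tool response, so one retry is usually enough to recover from a stale reference.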
## Challenges
- WebRTC latency & tool call timing: Getting cart tool calls to fire in the right order (get → update) without race conditions took careful prompt engineering.
- Bilingual voice UX: Switching language mid-conversation while keeping the agent coherent required both prompt design and a custom `switch_language` client tool.
- Realistic customization logic: Burgers have toppings/sauces, drinks have sizes, sides have neither; modelling this cleanly in a static JS menu file took iteration.
- Demo polish under time pressure: Animations (flying food, confetti, audio-reactive mic rings) had to feel smooth while the core logic was still being built.
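
One way to model the customization rules from the list above is a tagged union per item kind. The sketch below is an assumption-laden illustration in TypeScript (our menu was a static JS file); the `kind` field, the `customizationOptions` helper, and all item ids and option values are hypothetical.

```typescript
// Hypothetical menu types: each kind carries only the options it supports.
type Burger = { kind: "burger"; id: string; name: string; toppings: string[]; sauces: string[] };
type Drink = { kind: "drink"; id: string; name: string; sizes: string[] };
type Side = { kind: "side"; id: string; name: string };
type MenuItem = Burger | Drink | Side;

// Illustrative entries only; not the real menu data.
const MENU: MenuItem[] = [
  { kind: "burger", id: "big_mac", name: "Big Mac", toppings: ["pickles", "onions", "cheese"], sauces: ["special sauce"] },
  { kind: "drink", id: "coke", name: "Coke", sizes: ["small", "medium", "large"] },
  { kind: "side", id: "apple_slices", name: "Apple Slices" },
];

// Returns the option groups a UI or agent tool could show for an item.
function customizationOptions(item: MenuItem): Record<string, string[]> {
  switch (item.kind) {
    case "burger":
      return { toppings: item.toppings, sauces: item.sauces };
    case "drink":
      return { sizes: item.sizes };
    case "side":
      return {}; // sides have nothing to customize
  }
}
```

With this shape, asking for sizes on a burger is a compile-time error rather than a runtime surprise, and adding a new item kind forces every `switch` over `kind` to be updated.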
## What We Learned
- ElevenLabs' client tools are powerful: you can drive complex UI state entirely from agent responses with very low latency.
- Prompt engineering for ordering agents is surprisingly nuanced: bulk order limits, confirmation flows, and graceful handling of ambiguous requests all needed explicit rules.
- Real-time Firestore + voice = a genuinely satisfying end-to-end demo loop.