## Inspiration

Drive-thrus are broken. Long lines, misheard orders, stressed cashiers, and customers who just want to say "Big Mac, no pickles" and move on. We wanted to see whether a conversational AI agent could replicate, and improve on, the drive-thru experience entirely through voice.

## What It Does

VoiceThru AI is a voice-first ordering system for McDonald's. Customers speak naturally to an AI agent, which understands their order, customizes items (add cheese, remove pickles, upsize the drink), handles back-and-forth conversation, and places the order — all without touching a screen.

  • Speak naturally: "I'll have a Big Mac, no onions, with a large Coke"
  • Real customizations: toppings, sauces, sizes all handled by voice
  • Bilingual: agent auto-detects French and switches the entire UI to French in real time
  • Live order dashboard: staff see incoming orders instantly with status tracking
  • Flying food animation + confetti: because demo wow factor matters
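Below is a minimal sketch of how a `switch_language` client tool could flip the UI between English and French. The string table, the `{ language }` parameter shape, and the `t` / `switchLanguage` helpers are hypothetical illustrations, not the project's actual code. In a React app the current language would more likely live in component state or context than in a module variable.

```typescript
// Hypothetical sketch: a switch_language tool handler plus a tiny i18n lookup.
type Lang = "en" | "fr";

// Hypothetical UI copy; the real app's strings are not shown in the writeup.
const UI_STRINGS: Record<Lang, Record<string, string>> = {
  en: { title: "Your order", placeOrder: "Place order", empty: "Your cart is empty" },
  fr: { title: "Votre commande", placeOrder: "Passer la commande", empty: "Votre panier est vide" },
};

// Module-level state stands in for React state/context in this sketch.
let currentLang: Lang = "en";

function isLang(value: string): value is Lang {
  return value === "en" || value === "fr";
}

// Look up a UI string in the current language, falling back to the key itself.
function t(key: string): string {
  return UI_STRINGS[currentLang][key] ?? key;
}

// Handler the agent would invoke as the switch_language client tool.
// The returned string is what the agent "hears" back as the tool result.
function switchLanguage(params: { language: string }): string {
  const lang = params.language.toLowerCase();
  if (!isLang(lang)) {
    return `Unsupported language: ${params.language}`;
  }
  currentLang = lang;
  return `UI switched to ${lang}`;
}
```

Returning a plain-text result (including for unsupported languages) gives the agent something to say back to the customer instead of failing silently.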

## How We Built It

  • Next.js 16 (App Router) for the full-stack framework
  • ElevenLabs Realtime Agents SDK (@elevenlabs/react) over WebRTC for the voice agent
  • 6 custom client-side tools registered in ElevenLabs: display products, show customizations, apply cart changes, get cart, place order, and switch language
  • Firebase Firestore for real-time order sync between customer and kitchen dashboard
  • Auth0 to protect the staff dashboard
  • Tailwind CSS v4 + shadcn/ui for the UI

The agent is given the full menu and store context as dynamic variables at session start. All cart mutations happen through client tools: the agent calls get_cart before any update so that it has accurate line_item_ids, which prevents stale-state bugs.
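The get-then-update pattern can be sketched as a cart that hands out `line_item_id`s and rejects any change that references an id it does not know. Apart from `line_item_id`, every name here (`Cart`, `getCart`, `applyChanges`, the `add`/`remove`/`quantity` fields, the `li_N` id format) is a hypothetical stand-in for the project's get_cart and apply-cart-changes tools.

```typescript
// Hypothetical cart model behind get_cart / apply-cart-changes style tools.
interface LineItem {
  line_item_id: string;
  product_id: string;
  quantity: number;
  removed: string[]; // e.g. ["pickles"]
  added: string[]; // e.g. ["cheese"]
}

interface CartChange {
  line_item_id: string;
  quantity?: number; // 0 deletes the line
  remove?: string[];
  add?: string[];
}

class Cart {
  private items = new Map<string, LineItem>();
  private nextId = 1;

  addItem(product_id: string, quantity = 1): string {
    const line_item_id = `li_${this.nextId++}`;
    this.items.set(line_item_id, { line_item_id, product_id, quantity, removed: [], added: [] });
    return line_item_id;
  }

  // get_cart: return a defensive copy so the agent reads current ids.
  getCart(): LineItem[] {
    return Array.from(this.items.values()).map((item) => ({
      ...item,
      removed: [...item.removed],
      added: [...item.added],
    }));
  }

  // Reject the whole batch if any id is unknown (e.g. stale), so the agent
  // is told to re-read the cart instead of editing the wrong line.
  applyChanges(changes: CartChange[]): { ok: boolean; error?: string } {
    const unknown = changes.filter((c) => !this.items.has(c.line_item_id));
    if (unknown.length > 0) {
      const ids = unknown.map((c) => c.line_item_id).join(", ");
      return { ok: false, error: `Unknown line_item_id(s): ${ids}. Call get_cart first.` };
    }
    for (const change of changes) {
      const item = this.items.get(change.line_item_id);
      if (!item) continue; // already deleted earlier in this batch
      if (change.quantity === 0) {
        this.items.delete(change.line_item_id);
        continue;
      }
      if (change.quantity !== undefined) item.quantity = change.quantity;
      for (const topping of change.remove ?? []) {
        item.added = item.added.filter((x) => x !== topping);
        if (!item.removed.includes(topping)) item.removed.push(topping);
      }
      for (const topping of change.add ?? []) {
        item.removed = item.removed.filter((x) => x !== topping);
        if (!item.added.includes(topping)) item.added.push(topping);
      }
    }
    return { ok: true };
  }
}
```

Validating every id before mutating anything means a stale request leaves the cart untouched rather than half-updated.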

## Challenges

  • WebRTC latency & tool call timing: Getting cart tool calls to fire in the right order (get → update) without race conditions took careful prompt engineering.
  • Bilingual voice UX: Switching language mid-conversation while keeping the agent coherent required both prompt design and a custom switch_language client tool.
  • Realistic customization logic: Burgers have toppings/sauces, drinks have sizes, and sides have neither; modelling this cleanly in a static JS menu file took iteration.
  • Demo polish under time pressure: Animations (flying food, confetti, audio-reactive mic rings) had to feel smooth while the core logic was still being built.
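One way to model "burgers have toppings/sauces, drinks have sizes, sides have neither" is a TypeScript discriminated union, so each customization is checked against what that kind of item actually supports. This is a sketch under assumptions: the `MenuItem` shape, the `Customization` fields, and `validateCustomization` are hypothetical, and the project's actual menu file is plain JS.

```typescript
// Hypothetical menu shape modelled as a discriminated union on `kind`.
type Size = "small" | "medium" | "large";

type MenuItem =
  | { kind: "burger"; id: string; name: string; toppings: string[]; sauces: string[] }
  | { kind: "drink"; id: string; name: string; sizes: Size[] }
  | { kind: "side"; id: string; name: string };

interface Customization {
  removeToppings?: string[];
  addSauces?: string[];
  size?: Size;
}

// Returns a list of problems; an empty list means the customization is valid.
function validateCustomization(item: MenuItem, c: Customization): string[] {
  const errors: string[] = [];
  const hasToppingOrSauce = (c.removeToppings?.length ?? 0) > 0 || (c.addSauces?.length ?? 0) > 0;
  switch (item.kind) {
    case "burger":
      for (const t of c.removeToppings ?? []) {
        if (!item.toppings.includes(t)) errors.push(`${item.name} has no ${t}`);
      }
      for (const s of c.addSauces ?? []) {
        if (!item.sauces.includes(s)) errors.push(`${s} is not available on ${item.name}`);
      }
      if (c.size) errors.push(`${item.name} does not come in sizes`);
      break;
    case "drink":
      if (c.size && !item.sizes.includes(c.size)) errors.push(`${item.name} is not available in ${c.size}`);
      if (hasToppingOrSauce) errors.push(`${item.name} has no toppings or sauces`);
      break;
    case "side":
      if (c.size || hasToppingOrSauce) errors.push(`${item.name} cannot be customized`);
      break;
  }
  return errors;
}
```

Inside each `case`, TypeScript narrows `item` to that variant, so accessing `item.sizes` on a burger would be a compile error rather than a runtime surprise.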

## What We Learned

  • ElevenLabs' client tools are powerful: you can drive complex UI state entirely from agent responses with very low latency.
  • Prompt engineering for ordering agents is surprisingly nuanced: bulk order limits, confirmation flows, and graceful handling of ambiguous requests all needed explicit rules.
  • Real-time Firestore + voice = a genuinely satisfying end-to-end demo loop.
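The writeup enforces bulk limits and confirmation through prompt rules. A client-side guard on the place-order tool is one complementary safety net. The sketch below assumes hypothetical limits (`MAX_QTY_PER_LINE`, `MAX_TOTAL_ITEMS`), a hypothetical `confirmed` flag, and a hypothetical `placeOrder` signature; none of these come from the project.

```typescript
// Hypothetical place-order guard: bulk limits plus an explicit confirmation step.
const MAX_QTY_PER_LINE = 10; // assumed limit, for illustration only
const MAX_TOTAL_ITEMS = 30; // assumed limit, for illustration only

interface OrderLine {
  name: string;
  quantity: number;
}

type PlaceOrderResult =
  | { status: "placed"; totalItems: number }
  | { status: "needs_confirmation"; summary: string }
  | { status: "rejected"; reason: string };

function placeOrder(lines: OrderLine[], confirmed: boolean): PlaceOrderResult {
  if (lines.length === 0) return { status: "rejected", reason: "Cart is empty" };

  const tooMany = lines.find((l) => l.quantity > MAX_QTY_PER_LINE);
  if (tooMany) {
    return { status: "rejected", reason: `Max ${MAX_QTY_PER_LINE} of ${tooMany.name} per order` };
  }

  const totalItems = lines.reduce((sum, l) => sum + l.quantity, 0);
  if (totalItems > MAX_TOTAL_ITEMS) {
    return { status: "rejected", reason: `Orders are limited to ${MAX_TOTAL_ITEMS} items` };
  }

  if (!confirmed) {
    // Give the agent a read-back to confirm with the customer before committing.
    const summary = lines.map((l) => `${l.quantity} x ${l.name}`).join(", ");
    return { status: "needs_confirmation", summary };
  }

  return { status: "placed", totalItems };
}
```

Enforcing limits in code as well as in the prompt means an agent that misreads a rule still cannot submit an out-of-policy order.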
