## Inspiration
Every restaurant loses revenue when no one picks up the phone during a dinner rush.
Staff are juggling tables, the kitchen is slammed, and callers either get put on hold or hang up.
We wanted to build something that just picks up — every call, every time, without adding headcount.
## What it does
VoiceTable is an AI phone agent for restaurants. Call the number and it instantly handles:
- 🛍️ Pickup orders — full menu, sizes, sauces, special requests
- 📅 Table reservations — books and cancels via Square in real time
- ❓ Menu questions — prices, ingredients, allergens, hours
- 🌍 Any language — auto-detects and mirrors the caller's language (English, Mandarin, Cantonese, Spanish, and more)
No app. No typing. Just a phone call.
## How we built it
| Layer | Technology |
|---|---|
| AI (STT + LLM + TTS) | Gemini 2.5 Flash Native Audio (Live API) |
| Phone | Twilio Media Streams |
| POS / Bookings | Square Bookings API + Orders API |
| Backend | Python, FastAPI |
| Cloud | Google Cloud Run |
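The glue between the phone and the model is the Twilio Media Streams WebSocket, which delivers JSON messages such as `start` and `media`. A minimal sketch of handling those messages is below. It assumes a bidirectional stream (`<Connect><Stream>`), where audio sent back to the caller uses the same base64 μ-law `media` shape; `StreamState`, `handle_twilio_message`, and `outbound_media_message` are hypothetical names, not VoiceTable's actual code.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StreamState:
    """Per-call state collected from Twilio Media Streams events (hypothetical)."""
    stream_sid: Optional[str] = None
    custom_parameters: dict = field(default_factory=dict)


def handle_twilio_message(raw: str, state: StreamState) -> Optional[bytes]:
    """Parse one Media Streams WebSocket message.

    Returns the raw mu-law bytes for "media" events and None for everything
    else ("connected", "start", "stop", ...).
    """
    msg = json.loads(raw)
    event = msg.get("event")
    if event == "start":
        start = msg["start"]
        state.stream_sid = start.get("streamSid")
        # Values passed via <Parameter> in the TwiML show up here.
        state.custom_parameters = start.get("customParameters", {})
    elif event == "media":
        # Payload is base64-encoded 8 kHz mu-law audio.
        return base64.b64decode(msg["media"]["payload"])
    return None


def outbound_media_message(stream_sid: str, ulaw_audio: bytes) -> str:
    """Build a "media" message that plays mu-law audio back to the caller."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(ulaw_audio).decode("ascii")},
    })
```

Audio coming back from the model would need to be converted to 8kHz μ-law before it is wrapped this way.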
Key engineering decisions:
- A single Gemini Live bidirectional audio stream replaces the entire STT → LLM → TTS pipeline
- Manual VAD built with `audioop.rms()`: Gemini's built-in VAD silently fails on 8kHz phone audio, so we implemented our own and send explicit `ActivityStart`/`ActivityEnd` signals
- Automatic session reconnection: when Gemini drops the connection mid-call, the agent recovers and continues without going silent
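The manual VAD decision can be sketched as an energy detector with hysteresis: a frame counts as speech when its RMS crosses a threshold, speech starts after a few loud frames in a row, and it ends after a longer run of quiet frames. This sketch computes the RMS by hand because `audioop` was deprecated in Python 3.11 and removed in 3.13; `rms_pcm16` returns the same quantity as `audioop.rms(frame, 2)`, as a float. It assumes frames are already decoded to 16-bit PCM. `EnergyVAD`, its threshold, and its frame counts are hypothetical tuning choices, and mapping the returned events onto Gemini's `ActivityStart`/`ActivityEnd` messages is left out.

```python
import math
import struct
from typing import Optional


def rms_pcm16(frame: bytes) -> float:
    """Root-mean-square of little-endian 16-bit PCM samples."""
    n = len(frame) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", frame[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


class EnergyVAD:
    """Hysteresis VAD that emits 'activity_start' / 'activity_end' events.

    threshold, start_frames and end_frames are hypothetical tuning values.
    """

    def __init__(self, threshold: float = 500.0, start_frames: int = 3, end_frames: int = 25):
        self.threshold = threshold
        self.start_frames = start_frames
        self.end_frames = end_frames
        self.speaking = False
        self._loud = 0   # consecutive frames above threshold
        self._quiet = 0  # consecutive frames below threshold

    def process(self, frame: bytes) -> Optional[str]:
        if rms_pcm16(frame) >= self.threshold:
            self._loud += 1
            self._quiet = 0
        else:
            self._quiet += 1
            self._loud = 0
        if not self.speaking and self._loud >= self.start_frames:
            self.speaking = True
            return "activity_start"
        if self.speaking and self._quiet >= self.end_frames:
            self.speaking = False
            return "activity_end"
        return None
```

If frames are 20 ms long, `end_frames=25` means the caller must be quiet for 500 ms before the turn ends; the short start window keeps line noise spikes from triggering a turn.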
## Challenges we ran into
**Phone audio quality**
Twilio sends 8kHz μ-law audio; Gemini expects 16-bit PCM at 16kHz. Resampling had to be stateful across every audio chunk, or the speech would be corrupted at chunk boundaries.
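The conversion has two steps: decode each G.711 μ-law byte to a 16-bit sample, then double the sample rate. A minimal sketch follows, assuming simple linear interpolation for the 8kHz to 16kHz step; the writeup doesn't say which resampler VoiceTable uses (`audioop.ratecv`, with its `state` argument, was the classic stdlib option before `audioop` was removed). `Upsampler8kTo16k` and `twilio_chunk_to_pcm16_16k` are hypothetical names. The upsampler keeps the last sample of each chunk, which is what makes it stateful: resetting it per chunk would put a discontinuity at every chunk boundary.

```python
import struct
from typing import List


def ulaw_byte_to_pcm16(u: int) -> int:
    """Decode one G.711 mu-law byte to a 16-bit linear sample."""
    u = ~u & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


# All 256 mu-law codes decoded once, so per-chunk decoding is a table lookup.
ULAW_TABLE = [ulaw_byte_to_pcm16(b) for b in range(256)]


class Upsampler8kTo16k:
    """Stateful 2x upsampler using linear interpolation."""

    def __init__(self) -> None:
        self._prev = 0  # last sample of the previous chunk

    def process(self, samples: List[int]) -> List[int]:
        out = []
        prev = self._prev
        for s in samples:
            out.append((prev + s) // 2)  # interpolated midpoint
            out.append(s)                # original sample
            prev = s
        self._prev = prev  # carry state into the next chunk
        return out


def twilio_chunk_to_pcm16_16k(ulaw: bytes, upsampler: Upsampler8kTo16k) -> bytes:
    """8 kHz mu-law bytes -> little-endian 16-bit PCM at 16 kHz."""
    pcm8k = [ULAW_TABLE[b] for b in ulaw]
    pcm16k = upsampler.process(pcm8k)
    return struct.pack(f"<{len(pcm16k)}h", *pcm16k)
```

One `Upsampler8kTo16k` would be created per call and reused for every chunk. Linear interpolation has no real anti-imaging filter; a production pipeline might prefer a proper polyphase resampler, at the cost of carrying more state between chunks.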
**Gemini VAD on phone audio**
The built-in voice activity detection silently failed, and the agent would never respond. We had to implement our own VAD from scratch.
**Session drops mid-call**
Gemini Live sessions can close with WebSocket code 1008 (policy violation) after long tool-call sequences. Without reconnection logic, the call goes permanently silent after an order is placed.
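Session recovery can be sketched as a retry loop around the part of the call that talks to the model. Everything here is an assumption rather than the Gemini client's API: `SessionClosed` is a hypothetical stand-in for whatever exception the Live client raises on a 1008 close, `run_session` is a hypothetical coroutine that opens a fresh session and bridges audio, and the shared `history` list is one way to give the new session enough context to pick up the conversation where it stopped.

```python
import asyncio
from typing import Awaitable, Callable, List


class SessionClosed(Exception):
    """Hypothetical: raised when the upstream Live session drops."""

    def __init__(self, code: int):
        super().__init__(f"session closed with code {code}")
        self.code = code


async def run_call_with_reconnect(
    run_session: Callable[[List[str]], Awaitable[None]],
    max_reconnects: int = 3,
    backoff_s: float = 0.2,
) -> int:
    """Keep one phone call alive across upstream session drops.

    run_session opens a new model session, replays `history` as context,
    appends new turns to it, and returns when the call ends normally or
    raises SessionClosed if the session drops. The same history list is
    passed to every attempt. Returns how many reconnects were needed.
    """
    history: List[str] = []
    reconnects = 0
    while True:
        try:
            await run_session(history)
            return reconnects
        except SessionClosed:
            if reconnects >= max_reconnects:
                raise  # give up; the caller can be transferred or apologized to
            reconnects += 1
            # Short linear backoff so the caller hears at most a brief gap.
            await asyncio.sleep(backoff_s * reconnects)
```

The Twilio side of the call stays connected throughout; only the model session is replaced, which is why the caller does not have to redial.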
**Twilio caller ID**
The `{{From}}` template syntax only works in TwiML Bins, not in webhook responses. The caller's phone number wasn't being passed at all until we traced it through the logs.
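One way to get the caller ID without TwiML Bin templating is to read the `From` field that Twilio posts to the voice webhook and forward it on the stream as a `<Parameter>`; Twilio then delivers it in the stream's `start` message under `customParameters`. A minimal sketch of generating that TwiML with the standard library, assuming a `<Connect><Stream>` setup; `build_stream_twiml` and the parameter name `callerNumber` are hypothetical.

```python
from xml.etree import ElementTree as ET


def build_stream_twiml(ws_url: str, caller_number: str) -> str:
    """Return TwiML that connects the call to a bidirectional Media Stream.

    caller_number is the webhook's `From` form field; it is forwarded as a
    <Parameter> so the WebSocket handler receives it in the "start" message.
    """
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    stream = ET.SubElement(connect, "Stream", url=ws_url)
    # "callerNumber" is a hypothetical parameter name chosen for this sketch.
    ET.SubElement(stream, "Parameter", name="callerNumber", value=caller_number)
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body
```

In a FastAPI handler, the number can come from `(await request.form())["From"]` (form parsing needs `python-multipart` installed), and the string can be returned with `Response(content=twiml, media_type="application/xml")`. Building the XML with `ElementTree` instead of an f-string also escapes characters such as `&` in the stream URL.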
## Accomplishments that we're proud of
- ✅ A real phone number you can call right now
- ✅ Real Square bookings and orders created through natural conversation
- ✅ Automatic multilingual support — zero configuration, just works
- ✅ Seamless session recovery — the call survives even if the Gemini connection drops
## What we learned
- Gemini Live Native Audio is powerful enough to replace an entire STT + LLM + TTS pipeline with a single streaming session
- Telephony audio (8kHz μ-law, packet loss) requires real engineering on top of any cloud AI API
- Production resilience matters as much as the AI — tool latency and session timeouts will silently break your agent in the real world
## What's next for VoiceTable
- 🏪 Multi-restaurant support — one platform, many restaurants
- 📲 Outbound SMS reminders before reservations
- ⏱️ Wait time estimates for pickup orders
- 💬 Upselling during order flow
- 📊 Analytics dashboard — call volume, popular items, missed calls
Try it out: call 669-201-5051
- Demo 1, English (booking a table): https://www.youtube.com/shorts/BQhC79C_ZXg
- Demo 2, Chinese (placing a pickup order): https://www.youtube.com/shorts/sUjl1b6btZQ