** Devpost Submission **
** Inspiration **
I built a voice AI receptionist for an Australian client back in March 2025. It was ElevenLabs-only, expensive to run, and ultimately failed commercially — the per-minute cost made it unviable for small businesses. But the calls I watched come through were magic. Callers genuinely couldn't tell it was AI. The problem wasn't the technology. The problem was that only high-budget businesses could afford it.
That stuck with me. Every plumber, every salon, every takeaway in the UK misses calls daily. They're paying for Google Ads and then not answering the phone. The leads just walk to the competitor. I knew the solution existed — I'd already built it — but it needed to work at a price point that a one-person business could justify.
So I rebuilt it from scratch. Different architecture, different stack, different pricing model. Two products instead of one. Make premium voice AI available to those who want it, and make a solid, affordable alternative for everyone else.
** What it does **
It's an AI voice receptionist SaaS platform with two distinct products:
Standard (£17/month) — Powered by Google Gemini. Answers every call, has natural conversations, takes messages, creates tickets, analyses sentiment, recognises returning callers by phone number, and gives the business owner a full dashboard. No calendar, no booking — just reliable, affordable call handling.
Prime (from £79/month) — Powered by ElevenLabs. Ultra-natural voice that callers genuinely cannot distinguish from a human. Everything in Standard, plus a built-in knowledge base, calendar integration, and appointment booking. Three tiers: Prime, Prime Pro, and Prime Enterprise based on volume.
The onboarding itself is AI-powered. Instead of filling in web forms, the business owner gets a phone call from the AI that interviews them about their business — greeting preferences, services, hours, escalation rules, restricted information. The AI captures everything and configures itself.
When a call comes in, the AI greets the caller (personalised by time of day), has a real conversation, and if the same number calls again, it recognises them by name and references their previous calls. After every call, the system runs post-call analysis: summary, sentiment, category, intent, and auto-creates a support ticket if needed.
*How we built it *
The entire platform is built as a production Flask application deployed on Koyeb with PostgreSQL.
For Standard tier: Twilio handles telephony (UK numbers, call routing, webhooks), and Google Gemini provides real-time conversational AI. Each call goes through a multi-turn gather-respond loop where Twilio captures speech, sends it to Gemini with full conversation history and business context, and Gemini's response is spoken back via Twilio's TTS.
For Prime tier: ElevenLabs Conversational AI handles the voice interaction directly through their built-in Twilio integration. Call data is synced back into our platform via the ElevenLabs API, so both tiers share the same dashboard, analytics, and ticket system.
Caller recognition works by matching the incoming phone number against previous call records. If a match is found, the caller's name and full history are injected into the AI prompt, so the receptionist greets them by name and has context on their previous interactions.
Post-call analysis uses Gemini to process the conversation transcript and extract: summary, sentiment, category, caller intent, resolution status, caller name, and whether a ticket should be created. This runs automatically when the call ends.
The admin panel supports multi-client management with per-number AI personality, custom greetings, call routing rules, and usage tracking with minute limits per tier.
** Challenges we ran into**
The biggest challenge was making conversation state survive across multiple server workers. With gunicorn running multiple processes, in-memory caching meant Worker A would handle turn 1 and Worker B would get turn 2 with zero context. The AI would re-greet the caller every single turn. The fix was moving all conversation state to PostgreSQL — every turn reads the full history from the database and writes the updated version back.
Goodbye detection was another headache. The AI needs to know when the caller is done so it can run post-call analysis. But Twilio's status callback wasn't firing reliably for manually-configured numbers. And our first attempt at keyword-based goodbye detection was too aggressive — "thanks" mid-conversation would kill the call. We ended up with a layered approach: strict goodbye detection (only triggers after turn 2, only if "bye" is actually in the text), silence detection (3 consecutive no-speech timeouts), a TwiML fallback redirect, and an admin cleanup route for anything that slips through.
Call recording via Twilio's REST API failed on inbound calls during the initial webhook — the call isn't in the right state yet. We switched to transcript-based analysis instead, which actually works better because the conversation log is richer than a recording transcript.
** Accomplishments that we're proud of**
The AI-powered voice onboarding. No other competitor does this. Every other service uses web forms. Ours calls the business owner and has a conversation. It feels like the future.
Caller recognition that actually works. Call from the same number twice, and the AI greets you by name and knows what you called about last time. It's a small thing but it makes an enormous difference to how professional the business looks to their customers.
The two-tier pricing model. We're not forcing small businesses to pay £79+ for features they don't need. A plumber who just wants their calls answered while they're on a job can pay £17. A dental practice that needs appointment booking pays £79. Both get a genuine AI receptionist, not a voicemail system.
The entire platform is production-ready. This isn't a prototype or a demo. It has multi-client admin, usage tracking, billing tiers, per-number configuration, ticket management, and a proper client dashboard with analytics.
What we learned
Premium voice AI is still too expensive for mass market. ElevenLabs sounds incredible but at 6-8p per minute, a busy business could burn through £200 in a month just on AI costs. Gemini's voice capabilities are good enough for 80% of use cases at a fraction of the cost. The two-tier model isn't a compromise — it's the right architecture for the market.
Database-first state management is non-negotiable when running multiple server workers. We learned this the hard way when conversations kept resetting mid-call.
Goodbye detection in voice AI is a surprisingly hard problem. People say "thanks" and "cheers" all the time without meaning the conversation is over.
What's next for AI Voice Receptionist SaaS
Scheduled callbacks — when the AI can't resolve something, it books a callback and the business owner gets a briefing before they call back.
Outbound follow-up calls — AI calls the customer back with updates on their enquiry.
Integration with booking platforms like Calendly, Google Calendar, and NHS booking systems.
A mobile app for business owners to see calls, listen to summaries, and manage settings on the go.
Expansion into healthcare — we've been accepted into the Propel HealthTech West Yorkshire accelerator programme and are in discussions with NHS Digital about using the platform for patient-facing call handling.
Log in or sign up for Devpost to join the conversation.