Inspiration
We asked a simple question: what if you never had to pick up the phone again?
Not because you're busy. Not because you're screening calls. But because something else — something that sounds exactly like you — already answered. Already checked your calendar. Already sent the email. Already said "let me pull that up" and actually pulled it up.
The uncomfortable truth is that most phone calls don't need you. They need your voice, your context, your files, your schedule. Present! gives them exactly that — without you ever being there.
This is what happens when you let AI stop pretending to be an assistant and start pretending to be you.
What it does
Present! is an AI agent that answers your phone calls in your cloned voice. It doesn't just take messages — it has full conversations, looks up files on your computer, reads documents, checks your calendar, creates events, and sends formal emails with attachments. All while you watch the live transcript from a dashboard on your phone and approve every action.
The caller has no idea they're not talking to you.
- Someone calls about a project → the agent finds the document, reads it, and gives them the rundown
- Someone asks to schedule something → it checks your calendar and creates the event
- Someone needs a file emailed → it asks for their address, finds the attachment, writes a professional email, and sends it
You just sit there. Watching. Occasionally pressing "Allow."
How we built it
Voice Pipeline: Twilio receives the call and streams audio over WebSocket. Deepgram transcribes speech in real time. Gemini 2.5 Flash decides what to say and which tools to invoke. ElevenLabs synthesizes the response in your cloned voice. Audio streams back to the caller, all in under 2 seconds.
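To make the first and last hops of that pipeline concrete, here is a minimal sketch of decoding and encoding audio frames on the Twilio WebSocket. It assumes the JSON frame shape used by Twilio Media Streams (an `event` field, with base64 audio under `media.payload`). The function names are hypothetical and this is not the project's actual handler.

```typescript
// Hypothetical helpers; assumes Twilio Media Streams style JSON frames.

// Decodes one inbound frame. Returns raw audio bytes for "media" events,
// or null for any other event (for example "start" or "stop").
function decodeInboundAudio(raw: string): Buffer | null {
  const msg = JSON.parse(raw) as {
    event?: string;
    media?: { payload?: string };
  };
  if (msg.event !== "media" || !msg.media?.payload) return null;
  return Buffer.from(msg.media.payload, "base64");
}

// Wraps synthesized audio bytes in the frame shape sent back to the caller.
function encodeOutboundAudio(streamSid: string, audio: Buffer): string {
  return JSON.stringify({
    event: "media",
    streamSid,
    media: { payload: audio.toString("base64") },
  });
}
```

In a setup like this, the decoded bytes would be forwarded to the transcription service, and the synthesized speech would go through `encodeOutboundAudio` before being written to the same socket.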
Tool Execution: A headed Playwright browser runs on your machine, executing real actions — searching files, reading .docx documents, querying Apple Calendar via AppleScript, creating .ics events, and sending emails through Outlook/Mail.app. The agent has actual access to your computer.
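As an illustration of the "creating .ics events" step, the sketch below builds a single-event iCalendar file as a string. It follows the basic iCalendar layout (CRLF line endings, UTC timestamps, escaped text). The `EventInput` type, the `buildIcs` name, and the `PRODID` value are assumptions made for this example, not the project's code.

```typescript
// Hypothetical event shape for this sketch.
interface EventInput {
  uid: string;
  title: string;
  start: Date;
  end: Date;
}

// Formats a Date as an iCalendar UTC timestamp such as 20250102T150000Z.
function toIcsUtc(d: Date): string {
  return d.toISOString().replace(/[-:]/g, "").replace(/\.\d{3}Z$/, "Z");
}

// Escapes backslashes, semicolons, commas, and newlines in text values.
function escapeIcsText(s: string): string {
  return s
    .replace(/\\/g, "\\\\")
    .replace(/([;,])/g, "\\$1")
    .replace(/\r?\n/g, "\\n");
}

// Builds a one-event calendar; `now` is injectable so output is reproducible.
function buildIcs(event: EventInput, now: Date = new Date()): string {
  const lines = [
    "BEGIN:VCALENDAR",
    "VERSION:2.0",
    "PRODID:-//Present Sketch//EN", // placeholder product identifier
    "BEGIN:VEVENT",
    `UID:${event.uid}`,
    `DTSTAMP:${toIcsUtc(now)}`,
    `DTSTART:${toIcsUtc(event.start)}`,
    `DTEND:${toIcsUtc(event.end)}`,
    `SUMMARY:${escapeIcsText(event.title)}`,
    "END:VEVENT",
    "END:VCALENDAR",
  ];
  return lines.join("\r\n") + "\r\n";
}
```

The resulting string can be written to a `.ics` file and opened by a calendar app. Long lines are not folded here, which is acceptable for short titles but worth handling in a fuller version.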
Dashboard: A Next.js app shows the live transcript as the call happens. Every tool call requires manual approval — Claude Code-style Allow/Deny buttons appear inline. Past sessions are stored in Supabase. You can clone your voice directly from the dashboard by recording a sample or uploading an MP3. Deployed on Vercel so you can monitor calls from your phone.
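One way to picture the approval step is as a promise that the agent loop awaits before running a tool, and that the dashboard's socket event resolves. The sketch below is an assumption about how such a gate could look. The `ApprovalGate` class, its method names, and the deny-on-timeout default are hypothetical, and the Socket.IO wiring is left out.

```typescript
type Decision = "allow" | "deny";

// Hypothetical gate: one pending promise per tool call, keyed by call ID.
class ApprovalGate {
  private waiting = new Map<string, (d: Decision) => void>();

  // Called by the agent loop before executing a tool. The returned promise
  // resolves when the user decides, or with "deny" once the timeout elapses.
  request(callId: string, timeoutMs = 30_000): Promise<Decision> {
    return new Promise((resolve) => {
      const timer = setTimeout(() => this.settle(callId, "deny"), timeoutMs);
      this.waiting.set(callId, (d) => {
        clearTimeout(timer);
        resolve(d);
      });
    });
  }

  // Called from the socket handler when the user clicks Allow or Deny.
  // Returns false if no request with that ID is waiting.
  settle(callId: string, decision: Decision): boolean {
    const finish = this.waiting.get(callId);
    if (!finish) return false;
    this.waiting.delete(callId);
    finish(decision);
    return true;
  }

  get pendingCount(): number {
    return this.waiting.size;
  }
}
```

The agent loop would then do something like `if ((await gate.request(id)) === "allow") { /* run the tool */ }`, while the server's socket handler calls `gate.settle(id, decision)`. Denying on timeout is a conservative choice for this sketch so that an unattended dashboard never approves an action by default.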
The Stack: Twilio, Deepgram, Gemini 2.5 Flash, ElevenLabs, Playwright, Express, Socket.IO, Next.js 16, React 19, Supabase, Vercel.
Challenges we ran into
- Voice cloning quality is unpredictable: short samples produce voices that sound like a different person entirely.
- Spoken email addresses sometimes come through as literal words ("lucas at gmail dot com") instead of being recognized as addresses.
- Google Meet repeatedly blocked our Playwright browser from joining meetings (automation detection, permission issues, renderer crashes), so we eventually scoped down to phone calls only.
- Getting the audio timing right so the greeting isn't cut off required careful delays.
- The tool approval flow needed to pause the entire Gemini streaming loop mid-execution while waiting for a socket event from the dashboard.
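The literal email transcription problem can be reduced with a small post-processing step. The sketch below is one possible approach, not what the project necessarily does: a hypothetical `normalizeSpokenEmail` helper that rewrites spoken separators and returns `null` when the result does not look like an address. It assumes the address has already been isolated from the rest of the sentence.

```typescript
// Hypothetical helper: turns a spoken address into its written form.
// Returns null when the result does not look like an email address.
function normalizeSpokenEmail(spoken: string): string | null {
  const candidate = spoken
    .trim()
    .toLowerCase()
    .replace(/\s+at\s+/g, "@")
    .replace(/\s+dot\s+/g, ".")
    .replace(/\s+underscore\s+/g, "_")
    .replace(/\s+(?:dash|hyphen)\s+/g, "-")
    .replace(/\s+/g, ""); // drop any remaining spaces between spelled parts

  // Deliberately loose shape check; not a full address validator.
  const looksLikeEmail = /^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/;
  return looksLikeEmail.test(candidate) ? candidate : null;
}

console.log(normalizeSpokenEmail("lucas at gmail dot com")); // prints "lucas@gmail.com"
```

A `null` result is a natural cue for the agent to ask the caller to spell the address again rather than send mail to a guess.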
Accomplishments that we're proud of
The moment you call the number and hear your own voice say "Hello?" — that's the moment it clicks. The caller genuinely cannot tell. We built a complete voice-to-action pipeline that doesn't just chat — it operates your computer, sends real emails, and manages your real calendar. The dashboard tool approval system gives you full control while the agent handles the conversation. And the whole thing runs on your laptop.
What we learned
Human interaction is more replaceable than we thought. Most phone calls follow predictable patterns, and an AI with access to your files and calendar can handle them better than you can — because it never forgets, never gets flustered, and never says "let me get back to you" when the answer is right there. The hardest part wasn't the AI. It was making it sound human enough that people stop noticing.
What's next for Present!
Real-time voice streaming (no TTS latency). Multi-call handling — answer several calls simultaneously as different "versions" of you. Persistent memory across calls so the agent remembers previous conversations. Automatic post-call summaries sent to your inbox. And eventually, the part we're not sure we should build: outbound calls. Imagine your AI calling people back for you, in your voice, continuing conversations you never started.
The phone rang. You didn't pick up. Nobody noticed.
Built With
- deepgram
- elevenlabs
- express.js
- gemini
- next.js
- playwright
- react
- socket.io
- supabase
- twilio
- typescript
- vercel