Yapper

Yapper — voice AI that manages your email, Slack, and WhatsApp so you never have to stare at a screen again.

Inspiration

Dashboards are dead. Every productivity tool gives you another screen to stare at, another inbox to check, another tab to keep open. The average professional gets 100+ messages a day across email, Slack, and WhatsApp. They don't need a prettier dashboard. They need to not look at a screen at all.

Voice is the next interface. Not voice-as-a-gimmick. Voice as the primary way you interact with your work. Talk to your inbox while driving. Approve a message while making coffee. Reply to a Slack thread without touching a keyboard.

That's Yapper. No dashboard-first thinking. Voice-first everything. We built a demo around an insurance broker persona to show what this looks like in a high-volume professional context.

What it does

Yapper silently ingests messages from Gmail (IMAP), Slack (Socket Mode), and WhatsApp — then uses AI to categorize each one by urgency. Urgent items trigger a WhatsApp alert on your phone.

But the real product is the voice. You talk to Sarah, the voice agent:

"What's urgent?" — Sarah summarizes what needs attention
"Draft a reply to Gillian approving the 12% rate" — she writes it and reads it back
"Send it" — the email goes out, signed and logged
"Any Slack messages?" — she finds them, drafts a reply, sends it to the right thread

No screens. No typing. No context switching between five apps. One voice. Every channel.

The dashboard exists — but it's the secondary interface. A monitoring tool, not the product. The product is the conversation.

How we built it

Everything runs through NATS pub/sub — and that's the core architectural bet.

Every communication channel is just an adapter that publishes to a NATS subject: messages.inbound.email, messages.inbound.slack, messages.inbound.whatsapp. The categorizer subscribes to messages.inbound.* — one wildcard catches everything. Adding a new channel (Teams, Discord, SMS, whatever) is just writing a new adapter that publishes to messages.inbound.<channel>. No orchestrator. No central router. No rewiring. Just publish and the system picks it up.
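A minimal sketch of the wildcard routing, runnable without a NATS server. This is an in-memory stand-in, not the NATS client API: `subjectMatches` and `TinyBus` are hypothetical names used only for illustration. It follows NATS subject semantics, where `*` matches exactly one dot-separated token and `>` matches one or more trailing tokens.

```typescript
// In-memory sketch of NATS-style subject matching (not the real NATS client).
// `*` matches exactly one token; `>` matches one or more trailing tokens.
function subjectMatches(pattern: string, subject: string): boolean {
  const p = pattern.split(".");
  const s = subject.split(".");
  for (let i = 0; i < p.length; i++) {
    if (p[i] === ">") return i === p.length - 1 && s.length > i;
    if (i >= s.length) return false;
    if (p[i] !== "*" && p[i] !== s[i]) return false;
  }
  return p.length === s.length;
}

type Handler = (subject: string, payload: string) => void;

// Hypothetical stand-in for the message bus.
class TinyBus {
  private subs: { pattern: string; handler: Handler }[] = [];

  subscribe(pattern: string, handler: Handler): void {
    this.subs.push({ pattern, handler });
  }

  publish(subject: string, payload: string): void {
    for (const { pattern, handler } of this.subs) {
      if (subjectMatches(pattern, subject)) handler(subject, payload);
    }
  }
}

// The categorizer subscribes once; adapters only need to publish.
const bus = new TinyBus();
const seen: string[] = [];
bus.subscribe("messages.inbound.*", (subject) => {
  seen.push(subject);
});

bus.publish("messages.inbound.email", "rate renewal question");
bus.publish("messages.inbound.sms", "a channel added later"); // no subscriber change needed
bus.publish("messages.approved", "not an inbound message");   // ignored by the categorizer
```

The `sms` adapter here is invented to show the point: the subscriber never changes when a channel is added.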

Same on the outbound side. The voice agent approves a message → publishes to messages.approved → the outbound agent picks it up and routes to the right channel. Adding a new outbound channel is just a new else if block.
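The outbound routing can be sketched as follows. The `ApprovedMessage` shape, the `channel` field, and the `sendEmail`/`sendSlack` stand-ins are assumptions for illustration; the real senders would call the email and Slack APIs instead of recording to a log.

```typescript
// Hypothetical shape of a message published to messages.approved.
interface ApprovedMessage {
  channel: string; // e.g. "email" or "slack"
  to: string;
  body: string;
}

// Stand-ins for the real senders; they only record what would be sent.
const sentLog: string[] = [];
function sendEmail(msg: ApprovedMessage): void {
  sentLog.push(`email:${msg.to}`);
}
function sendSlack(msg: ApprovedMessage): void {
  sentLog.push(`slack:${msg.to}`);
}

// Route one approved message to its channel. Supporting a new outbound
// channel means adding one more `else if` branch here.
function routeApproved(msg: ApprovedMessage): boolean {
  if (msg.channel === "email") {
    sendEmail(msg);
  } else if (msg.channel === "slack") {
    sendSlack(msg);
  } else {
    return false; // unknown channel: nothing sent, caller can log it
  }
  return true;
}

const delivered = routeApproved({
  channel: "email",
  to: "gillian@example.com",
  body: "Approved at 12%.",
});
const unknown = routeApproved({ channel: "teams", to: "someone", body: "hi" });
```

Returning `false` for an unknown channel, rather than throwing, is a design choice in this sketch so that one bad message does not take down the outbound agent.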

This is why we didn't need a heavyweight integration framework. NATS gives us the decoupling. Each adapter is a standalone process — crash one and everything else keeps running.

The rest of the stack:

BAML with Groq → Gemini fallback for structured triage (category, score, gist, draft reply)
ElevenLabs Conversational AI for real-time voice with barge-in — interrupt Sarah mid-sentence
MongoDB for persistence with unique indexes to prevent duplicate processing
Ed25519 signatures on every categorization and send for a cryptographic audit trail
Next.js + shadcn/ui for the frontend, with an Apple-inspired, Siri-style orb. The dashboard is secondary by design.
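To make the duplicate-prevention idea concrete, here is a small in-memory sketch. It is not MongoDB code: a `Set` stands in for a unique index, and the compound key (channel plus a channel-native message ID, called `externalId` here) is an assumption about how messages are identified.

```typescript
interface InboundMessage {
  channel: string;    // "email", "slack", "whatsapp"
  externalId: string; // assumed: the ID the source channel assigns to the message
  text: string;
}

// In-memory stand-in for a collection with a unique index on (channel, externalId).
class DedupStore {
  private keys = new Set<string>();

  // Returns true if the message is new and was recorded, false if it is a duplicate.
  insertIfNew(msg: InboundMessage): boolean {
    const key = JSON.stringify([msg.channel, msg.externalId]);
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }
}

const store = new DedupStore();
const first = store.insertIfNew({ channel: "email", externalId: "abc", text: "hi" });
const replay = store.insertIfNew({ channel: "email", externalId: "abc", text: "hi" });
const otherChannel = store.insertIfNew({ channel: "slack", externalId: "abc", text: "hi" });
```

With a real MongoDB unique index, the replayed insert would be rejected by the database with a duplicate-key error (code 11000), which the caller can treat as "already processed" and skip.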

The voice-to-send pipeline: speech → ElevenLabs transcription → LLM drafts reply → verbal confirmation → frontend detects the confirmation → /api/approve → NATS messages.approved → outbound agent sends email/Slack. All real-time. All through the message bus.

Challenges we ran into

Voice context goes stale. The ElevenLabs WebSocket session gets inbox data once, at session start, and we found no API to refresh it mid-conversation, so new messages were invisible to the agent. We worked around this by filtering replied messages out of the context and adding a refresh indicator.
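A sketch of the context filtering, assuming each stored message carries a `replied` flag along with the triage `score` and `gist`. The `InboxItem` shape, the `limit` parameter, and the "higher score is more urgent" ordering are illustrative assumptions.

```typescript
interface InboxItem {
  id: string;
  sender: string;
  gist: string;
  score: number;   // assumed: higher means more urgent
  replied: boolean;
}

// Build the text handed to the voice agent at session start:
// drop anything already replied to, then list the most urgent items first.
function buildVoiceContext(items: InboxItem[], limit = 10): string {
  return items
    .filter((item) => !item.replied)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((item) => `${item.sender}: ${item.gist}`)
    .join("\n");
}

const context = buildVoiceContext([
  { id: "1", sender: "Tom", gist: "asks for a call next week", score: 3, replied: false },
  { id: "2", sender: "Gillian", gist: "needs approval on the 12% rate", score: 9, replied: false },
  { id: "3", sender: "Ana", gist: "policy renewal", score: 8, replied: true },
]);
```

Because the context is built once per session, a new session (or the refresh indicator prompting one) is what picks up messages that arrived later.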

The agent lied about sending. Sarah would say "I've sent the response" but nothing was actually sent, because the frontend only matched one specific phrase. We broadened the detection and extracted the actual draft from the confirmation message itself.
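Broadened detection could look something like the sketch below. The phrase patterns are hypothetical examples, not the actual list, and `extractDraft` assumes the agent repeats the draft in double quotes inside its confirmation; real transcripts would need to be sampled to tune both.

```typescript
// Hypothetical confirmation phrasings the agent might use.
const SEND_PATTERNS: RegExp[] = [
  /\bI(?:['’]ve| have) sent\b/i,
  /\b(?:message|email|reply|response) (?:has been|was) sent\b/i,
  /\bsending (?:it|that|the (?:message|email|reply)) now\b/i,
];

// True if the agent's utterance claims that a message was sent.
function isSendConfirmation(agentUtterance: string): boolean {
  return SEND_PATTERNS.some((re) => re.test(agentUtterance));
}

// Pull the draft text out of the confirmation itself.
// Assumes the draft appears between double quotes (straight or curly).
function extractDraft(agentUtterance: string): string | null {
  const match = agentUtterance.match(/["“]([^"”]+)["”]/);
  return match?.[1]?.trim() ?? null;
}

const utterance = 'I\'ve sent the response: "Approved at 12%."';
const shouldSend = isSendConfirmation(utterance);
const draft = extractDraft(utterance);
```

Only when both a confirmation and a draft are found would the frontend call /api/approve. Phrase matching remains brittle by nature, so the pattern list is something to keep extending as new wordings show up.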

Wrong message, wrong thread. The send function matched recipients by name, but identical names matched multiple messages. Combined with missing channel IDs, replies went to wrong threads. Fixed with proper Slack channel ID routing and DM fallback.
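The routing fix can be sketched as a small resolver that prefers the stored channel ID and falls back to a direct message. The `SlackReplyTarget` and `SlackDestination` types are assumptions for illustration; `threadTs` mirrors the parent-message timestamp that Slack uses to address threaded replies.

```typescript
interface SlackReplyTarget {
  channelId?: string; // captured at ingestion, e.g. "C0123ABCD"; may be missing
  threadTs?: string;  // parent message timestamp, for threaded replies
  userId: string;     // the sender's Slack user ID, never their display name
}

type SlackDestination =
  | { kind: "channel"; channel: string; threadTs: string | null }
  | { kind: "dm"; user: string };

// Prefer the exact channel (and thread) the message came from;
// if no channel ID was stored, fall back to a DM with the sender.
function resolveSlackDestination(target: SlackReplyTarget): SlackDestination {
  if (target.channelId) {
    return {
      kind: "channel",
      channel: target.channelId,
      threadTs: target.threadTs ?? null,
    };
  }
  return { kind: "dm", user: target.userId };
}

const threaded = resolveSlackDestination({
  channelId: "C0123ABCD",
  threadTs: "1700000000.000100",
  userId: "U0456EFGH",
});
const fallback = resolveSlackDestination({ userId: "U0456EFGH" });
```

Keying on IDs rather than names is the point: two people named the same can never resolve to the same destination.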

Making voice reliable is 10x harder than making it work once. The gap between a demo that works and a demo that works every time is where 80% of our effort went.

Accomplishments

Voice-to-send works end-to-end: speak → draft → confirm → email/Slack reply sent
Barge-in interruption — talk over Sarah and she stops immediately
NATS-first architecture — adding a new channel is just a new adapter, zero rewiring
Messages marked as replied don't resurface — context stays fresh
Security hardened: webhook HMAC verification, bearer auth, XSS prevention, rate limiting

What we learned

Voice is ready to be the primary interface — not a novelty feature bolted onto a dashboard. When the voice works well, you genuinely don't want to go back to typing.
NATS is the right abstraction for multi-channel systems. Wildcard subscriptions mean zero coupling between adapters. We added Slack support in under an hour because the categorizer didn't need to change — it was already listening to messages.inbound.*.
BAML makes structured LLM output reliable across fallback chains.
ElevenLabs Conversational AI is powerful but opinionated: architect around its constraints.