## Inspiration

Customer support is broken. Live chat feels robotic, knowledge bases go unread, and hiring 24/7 agents is expensive. We asked: what if visitors could just talk to your website?

Voice is the most natural interface humans have — yet no embeddable support tool uses it as the primary channel. ELV was born to change that.

## What it does

ELV is a voice-first embeddable support widget — think Intercom, but powered by conversation instead of text bubbles.

  • A business adds a single <script> tag to their site
  • Visitors click the widget and speak their question out loud
  • An AI voice agent answers in real time, using the website's own content as its knowledge base (RAG)
  • Business owners manage everything — crawled pages, agent behavior, analytics — through a dashboard

## How we built it

ELV is a full-stack monorepo built with Turborepo + pnpm:

| Layer | Stack |
|-------|-------|
| Widget | Preact + TypeScript (lightweight loader.js + iframe SPA) |
| Voice Agent | Python + LiveKit Agents SDK (real-time WebRTC voice) |
| STT / TTS | Deepgram Nova-3 (speech-to-text) + Cartesia Sonic-3 (text-to-speech) |
| LLM + RAG | GPT-4o-mini + OpenAI embeddings + pgvector for retrieval |
| API | FastAPI + Alembic migrations + PostgreSQL 16 |
| Ingestion | Playwright crawler + Redis Queue workers for embedding generation |
| Dashboard | Next.js 14 (App Router) + Clerk auth + Tailwind + shadcn/ui |

The architecture separates concerns cleanly: the widget handles UI, the API handles auth and data, and the agent service handles real-time voice + RAG independently.

## Challenges we faced

  • Real-time voice latency — Keeping the round-trip (speech → STT → LLM → TTS → audio) under 1.5s required careful pipeline optimization and streaming at every stage.
  • RAG quality — Chunking web pages for retrieval is deceptively hard. Too large and context gets diluted; too small and answers lack coherence. We iterated heavily on chunk sizing and overlap.
  • Widget embedding — Making a widget that works on any website without style conflicts meant strict iframe isolation and a postMessage bridge for communication.
  • Crawler reliability — Real-world websites are messy. SPAs, auth walls, infinite scrolls — the ingestion pipeline needed robust error handling and retry logic.
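To make the chunk sizing and overlap trade-off concrete, here is a minimal sketch of overlapping word-window chunking. It is not ELV's actual ingestion code: the function name `chunk_text`, the word-based splitting, and the default sizes are all assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word windows; consecutive windows share `overlap` words.

    Hypothetical helper: the sizes and word-level splitting are illustrative
    assumptions, not values taken from the project.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

A larger `overlap` repeats more context across neighbouring chunks, which helps answers stay coherent at chunk boundaries but increases the number of embeddings to store and search.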

## What we learned

  • Voice UX is fundamentally different from chat UX — silence feels broken, so we added filler audio and visual feedback to keep the experience feeling alive.
  • pgvector with proper indexing handles RAG retrieval surprisingly well at our scale.
  • Preact's 3KB footprint was the right call for an embeddable widget — every kilobyte matters when you're loading on someone else's site.
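As a rough illustration of what the retrieval step computes, the sketch below ranks stored chunks by cosine distance to a query embedding in plain Python. pgvector exposes cosine distance through its `<=>` operator; which metric and index ELV actually uses is not stated here, so treat the metric, the function names `cosine_distance` and `top_k`, and the toy two-dimensional vectors as assumptions.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat a zero vector as unrelated to everything
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: list[float], rows: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the text of the k chunks closest to the query embedding.

    `rows` is a hypothetical in-memory stand-in for a table of
    (chunk_text, embedding) pairs.
    """
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return [text for text, _ in ranked[:k]]


# Toy usage with made-up 2-D embeddings (real embeddings have far more dimensions).
rows = [
    ("Pricing page", [1.0, 0.0]),
    ("Refund policy", [0.0, 1.0]),
    ("Plan comparison", [0.9, 0.1]),
]
closest = top_k([1.0, 0.0], rows, k=2)
```

This brute-force scan is linear in the number of chunks; a vector index in the database serves the same query without comparing against every row.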

## What's next for ELV

  • Multi-language support (agent speaks the visitor's language)
  • Conversation analytics and intent clustering in the dashboard
  • Custom voice cloning so businesses can have a branded voice
  • Shopify / WordPress one-click install plugins
