## Inspiration
Customer support is broken. Live chat feels robotic, knowledge bases go unread, and hiring 24/7 agents is expensive. We asked: what if visitors could just talk to your website?
Voice is the most natural interface humans have, yet we found no embeddable support tool that uses it as the primary channel. ELV was born to change that.
## What it does
ELV is a voice-first embeddable support widget — think Intercom, but powered by conversation instead of text bubbles.
- A business adds a single `<script>` tag to their site
- Visitors click the widget and speak their question out loud
- An AI voice agent answers in real time, using the website's own content as its knowledge base via retrieval-augmented generation (RAG)
- Business owners manage everything — crawled pages, agent behavior, analytics — through a dashboard
## How we built it
ELV is a full-stack monorepo built with Turborepo + pnpm:
| Layer | Stack |
|-------|-------|
| Widget | Preact + TypeScript — lightweight loader.js + iframe SPA |
| Voice Agent | Python + LiveKit Agents SDK — real-time WebRTC voice |
| STT / TTS | Deepgram Nova-3 (speech-to-text) + Cartesia Sonic-3 (text-to-speech) |
| LLM + RAG | GPT-4o-mini + OpenAI embeddings + pgvector for retrieval |
| API | FastAPI + Alembic migrations + PostgreSQL 16 |
| Ingestion | Playwright crawler + Redis Queue workers for embedding generation |
| Dashboard | Next.js 14 (App Router) + Clerk auth + Tailwind + shadcn/ui |
The architecture separates concerns cleanly: the widget handles UI, the API handles auth and data, and the agent service handles real-time voice + RAG independently.
## Challenges we faced
- Real-time voice latency — Keeping the round-trip (speech → STT → LLM → TTS → audio) under 1.5s required careful pipeline optimization and streaming at every stage.
- RAG quality — Chunking web pages for retrieval is deceptively hard. Too large and context gets diluted; too small and answers lack coherence. We iterated heavily on chunk sizing and overlap.
- Widget embedding — Making a widget that works on any website without style conflicts meant strict iframe isolation and a postMessage bridge for communication.
- Crawler reliability — Real-world websites are messy. SPAs, auth walls, infinite scrolls — the ingestion pipeline needed robust error handling and retry logic.
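To make the chunk sizing and overlap trade-off concrete, here is a minimal sketch of overlapping chunking. It is not ELV's actual ingestion code: the function name `chunk_text`, the word-based splitting, and the default sizes are all assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words.

    Hypothetical helper: the sizes are illustrative defaults, not tuned values.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each iteration
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text,
        # so we do not emit a trailing chunk made only of overlap words.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    page = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_text(page, chunk_size=4, overlap=1):
        print(chunk)
```

A larger `overlap` keeps sentences that straddle a boundary retrievable from either side, at the cost of storing and embedding more redundant text. A production version would more likely split on tokens or on HTML structure than on whitespace.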
## What we learned
- Voice UX is fundamentally different from chat UX — silence feels broken, so we added filler audio and visual feedback to keep the experience feeling alive.
- pgvector with proper indexing handles RAG retrieval surprisingly well at our scale.
- Preact's 3KB footprint was the right call for an embeddable widget — every kilobyte matters when you're loading on someone else's site.
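As an illustration of the retrieval step behind the pgvector point above, the sketch below ranks stored chunks by cosine similarity to a query embedding in plain Python. This is only a conceptual stand-in: in pgvector the same ordering is done in SQL with the `<=>` cosine distance operator, and the names `cosine_similarity` and `top_k` plus the toy two-dimensional vectors are assumptions, not ELV's code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: list[float],
    chunks: list[tuple[str, list[float]]],
    k: int = 3,
) -> list[tuple[float, str]]:
    """Return the k (score, text) pairs most similar to the query embedding."""
    scored = [(cosine_similarity(query, emb), text) for text, emb in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings; real ones would come from an embedding model.
    store = [
        ("pricing page", [1.0, 0.0]),
        ("refund policy", [0.0, 1.0]),
        ("shipping info", [1.0, 1.0]),
    ]
    for score, text in top_k([1.0, 0.0], store, k=2):
        print(f"{score:.3f}  {text}")
```

This brute-force scan is linear in the number of chunks. An index inside the database is what keeps the equivalent query fast as the number of crawled pages grows.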
## What's next for ELV
- Multi-language support (agent speaks the visitor's language)
- Conversation analytics and intent clustering in the dashboard
- Custom voice cloning so businesses can have a branded voice
- Shopify / WordPress one-click install plugins
## Built With
- cartesia
- clerk
- cloudflare
- css
- deepgram
- fastapi
- livekit
- next.js
- openai
- pages
- pgvector
- playwright
- postgresql
- preact
- python
- redis
- shadcn/ui
- tailwind
- turborepo
- typescript
- vercel