Inspiration
We're starting out in our part-time jobs, learning sales techniques and building product knowledge. One of the hardest skills to pick up isn't memorizing a script — it's handling real customers professionally when the situation, tone, and problem all change at once.
We built Voice Training Agent so we could practice that safely: talk to different types of customers, work through realistic problems, and get feedback before we're on a live call. The goal is to grow communication and problem-solving skills the same way you'd drill any other skill — with repetition, variety, and honest review.
What it does
The app gives employees a full practice loop in four parts:
Personas — Users pick or create a customer persona that sets the tone for the voice agent. Personas can be AI-generated (researched from company docs or the web, then reviewed by a critic agent) or created manually. Each persona includes scenario context, emotional patterns, and a win condition so practice feels specific, not generic.
Practice — A live voice session powered by Gemini Live. The AI plays the customer; the employee practices how they'd handle the conversation in real time. Transcripts are saved automatically when the call ends.
Coaching — After each session, a coach agent generates a report on how well you did: what you communicated well, what you missed, metrics tied to the persona rubric, what the customer wanted to hear, and how they were likely feeling throughout the call. The feedback is designed from both the employee and customer perspective — not just "you said X" but "here's how that landed."
Knowledge — An agentic RAG chatbot answers questions about a particular company while employees practice. If someone isn't sure about a return policy, membership tier, or escalation path, they can ask mid-session instead of guessing. Answers are grounded in ingested company documents with cited sources.
Together, this is: learn the company → pick a customer → practice the call → understand what to improve.
How we built it
Architecture
Browser
├─ HTTPS /api/* → Cloud Run: voice-training-api (FastAPI + React SPA)
│ ├─ chat_agent → Mongo hybrid RAG
│ ├─ persona_generator → search / web_search / critic
│ ├─ coach_agent → post-call analysis
│ └─ Mongo → chunks, personas, transcripts, coach_reports
└─ WSS voice → Cloud Run: gemini-proxy → Vertex Gemini Live
The React frontend (apps/voice-training/) and FastAPI backend ship as one Cloud Run service. The SPA builds to web/dist and is served on the same origin as /api/*, so there's no CORS complexity. Voice is separate: browsers can't authenticate to Vertex directly, so a lightweight gemini-proxy WebSocket service adds GCP credentials and bridges audio to Gemini Live.
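Serving the SPA and the API from one origin mostly comes down to a routing rule. Requests to /api/* go to FastAPI, real build artifacts are served from web/dist, and every other path falls back to index.html so client-side routes still work after a page reload. Below is a minimal, framework-free sketch of that rule. The helper name resolve and the precomputed set of built file paths are hypothetical; only web/dist and /api/* come from the writeup.

```python
from pathlib import PurePosixPath

# From the writeup: the SPA build lands in web/dist and shares an origin with /api/*.
# The routing function itself is illustrative, not the project's code.
DIST_DIR = PurePosixPath("web/dist")


def resolve(path: str, built_files: set[str]) -> tuple[str, str]:
    """Decide how one same-origin request should be served."""
    # API calls go to the FastAPI routers.
    if path == "/api" or path.startswith("/api/"):
        return ("api", path)
    relative = path.lstrip("/")
    # Real build artifacts (JS, CSS, images) are served as-is.
    if relative in built_files:
        return ("static", str(DIST_DIR / relative))
    # Anything else is a client-side route: hand back index.html so the
    # React router can render it.
    return ("static", str(DIST_DIR / "index.html"))
```

In FastAPI, one common way to get this behavior is to register the /api routers first and add the static and fallback handling after them, so API routes always take precedence.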
Tech stack
| Layer | Choices |
|---|---|
| Agents | Google ADK on Vertex AI |
| LLMs | gemini-2.5-flash-lite (chat, search, persona, coach); gemini-live-2.5-flash-native-audio (voice) |
| Embeddings | gemini-embedding-001 (768-dim) |
| Backend | FastAPI + Uvicorn, Python 3.12 |
| Frontend | React 19 + Vite 7 + Framer Motion |
| Database | MongoDB Atlas — vector search, full-text search, document store |
| Deploy | Google Cloud Run + Cloud Build |
Multi-agent design
We spent significant time on agent architecture — aiming for agents that are simple, focused, and efficient rather than one giant prompt doing everything. Six ADK agents power the backend:
| Agent | Role |
|---|---|
| Chat | Knowledge Q&A. Always searches Mongo before answering; returns prose + cited sources. |
| Search | Agentic RAG for persona research — decomposes goals, hybrid retrieval, structured brief. |
| Web search | Fallback when no company docs exist; Gemini Google Search grounding. |
| Persona generator | Orchestrator: research → draft → critic loop → validate → save to Mongo. |
| Persona critic | Quality gate on consistency, grounding, realism, rubric, and ethics. |
| Coach | Post-call analysis: gaps, frustration timeline, rubric self-check, improvements. |
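The Persona generator row describes a bounded "revise until approved" loop. The sketch below shows that control flow. It assumes hypothetical callables (research, draft, critique, validate, save) that would wrap the Search or Web search agent, the drafting step, the Persona critic, validation, and the Mongo write. The names, the round limit, and the give-up behavior are our assumptions, not the project's code.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Critique:
    approved: bool
    notes: list[str] = field(default_factory=list)


def generate_persona(
    goal: str,
    research: Callable[[str], str],
    draft: Callable[[str, list[str]], dict],
    critique: Callable[[dict], Critique],
    validate: Callable[[dict], bool],
    save: Callable[[dict], str],
    max_rounds: int = 3,
) -> str:
    """Hypothetical orchestration: research -> draft -> critic loop -> validate -> save."""
    brief = research(goal)            # e.g. Search agent, or Web search as fallback
    persona = draft(brief, [])        # first draft, no critic feedback yet
    for round_no in range(1, max_rounds + 1):
        review = critique(persona)    # the critic acts as the quality gate
        if review.approved:
            break
        if round_no == max_rounds:
            raise RuntimeError("critic did not approve the persona within max_rounds")
        # Revise using the critic's notes and go around again.
        persona = draft(brief, review.notes)
    if not validate(persona):         # schema/rubric validation before persisting
        raise ValueError("persona failed validation")
    return save(persona)              # e.g. insert into Mongo and return the new id
```

Capping the rounds keeps a critic that never approves from looping forever. The real pipeline may handle that case differently, for example by saving the best draft or showing the critic's notes to the user.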
The persona pipeline streams progress to the UI over Server-Sent Events. Sub-agents run in isolated ADK sessions so each keeps its own tools and context.
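Server-Sent Events use a simple text framing. Each message is a set of "field: value" lines followed by a blank line. The small sketch below shows how pipeline progress could be framed. The event names (progress, done) and the payload fields are hypothetical.

```python
import json
from collections.abc import Iterator


def sse_event(event: str, payload: dict) -> str:
    """Frame one Server-Sent Events message: field lines, then a blank line."""
    body = json.dumps(payload)  # compact JSON: newlines inside strings are escaped
    lines = [f"event: {event}"]
    # SSE needs one data: line per line of payload; splitlines() keeps this
    # correct even if a multi-line serializer is swapped in later.
    lines += [f"data: {line}" for line in body.splitlines()]
    return "\n".join(lines) + "\n\n"


def stream_progress(stages: list[str]) -> Iterator[str]:
    """Yield one hypothetical progress event per pipeline stage, then a done event."""
    for step, stage in enumerate(stages, start=1):
        yield sse_event("progress", {"stage": stage, "step": step, "total": len(stages)})
    yield sse_event("done", {})
```

A FastAPI endpoint can stream a generator like this with StreamingResponse(..., media_type="text/event-stream"). The browser can consume it with EventSource (GET endpoints only) or with a streaming fetch.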
RAG pipeline
Company knowledge is ingested from markdown support docs (demo datasets for Olive Young and Stripe):
- Structure-aware chunking on markdown headings with breadcrumb context
- Embed with gemini-embedding-001 (separate query/document task types)
- Store in the MongoDB chunks collection, scoped by company_id
- Retrieve via hybrid search: Atlas Vector Search + full-text search, fused with Reciprocal Rank Fusion
Given the hackathon's time pressure, we built scripts/run_chat_eval.py and a 22-question gold eval set so we could measure retrieval and answer quality quickly.
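The writeup does not describe what scripts/run_chat_eval.py computes. To illustrate the retrieval half of such an eval, here is a sketch of hit rate@k and mean reciprocal rank over a gold set. It assumes each gold question is labeled with the ids of the chunks that answer it; GoldItem and retrieval_metrics are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldItem:
    question: str
    relevant_ids: set[str]  # chunk ids a correct retrieval should surface


def retrieval_metrics(
    gold: list[GoldItem], retrieve: Callable[[str], list[str]], k: int = 5
) -> dict[str, float]:
    """Hit rate@k and mean reciprocal rank (MRR@k) over a labeled gold set."""
    if not gold:
        raise ValueError("gold set is empty")
    hits = 0
    reciprocal_rank_sum = 0.0
    for item in gold:
        for rank, chunk_id in enumerate(retrieve(item.question)[:k], start=1):
            if chunk_id in item.relevant_ids:
                hits += 1
                reciprocal_rank_sum += 1.0 / rank  # only the first relevant hit counts
                break
    return {"hit_rate": hits / len(gold), "mrr": reciprocal_rank_sum / len(gold)}
```

Answer quality needs a separate check, such as comparing the generated answer against a reference answer. Having the right chunk in the top k does not guarantee a correct answer.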
Voice + coaching flow
- POST /api/live/session returns a signed token, proxy URL, and voice prompt built from the persona
- Browser connects to gemini-proxy → Vertex Live for bidirectional audio + transcription
- Transcript saves to Mongo on call end
- Coach agent loads transcript + persona and generates a structured report
Tooling and workflow
We used Cursor and Claude Code heavily for iteration, plus the MongoDB MCP server to explore schemas and debug queries during integration. Deploy is a single script: ./scripts/deploy/deploy.sh (frontend + backend bundled; optional --with-proxy for voice).
Challenges we ran into
Agent architecture — We spent a lot of time planning and refining the design: which agents own which tools, how the persona pipeline chains sub-agents, and where to draw boundaries so each agent stays simple. Over-engineering early cost us time; the final design — specialized agents with clear handoffs — was worth the iteration.
MongoDB integration — Wiring Atlas Vector Search, full-text indexes, tenant-scoped collections, and the application layer was confusing at first. The MongoDB MCP server helped us inspect data and validate queries, but understanding how search indexes, embeddings, and the Python driver fit together took real debugging.
Agentic RAG quality — Hard to validate retrieval quality in a short hackathon window, especially with limited demo data. Wrong chunks can produce confident-sounding wrong answers. We addressed this with hybrid search, a critic loop for personas, citation requirements in the chat agent, and an eval script — but RAG tuning is still an open problem as we add more companies.
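Citation requirements are easiest to enforce mechanically when the chat agent returns structured sources. The sketch below shows one such check. It assumes a hypothetical ChatAnswer shape holding a list of cited chunk ids and flags answers that cite nothing or that cite a chunk retrieval never returned. It cannot tell whether a retrieved chunk actually supports the claim, so evals are still needed.

```python
from dataclasses import dataclass, field


@dataclass
class ChatAnswer:
    text: str
    sources: list[str] = field(default_factory=list)  # cited chunk ids


def grounding_problems(answer: ChatAnswer, retrieved_ids: set[str]) -> list[str]:
    """Return reasons to distrust an answer; an empty list means it passed."""
    problems = []
    if not answer.sources:
        problems.append("answer cites no sources")
    for source in answer.sources:
        # A citation to a chunk retrieval never returned is a red flag for
        # a fabricated or misattributed source.
        if source not in retrieved_ids:
            problems.append(f"cited source was not retrieved: {source}")
    return problems
```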
Voice on the web — Browsers can't call Vertex Live directly. We built a signed-token WebSocket proxy, Web Audio worklets for capture/playback, and guardrails (rate limits, prompt caps, SSRF protection on the proxy).
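SSRF protection on a proxy usually means the proxy only connects to a fixed set of upstream hosts, whatever the client sends. A minimal sketch follows. It assumes a hypothetical allowlist containing one regional Vertex AI endpoint (following the {region}-aiplatform.googleapis.com pattern) and a hypothetical prompt-length cap. The project's actual checks and limits are not described.

```python
from urllib.parse import urlsplit

# Hypothetical values; the real proxy's allowlist and limits aren't described.
ALLOWED_UPSTREAM_HOSTS = frozenset({"us-central1-aiplatform.googleapis.com"})
MAX_PROMPT_CHARS = 8_000


def validate_upstream(url: str, allowed_hosts: frozenset[str] = ALLOWED_UPSTREAM_HOSTS) -> str:
    """Reject any upstream target that isn't an allowlisted host over wss on 443."""
    parts = urlsplit(url)
    if parts.scheme != "wss":
        raise ValueError("upstream must use wss://")
    if parts.username is not None or parts.password is not None:
        # Blocks tricks like wss://allowed-host@attacker.example/
        raise ValueError("userinfo in upstream URL is not allowed")
    if (parts.hostname or "") not in allowed_hosts:  # hostname is lowercased
        raise ValueError(f"upstream host not allowlisted: {parts.hostname!r}")
    if parts.port not in (None, 443):
        raise ValueError("unexpected upstream port")
    return url


def check_prompt(prompt: str, limit: int = MAX_PROMPT_CHARS) -> str:
    """Refuse oversized voice prompts before they reach the model."""
    if len(prompt) > limit:
        raise ValueError(f"prompt is {len(prompt)} chars; cap is {limit}")
    return prompt
```

Comparing the parsed hostname against an exact allowlist, rather than substring-matching the raw URL, is what defeats the userinfo trick noted in the comment.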
Accomplishments that we're proud of
Shipping something complete and functional — Not just a demo slide or a single agent in isolation, but an end-to-end product: knowledge chat, persona generation, live voice practice, and coaching reports, all deployed and usable.
Seeing the multi-agent design work in practice — Watching the persona pipeline research, draft, get critiqued, revise, and save — then using that persona in a live call and getting a coach report back — felt genuinely rewarding.
Team growth — Several of us were still learning the stack (Google ADK, MongoDB, Cursor, Claude Code) during the hackathon. Everyone contributed meaningfully, and we're proud of how quickly the team leveled up together.
What we learned
Beyond the tools — ADK, MongoDB, AI-assisted development — the bigger lesson was how to build as a team. You can only go so far alone; with clear ownership, async communication, and trust, the ceiling is much higher.
We also learned what it means to think like product engineers: not just "can we build it?" but "does this solve a real problem for someone practicing customer conversations?" That product sense — tying features back to how employees actually learn — shaped what we kept in scope and what we cut.
What's next for Voice Training Agent
Auth and RBAC — Manager vs. employee roles. Managers see coaching reports across their team; employees see their own history and progress.
Company-specific onboarding — Ingest a customer's real support docs, ticket patterns, and policies to refine agentic RAG, persona win conditions, and emotional escalation curves based on actual interactions — not synthetic demo data.
Manager dashboard — Cohort analytics: common weaknesses, improvement trends, which persona types reps struggle with most.
Richer practice experience — Image or video avatars so employees practice against a visual "customer" with facial expressions and emotional cues, not just a waveform on a screen.
Targeted drills — Auto-generate follow-up scenarios from coach report weaknesses so reps can immediately practice what they missed.
Built With
- cloud-build
- docker
- fastapi
- framer-motion
- gemini
- gemini-embeddings
- gemini-live-for-voice
- generative-ai
- google-adk
- google-cloud-run
- javascript
- mongodb
- pydantic
- python
- rag
- react
- server-sent-events
- vertex-ai
- vite
- websocket