Inspiration
The inspiration for Kora came from experiencing the friction of traditional customer support systems firsthand and imagining how much AI could improve them. I envisioned a world where getting help feels like talking to a knowledgeable friend rather than navigating endless menus and wait times.
When I discovered the AI Partner Catalyst Hackathon's ElevenLabs Challenge, I saw the perfect opportunity to bring this vision to life by combining ElevenLabs' natural conversational AI with Google Gemini's intelligence.
What it does
Kora is a fully interactive, voice-first AI assistant designed to revolutionize customer support.
- Instant Voice Interaction: Users simply click the microphone button and speak naturally, just like talking to a human. There are no wake words or strict command syntax required.
- Intelligent Reasoning: Powered by Google Gemini (via ElevenLabs), Kora understands context, nuance, and intent. She can answer complex queries about "Kora AI Solutions" products, pricing, and technical details with consistent accuracy.
- Human-like Conversation: Using ElevenLabs Agents, the conversation flows naturally with ultra-low latency. Kora speaks with a warm, professional voice that conveys personality and empathy.
- Smart Flow Control: Kora is context-aware—she knows when a conversation is over. If a user says "thank you, goodbye" or "that's all," she intelligently triggers a client-side action to end the session automatically.
- Real-time Transcription: A live, auto-scrolling transcript provides immediate visual feedback of the conversation, ensuring accessibility and clarity.
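The auto-scrolling transcript described above boils down to one rule: follow the newest message only while the user hasn't scrolled up to read earlier lines. A minimal sketch of that rule (illustrative only, not Kora's actual code; the names and tolerance value are assumptions):

```typescript
// Illustrative "stick to bottom" logic for a live transcript pane.
interface ScrollMetrics {
  scrollTop: number;     // current scroll offset
  clientHeight: number;  // visible height of the transcript pane
  scrollHeight: number;  // total height of all transcript content
}

// A small tolerance keeps auto-scroll active despite sub-pixel rounding.
const BOTTOM_TOLERANCE_PX = 24;

function isPinnedToBottom(m: ScrollMetrics): boolean {
  return m.scrollHeight - (m.scrollTop + m.clientHeight) <= BOTTOM_TOLERANCE_PX;
}

// After a new message is appended (growing the content to newScrollHeight),
// scroll down only if the user was already at the bottom.
function nextScrollTop(before: ScrollMetrics, newScrollHeight: number): number {
  return isPinnedToBottom(before)
    ? newScrollHeight - before.clientHeight // follow the newest line
    : before.scrollTop;                     // respect the user's position
}
```

In a React component this decision would typically run in an effect after each transcript update, using the scroll container's DOM metrics.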
How I built it
Initial Approach: Custom Pipeline
I initially attempted to build a custom voice pipeline from scratch:
- Google Cloud Speech-to-Text for voice input
- Vertex AI Gemini for intelligence
- ElevenLabs TTS for voice output
- Custom WebSocket orchestration in FastAPI
However, I encountered significant technical challenges with this approach (detailed in the Challenges section).
The Pivot: ElevenLabs Conversational AI
After careful consideration and reviewing the hackathon requirements, I pivoted to ElevenLabs Conversational AI (the Agents Platform), which turned out to be the ideal architecture:
- Agent Configuration: Created "Kora" in the ElevenLabs dashboard with a custom system prompt defining her personality and knowledge base
- React Integration: Used the @elevenlabs/react SDK with the useConversation hook for seamless WebSocket management
- Enhanced UX: Built custom UI components, including:
- Real-time transcript with toggle controls
- Smart conversation ending via client tools
- Gradient-based, voice-first interface design
- Deployment: Deployed to Vercel for instant global availability

Tech Stack
- Frontend: React + TypeScript + Vite + Tailwind CSS
- Voice: ElevenLabs Conversational AI Platform
- AI: Google Cloud Gemini (via ElevenLabs or custom LLM integration)
- Deployment: Vercel
Challenges I ran into
- Google Cloud Authentication Issues. Problem: persistent 403 errors with the Vertex AI API due to missing quota project configuration in the Application Default Credentials.
Solution: Implemented explicit credential handling with ClientOptions(quota_project_id=...) to properly authenticate Speech-to-Text requests.
- Model Availability. Problem: multiple Gemini model versions (gemini-pro, gemini-1.5-flash-001, gemini-1.5-flash) returned 404 errors, blocking the custom pipeline approach.
Solution: Pivoted to ElevenLabs Conversational AI platform, which handles LLM integration seamlessly while still allowing Gemini integration as a custom backend.
- Backend Stability. Problem: WebSocket connections crashed repeatedly due to configuration errors, corrupted credential files, and service initialization issues.
Solution: Simplified architecture by moving to ElevenLabs' managed infrastructure, eliminating the need for custom backend orchestration.
- UX Refinement. Problem: the initial implementation had several issues:
- The agent continued listening indefinitely after goodbyes
- The transcript couldn't be hidden or shown
- There was no visual feedback for conversation state
Solution:
- Implemented the endConversation client tool for intelligent conversation endings
- Added transcript toggle with auto-scroll
- Enhanced UI with real-time state indicators (Speaking/Listening).
- Time Constraints. Problem: with the hackathon deadline approaching, debugging the custom pipeline was consuming critical time.
Solution: Strategic pivot to the ElevenLabs Agents platform allowed me to deliver a polished, working product that explicitly satisfied the challenge requirements.
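The endConversation fix follows the client-tools pattern: the agent's LLM decides when a farewell has happened and emits a named tool call, and the browser runs a matching handler. A simplified, self-contained model of that dispatch flow (the real wiring lives in the ElevenLabs Agents platform and the @elevenlabs/react SDK; the types and names below are illustrative, not the SDK's actual API):

```typescript
// Simplified model of client-side tool dispatch for endConversation.
type ToolHandler = (params: Record<string, unknown>) => void;

interface ToolCall {
  toolName: string;
  params: Record<string, unknown>;
}

class ClientToolRegistry {
  private handlers = new Map<string, ToolHandler>();

  register(name: string, handler: ToolHandler): void {
    this.handlers.set(name, handler);
  }

  // Invoked when the agent decides (from "goodbye", "that's all", etc.)
  // that a client-side action should run.
  dispatch(call: ToolCall): boolean {
    const handler = this.handlers.get(call.toolName);
    if (!handler) return false; // unknown tool: ignore rather than crash
    handler(call.params);
    return true;
  }
}

// Usage: the UI registers endConversation, and a tool call from the agent
// tears the session down instead of leaving a "zombie listener".
let sessionActive = true;
const tools = new ClientToolRegistry();
tools.register('endConversation', () => { sessionActive = false; });
tools.dispatch({ toolName: 'endConversation', params: {} });
```

The key design point is that the language model, not a keyword filter, decides when to trigger the tool, so phrasing like "okay, that's everything, thanks" still ends the session naturally.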
Accomplishments that we're proud of
- Seamless AI Harmony: I orchestrated a flawless handshake between ElevenLabs (for voice and personality) and Google Gemini (for intelligence), creating an experience that feels like a single, cohesive entity rather than disjointed APIs.
- True "Voice-First" UX: I resisted the urge to build a chatbot with voice tacked on. Instead, I built a truly immersive, hands-free interface where visual elements (like the transcript) recede into the background, allowing voice to take center stage.
- Intelligent Client-Side Actions: I'm particularly proud of implementing the endConversation client tool. It gives Kora the social intelligence to know when a conversation is over ("Goodbye!", "Thanks!"), solving the awkward "zombie listener" problem common in voice bots.
- Resilience Under Pressure: When the initial custom backend architecture failed due to API limitations, I didn't give up. I rapidly re-architected the entire solution around the Agents Platform, rewrote the frontend integration, and delivered a better, more robust product, all within the final hours of the hackathon.
- Ultra-Low Latency: By optimizing the WebSocket implementation and using the React SDK efficiently, I achieved near-instantaneous voice responses, maintaining the illusion of a real-time human conversation.
What I learned
- Platform vs. Custom Solutions: Sometimes, leveraging a purpose-built platform (like ElevenLabs Agents) is more powerful than building from scratch. It allowed me to focus on UX innovation rather than infrastructure.
- Voice-First Design: Designing for voice requires a different mindset than traditional web apps. Key learnings:
- Minimize visual clutter
- Provide clear conversation state feedback
- Auto-scrolling transcripts enhance usability
- Smart conversation endings improve naturalness
- Client Tools Architecture: ElevenLabs' client tools feature enables sophisticated client-side actions triggered by the AI, creating seamless user experiences (like auto-ending conversations on "goodbye").
- WebSocket Management: Real-time bidirectional communication requires careful state management and error handling for a smooth user experience.
- Rapid Iteration: The ability to quickly iterate on agent personality via dashboard configuration accelerated development significantly.
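The WebSocket state-management lesson above can be made concrete with a small reducer: connection and audio events collapse into one explicit status, so the Speaking/Listening indicator can never show a stale or contradictory state. This is a minimal sketch under assumed event names, not Kora's actual code or the SDK's event model:

```typescript
// One explicit status per moment; the UI renders directly from it.
type Status = 'disconnected' | 'connecting' | 'listening' | 'speaking';

type ConversationEvent =
  | { kind: 'connect' }           // user clicked the microphone button
  | { kind: 'connected' }         // WebSocket session established
  | { kind: 'agentAudioStart' }   // agent begins speaking
  | { kind: 'agentAudioEnd' }     // agent finished; back to listening
  | { kind: 'disconnect' };       // session ended or connection dropped

function reduce(status: Status, event: ConversationEvent): Status {
  switch (event.kind) {
    case 'connect':
      return status === 'disconnected' ? 'connecting' : status;
    case 'connected':
      return status === 'connecting' ? 'listening' : status;
    case 'agentAudioStart':
      // Only a live, listening session can transition to speaking;
      // stray or late events are ignored instead of corrupting state.
      return status === 'listening' ? 'speaking' : status;
    case 'agentAudioEnd':
      return status === 'speaking' ? 'listening' : status;
    case 'disconnect':
      return 'disconnected';
  }
}
```

Guarding each transition on the current status is what prevents the indicator from flickering or getting stuck when events arrive out of order.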
What's next for Kora Voice Assistant
I see Kora as just the beginning of the next generation of voice interfaces. My roadmap includes:
- Deep RAG Integration: Connecting the Gemini backend to real-time enterprise knowledge bases (Notion, Google Drive, SQL) so Kora can answer questions about dynamic internal data.
- Multimodal Capabilities: Integrating Gemini Pro Vision to allow users to show Kora objects via their camera for visual technical support (e.g., "Kora, what does this error light on my router mean?").
- Omnichannel Deployment: Leveraging ElevenLabs' telephony features to deploy Kora beyond the web—handling actual phone calls and SMS/WhatsApp interactions seamlessly.
- Emotional Intelligence: Enhancing Kora's system prompt to detect user frustration via voice tone analysis and automatically escalate to human agents when empathy is needed most.
- Multi-language Support: Expanding Kora's capabilities to support real-time translation and conversation in 29+ languages using ElevenLabs' multilingual model.
Built With
- elevenlabs-conversational-ai
- elevenlabs-react-sdk
- eslint
- gemini
- github
- google-cloud-vertex-ai
- mediarecorder-api
- npm
- postcss
- react
- tailwind-css
- typescript
- vercel
- vite
- web-audio-api
- websocket-api