Inspiration

The inspiration for Kora came from experiencing the friction of traditional customer support systems firsthand and imagining how AI could transform them. I envisioned a world where getting help feels like talking to a knowledgeable friend rather than navigating endless menus and wait times.

When I discovered the AI Partner Catalyst Hackathon's ElevenLabs Challenge, I saw the perfect opportunity to bring this vision to life by combining ElevenLabs' natural conversational AI with Google Gemini's intelligence.

What it does

Kora is a fully interactive, voice-first AI assistant designed to revolutionize customer support.

  • Instant Voice Interaction: Users simply click the microphone button and speak naturally, just like talking to a human. There are no wake words or strict command syntax required.
  • Intelligent Reasoning: Powered by Google Gemini (via ElevenLabs), Kora understands context, nuance, and intent. She can answer complex queries about "Kora AI Solutions" products, pricing, and technical details with consistent accuracy.
  • Human-like Conversation: Using ElevenLabs Agents, the conversation flows naturally with ultra-low latency. Kora speaks with a warm, professional voice that conveys personality and empathy.
  • Smart Flow Control: Kora is context-aware—she knows when a conversation is over. If a user says "thank you, goodbye" or "that's all," she intelligently triggers a client-side action to end the session automatically.
  • Real-time Transcription: A live, auto-scrolling transcript provides immediate visual feedback of the conversation, ensuring accessibility and clarity (a minimal sketch follows this list).
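
To make the transcript behavior concrete, here's a minimal sketch of how a live, auto-scrolling transcript can be driven by the SDK's onMessage callback; the component structure and the TranscriptEntry shape are illustrative, not Kora's actual code.

```tsx
import { useEffect, useRef, useState } from 'react';
import { useConversation } from '@elevenlabs/react';

interface TranscriptEntry {
  source: string; // 'user' or 'ai' in the SDK's message events
  message: string;
}

export function Transcript() {
  const [entries, setEntries] = useState<TranscriptEntry[]>([]);
  const bottomRef = useRef<HTMLDivElement>(null);

  // This hook instance would also own startSession/endSession;
  // the start/stop controls are omitted for brevity.
  useConversation({
    onMessage: ({ source, message }) => {
      setEntries((prev) => [...prev, { source, message }]);
    },
  });

  // Keep the newest line in view as the conversation grows.
  useEffect(() => {
    bottomRef.current?.scrollIntoView({ behavior: 'smooth' });
  }, [entries]);

  return (
    <div>
      {entries.map((entry, i) => (
        <p key={i}>
          <strong>{entry.source === 'ai' ? 'Kora' : 'You'}:</strong> {entry.message}
        </p>
      ))}
      <div ref={bottomRef} />
    </div>
  );
}
```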

How I built it

Initial Approach: Custom Pipeline

I initially attempted to build a custom voice pipeline from scratch:

  • Google Cloud Speech-to-Text for voice input
  • Vertex AI Gemini for intelligence
  • ElevenLabs TTS for voice output
  • Custom WebSocket orchestration in FastAPI

However, I encountered significant technical challenges with this approach (detailed in the Challenges section).

The Pivot: ElevenLabs Conversational AI

After careful consideration and a review of the hackathon requirements, I pivoted to ElevenLabs Conversational AI (the Agents Platform), which turned out to be the ideal architecture:

  1. Agent Configuration: Created "Kora" in the ElevenLabs dashboard with a custom system prompt defining her personality and knowledge base
  2. React Integration: Used the @elevenlabs/react SDK with the useConversation hook for seamless WebSocket management (a minimal sketch follows the tech stack below)
  3. Enhanced UX: Built custom UI components, including:
    • Real-time transcript with toggle controls
    • Smart conversation ending via client tools
    • Gradient-based, voice-first interface design
  4. Deployment: Deployed to Vercel for instant global availability

Tech Stack

  • Frontend: React + TypeScript + Vite + Tailwind CSS
  • Voice: ElevenLabs Conversational AI Platform
  • AI: Google Gemini (via ElevenLabs or custom LLM integration)
  • Deployment: Vercel
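
For context, here's roughly what that React integration looks like. This is a minimal sketch assuming a public agent: the YOUR_AGENT_ID placeholder, the logging, and the component structure are illustrative rather than Kora's actual code, and exact option names can vary between SDK versions.

```tsx
import { useConversation } from '@elevenlabs/react';

export function KoraButton() {
  const conversation = useConversation({
    onConnect: () => console.log('Connected to Kora'),
    onDisconnect: () => console.log('Session ended'),
    onError: (error) => console.error('Conversation error:', error),
  });

  const startCall = async () => {
    // Request microphone access before opening the voice session.
    await navigator.mediaDevices.getUserMedia({ audio: true });
    // Hypothetical placeholder ID; real agents are configured in the dashboard.
    await conversation.startSession({ agentId: 'YOUR_AGENT_ID' });
  };

  return (
    <button onClick={startCall}>
      {conversation.status !== 'connected'
        ? 'Talk to Kora'
        : conversation.isSpeaking
          ? 'Kora is speaking…'
          : 'Listening…'}
    </button>
  );
}
```

The hook hides the WebSocket lifecycle entirely, which is what made the pivot from the custom FastAPI orchestration so much faster.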

Challenges I ran into

  1. Google Cloud Authentication Issues

Problem: Encountered persistent 403 errors with the Vertex AI API due to a missing quota project configuration in Application Default Credentials.

Solution: Implemented explicit credential handling with ClientOptions(quota_project_id=...) to properly authenticate Speech-to-Text requests.

  2. Model Availability

Problem: Multiple Gemini model versions (gemini-pro, gemini-1.5-flash-001, gemini-1.5-flash) returned 404 errors, blocking the custom pipeline approach.

Solution: Pivoted to ElevenLabs Conversational AI platform, which handles LLM integration seamlessly while still allowing Gemini integration as a custom backend.

  3. Backend Stability

Problem: WebSocket connections crashed repeatedly due to configuration errors, corrupted credential files, and service initialization issues.

Solution: Simplified architecture by moving to ElevenLabs' managed infrastructure, eliminating the need for custom backend orchestration.

  4. UX Refinement

Problem: The initial implementation had several issues:

  • The agent continued listening indefinitely after goodbyes
  • The transcript couldn't be hidden or shown
  • There was no visual feedback for conversation state

Solution:

  • Implemented the endConversation client tool for intelligent conversation endings (sketched below)
  • Added transcript toggle with auto-scroll
  • Enhanced UI with real-time state indicators (Speaking/Listening)
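
For reference, here's a minimal sketch of how the endConversation client tool can be registered with the React SDK, assuming a tool with that name is also declared in the agent's dashboard configuration; the grace-period delay and the return message are illustrative choices, not Kora's exact implementation.

```tsx
import { useConversation } from '@elevenlabs/react';

export function useKoraConversation() {
  const conversation = useConversation({
    clientTools: {
      // The agent invokes this tool when the user signals the conversation
      // is over ("thank you, goodbye", "that's all").
      endConversation: async () => {
        // Illustrative grace period so the farewell audio can finish playing.
        setTimeout(() => conversation.endSession(), 2000);
        return 'Ending the conversation';
      },
    },
  });

  return conversation;
}
```

Because the tool runs in the browser, Kora can hang up the session entirely client-side, with no extra backend round trip.
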
  5. Time Constraints

Problem: With the hackathon deadline approaching, debugging the custom pipeline was consuming critical time.

Solution: A strategic pivot to the ElevenLabs Agents platform allowed me to deliver a polished, working product that fully satisfied the challenge requirements.

Accomplishments that we're proud of

  • Seamless AI Harmony: We successfully orchestrated a flawless handshake between ElevenLabs (for voice/personality) and Google Gemini (for intelligence), creating an experience that feels like a single, cohesive entity rather than disjointed APIs.
  • True "Voice-First" UX: We resisted the urge to build a chatbot with voice tacked on. Instead, we built a truly immersive, hands-free interface where visual elements (like the transcript) recede to the background, allowing voice to take center stage.
  • Intelligent Client-Side Actions: We're particularly proud of implementing the endConversation client tool. It gives Kora the social intelligence to know when a conversation is over ("Goodbye!", "Thanks!"), solving the awkward "zombie listener" problem common in voice bots.
  • Resilience under Pressure: When our initial custom backend architecture failed due to API limitations, we didn't give up. We rapidly re-architected the entire solution to use the Agents Platform, rewrote the frontend integration, and delivered a better, more robust product, all within the final hours of the hackathon.
  • Ultra-Low Latency: By optimizing the WebSocket implementation and using the React SDK efficiently, we achieved near-instantaneous voice responses, maintaining the illusion of a real-time human conversation.

What I learned

  1. Platform vs. Custom Solutions: Sometimes, leveraging a purpose-built platform (like ElevenLabs Agents) is more powerful than building from scratch. It allowed me to focus on UX innovation rather than infrastructure.
  2. Voice-First Design: Designing for voice requires a different mindset than traditional web apps. Key learnings:
    • Minimize visual clutter
    • Provide clear conversation state feedback
    • Auto-scrolling transcripts enhance usability
    • Smart conversation endings improve naturalness
  3. Client Tools Architecture: ElevenLabs' client tools feature enables sophisticated client-side actions triggered by the AI, creating seamless user experiences (like auto-ending conversations on "goodbye").
  4. WebSocket Management: Real-time bidirectional communication requires careful state management and error handling for a smooth user experience.
  5. Rapid Iteration: The ability to quickly iterate on agent personality via dashboard configuration accelerated development significantly.

What's next for Kora Voice Assistant

I see Kora as just the beginning of the next generation of voice interfaces. The roadmap includes:

  1. Deep RAG Integration: Connecting the Gemini backend to real-time enterprise knowledge bases (Notion, Google Drive, SQL) so Kora can answer questions about dynamic internal data.
  2. Multimodal Capabilities: Integrating Gemini Pro Vision to allow users to show Kora objects via their camera for visual technical support (e.g., "Kora, what does this error light on my router mean?").
  3. Omnichannel Deployment: Leveraging ElevenLabs' telephony features to deploy Kora beyond the web—handling actual phone calls and SMS/WhatsApp interactions seamlessly.
  4. Emotional Intelligence: Enhancing Kora's system prompt to detect user frustration via voice tone analysis and automatically escalate to human agents when empathy is needed most.
  5. Multi-language Support: Expanding Kora's capabilities to support real-time translation and conversation in 29+ languages using ElevenLabs' multilingual model.

Built With

  • elevenlabs-conversational-ai
  • elevenlabs-react-sdk
  • eslint
  • gemini
  • github
  • google-cloud-vertex-ai
  • mediarecorder-api
  • npm
  • postcss
  • react
  • tailwind-css
  • typescript
  • vercel
  • vite
  • web-audio-api
  • websocket-api