Project Story
Inspiration
Public speaking anxiety isn't just a statistic—it's a silent barrier holding back dreams. We've seen brilliant minds freeze on stage, watched innovative ideas stumble over trembling words, and felt our own hearts race before a microphone. This isn't about perfecting speeches; it's about unlocking voices.
I built Vocal Coach AI because everyone deserves to be heard—clearly, confidently, and authentically. The fear of "um"s and shaky delivery shouldn't silence potential. When I saw the ElevenLabs Challenge focusing on conversational, voice-driven AI, I knew this was the opportunity to create not just another analysis tool, but a patient, always-available coach that listens first and teaches second.
What it does
Vocal Coach AI transforms public speaking practice through natural conversation. Forget clicking buttons and reading scores—you simply talk, and your AI coach, Alex, talks back.
🎤 Voice Practice: Record any speech and receive instant AI analysis on clarity, confidence, pacing, and filler words.
🗣️ Conversational Coaching: Talk naturally with Coach Alex about your speaking challenges. Say "I'm nervous about presentations" and receive breathing exercises. Mention "filler words" and get targeted pause-practice techniques.
💼 Virtual Meeting Prep: Practice specifically for Zoom interviews or Teams presentations with platform-specific advice.
🏆 Compare with Pros: See how your delivery compares to iconic speakers like Steve Jobs or Brené Brown, with actionable steps to bridge the gap.
📈 Progress Tracking: Watch your improvement unfold through detailed analytics and session history.
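The filler-word and pacing metrics above come down to simple counting once a transcript exists. Here is a minimal sketch, assuming a plain-text transcript and a known recording duration. The `FILLER_WORDS` set, `DeliveryStats`, and `delivery_stats` are hypothetical names for illustration, not the project's actual code.

```python
import re
from dataclasses import dataclass

# Hypothetical single-word filler list; multi-word fillers like "you know" are not handled.
FILLER_WORDS = {"um", "uh", "like", "basically", "actually"}


@dataclass
class DeliveryStats:
    word_count: int
    filler_count: int
    words_per_minute: float


def delivery_stats(transcript: str, duration_seconds: float) -> DeliveryStats:
    """Count words and filler words, and compute speaking pace in words per minute."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Lowercase and keep letters/apostrophes only, so "Um," is counted as "um".
    words = re.findall(r"[a-z']+", transcript.lower())
    fillers = sum(1 for w in words if w in FILLER_WORDS)
    wpm = len(words) / (duration_seconds / 60)
    return DeliveryStats(len(words), fillers, wpm)
```

For example, a 6-second clip of "Um, so, like, I think we should, uh, start." yields 9 words, 3 fillers, and a pace of 90 words per minute. Treating "like" as a filler is deliberately naive; a real coach would need context to tell filler from meaning.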
The magic happens when you realize: you're not talking to a machine, you're practicing with a coach.
How we built it
We engineered Vocal Coach AI as a symphony of cutting-edge technologies, each playing a crucial role:
The Brain (Google Gemini AI)
```python
# Gemini powers both analysis and conversation
analysis = await gemini_service.analyze_speech(user_speech)
# Example result -> Clarity: 8/10, Confidence: 7/10, Filler words: 3
```
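The post does not show how `analyze_speech` structures the model's output. One common approach is to prompt the model for JSON and validate it before trusting the scores. The sketch below assumes a JSON reply with `clarity`, `confidence`, and `filler_words` keys; those key names, `SpeechAnalysis`, and `parse_analysis` are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class SpeechAnalysis:
    clarity: int       # score on a 1-10 scale
    confidence: int    # score on a 1-10 scale
    filler_words: int  # raw count of fillers detected


def parse_analysis(raw: str) -> SpeechAnalysis:
    """Parse a JSON analysis string from the model and reject out-of-range values."""
    data = json.loads(raw)
    analysis = SpeechAnalysis(
        clarity=int(data["clarity"]),
        confidence=int(data["confidence"]),
        filler_words=int(data["filler_words"]),
    )
    for name in ("clarity", "confidence"):
        score = getattr(analysis, name)
        if not 1 <= score <= 10:
            raise ValueError(f"{name} score out of range: {score}")
    if analysis.filler_words < 0:
        raise ValueError("filler_words cannot be negative")
    return analysis
```

Validating here means a malformed or hallucinated score fails loudly on the backend instead of reaching the user as feedback. This sketch does not strip any markdown wrapping a model might put around its JSON.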
The Voice (ElevenLabs + Browser TTS)
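```javascript
// Browser TTS: synthesize and play spoken feedback
const utterance = new SpeechSynthesisUtterance(
  "Great job! Your clarity improved by 20% this week."
);
window.speechSynthesis.speak(utterance); // queue the utterance for playback
```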
The Foundation
- Frontend: React.js with Material-UI, deployed on Firebase Hosting
- Backend: FastAPI (Python) running on Railway
- Database: Firebase Firestore for session storage
- Architecture: Microservices communicating via REST APIs
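Feedback such as "Your clarity improved by 20% this week" implies a week-over-week comparison of stored session scores. The following is a minimal sketch of that arithmetic, assuming sessions have already been read from Firestore into `(date, clarity_score)` pairs. The function name and the 7-day windows are hypothetical choices, and the Firestore query itself is not shown.

```python
from datetime import date, timedelta
from statistics import mean
from typing import List, Optional, Tuple


def weekly_improvement(
    sessions: List[Tuple[date, float]], today: date
) -> Optional[float]:
    """Percent change in mean clarity score: the last 7 days vs. the 7 days before."""
    week_ago = today - timedelta(days=7)
    two_weeks_ago = today - timedelta(days=14)
    this_week = [score for day, score in sessions if week_ago < day <= today]
    last_week = [score for day, score in sessions if two_weeks_ago < day <= week_ago]
    if not this_week or not last_week:
        return None  # not enough history to compare
    # Scores are assumed to be on a 1-10 scale, so `before` is never zero.
    before, after = mean(last_week), mean(this_week)
    return (after - before) / before * 100
```

For example, a mean clarity of 5.0 last week and 6.0 this week gives (6.0 - 5.0) / 5.0 × 100 = 20%. Returning `None` when a week is empty lets the coach skip the progress message instead of reporting a misleading number.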
The technical stack represents a perfect balance: Google Cloud's AI prowess meeting ElevenLabs' voice innovation, all delivered through accessible web technologies.
Challenges we ran into
The Wall of Silence: Early in development, our conversational AI gave the same generic response to everything. "I'm nervous" and "Hello" received identical feedback. We discovered our intent detection system was oversimplified. The solution? Layered logic:
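```python
def route_message(user_message: str) -> str:
    """Pick a coaching response: keyword rules first, Gemini as the fallback."""
    text = user_message.lower()  # match "Nervous" and "nervous" alike
    if "nervous" in text:
        return breathing_exercises()
    elif "filler" in text:
        return pause_practice()
    elif "fast" in text:
        return pacing_drills()
    else:
        return gemini_contextual_response()
```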
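Substring checks like `"fast" in text` also fire on words such as "breakfast". A table-driven variant that matches whole words avoids this and keeps the rules easy to extend. This is a sketch, not the project's code: `INTENT_RULES`, `detect_intent`, and the extra triggers ("anxious", "fillers", "rushing") are hypothetical additions.

```python
import re

# Hypothetical intent table: (handler name, trigger words). List order sets priority.
INTENT_RULES = [
    ("breathing_exercises", ("nervous", "anxious")),
    ("pause_practice", ("filler", "fillers")),
    ("pacing_drills", ("fast", "rushing")),
]


def detect_intent(user_message: str) -> str:
    """Return the first handler whose trigger appears as a whole word, else the fallback."""
    words = set(re.findall(r"[a-z']+", user_message.lower()))
    for handler, triggers in INTENT_RULES:
        if words.intersection(triggers):
            return handler
    return "gemini_contextual_response"
```

With this version, "I had breakfast before my talk" falls through to the Gemini fallback, while "I talk too fast" still routes to `pacing_drills`. The returned name can then be mapped to the handler functions from the routing snippet.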
The Deployment Dilemma: Google Cloud Run required billing setup we couldn't complete. Rather than abandon ship, we pivoted to Railway, learning that constraints often breed creativity. Our backend found a new home, proving that resilience matters more than perfect infrastructure.
The Voice Barrier: ElevenLabs' free tier limitations forced us to implement intelligent fallbacks. We created a system that prioritizes user experience over perfect technology:
$$ \text{User Experience} = \frac{\text{Functionality} \times \text{Reliability}}{\text{Complexity}} $$
This formula reminded us: sometimes a perfectly reliable browser TTS beats a cutting-edge API with sporadic access.
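A fallback like this can be expressed as "try the premium voice, otherwise tell the frontend to speak the text itself." The sketch below assumes the ElevenLabs call is wrapped in a hypothetical `premium_tts` callable that returns audio bytes. No real SDK call is shown, and `speak_with_fallback` is an illustrative name.

```python
import logging
from typing import Callable, Optional, Tuple

logger = logging.getLogger(__name__)


def speak_with_fallback(
    text: str, premium_tts: Callable[[str], bytes]
) -> Tuple[str, Optional[bytes]]:
    """Try the premium TTS provider first; fall back to browser TTS on any failure.

    Returns ("elevenlabs", audio_bytes) on success, or ("browser", None) to tell
    the frontend to speak `text` itself with SpeechSynthesisUtterance.
    """
    try:
        audio = premium_tts(text)
        if audio:
            return ("elevenlabs", audio)
        logger.warning("Premium TTS returned empty audio; using browser TTS")
    except Exception as exc:  # quota exhausted, network error, bad key, ...
        logger.warning("Premium TTS failed (%s); using browser TTS", exc)
    return ("browser", None)
```

Because the caller always gets a usable answer, a quota error degrades voice quality but never silences the coach. Logging the failure keeps the degradation visible to developers.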
Accomplishments that we're proud of
Creating Real Conversation: Building an AI that doesn't just analyze but understands context. When a user says "I tried that breathing exercise but still feel anxious," Coach Alex remembers and suggests progressive muscle relaxation instead.
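Remembering that a user already tried one technique requires some per-session state. Here is a minimal sketch of that idea, assuming techniques are ordered on an escalation ladder. `SessionMemory`, `ANXIETY_LADDER`, and the "visualization" step are hypothetical; the post only names breathing exercises and progressive muscle relaxation.

```python
from dataclasses import dataclass, field
from typing import List

ANXIETY_LADDER = [
    "breathing exercise",
    "progressive muscle relaxation",
    "visualization",  # hypothetical third step, not named in the post
]


@dataclass
class SessionMemory:
    """Per-user record of techniques already suggested in this conversation."""
    tried: List[str] = field(default_factory=list)

    def next_anxiety_technique(self) -> str:
        """Suggest the first technique on the ladder that hasn't been suggested yet."""
        for technique in ANXIETY_LADDER:
            if technique not in self.tried:
                self.tried.append(technique)
                return technique
        return ANXIETY_LADDER[-1]  # everything tried: stay on the last step
```

If the session already records the breathing exercise, the next suggestion is progressive muscle relaxation, matching the behavior described above. Persisting `tried` with the session (for example, in Firestore) would let the memory survive across visits.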
Full Production Deployment: Taking the project from localhost to fully live at gen-lang-client-0181311027.web.app with zero infrastructure costs.
The "Aha!" Moment: Watching test users' faces light up when they realized they could just talk naturally instead of navigating complex interfaces.
Technical Integration Mastery: Successfully weaving together four major platforms (Google Cloud, ElevenLabs, Firebase, Railway) into a seamless experience.
What we learned
Technology teaches humility. No matter how advanced your AI, if users can't figure out the microphone button, it's useless. We learned to prioritize simplicity over sophistication.
Constraints spark innovation. Being unable to use certain services forced us to discover better alternatives. Railway's deployment simplicity might become our new standard.
Voice is personal. People don't just want analysis—they want empathy, encouragement, and patience. Our biggest technical lesson? The response.tone parameter matters as much as the response.content.
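Separating tone from content means the same factual feedback can be delivered differently depending on how the user is doing. This sketch assumes a response object with `content` and `tone` fields. `CoachResponse`, `build_feedback`, the tone labels, and the score thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CoachResponse:
    content: str  # what the coach says
    tone: str     # how it should be voiced, e.g. passed on to the TTS layer


def build_feedback(metric: str, score: int) -> CoachResponse:
    """Pair the same score report with a tone chosen from the score (1-10 scale)."""
    if score >= 8:
        return CoachResponse(f"Your {metric} scored {score}/10. Keep it up!", "celebratory")
    if score >= 5:
        return CoachResponse(
            f"Your {metric} scored {score}/10. One small change could lift it.", "encouraging"
        )
    return CoachResponse(
        f"Your {metric} scored {score}/10. Let's work on it together, step by step.", "supportive"
    )
```

A low score still gets an honest number, but the wording and tone shift toward support rather than judgment.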
The equation for effective learning tools: $$ \text{Effective Learning} = \text{Technical Accuracy} \times \text{Emotional Safety}^2 $$
When people feel safe to fail, they learn dramatically faster.
What's next for Vocal Coach AI
Immediate Horizon (Next 3 months):
- Real ElevenLabs STT Integration: Replace mock transcription with actual speech-to-text
- Multi-language Support: Help non-native English speakers gain confidence
- Mobile App: Take practice anywhere with a dedicated iOS/Android application
Future Vision (Next year):
- Group Practice Sessions: Virtual rooms where users can practice together with AI moderation
- Accent Appreciation Module: Celebrate linguistic diversity while improving clarity
- AR Teleprompter: Practice with real-time feedback overlaid on your environment
- Emotion Detection: Camera-based analysis of facial expressions and body language
The Dream: We envision a world where no idea goes unshared because of speaking anxiety. Where every student, professional, and community leader has a patient AI coach in their pocket. Where the question shifts from "Can I say this?" to "How can I say this most effectively?"
Vocal Coach AI isn't just an app—it's the beginning of democratizing eloquence. Because when we find our voice, we find our power.
Built with passion, coded with purpose, and shared with hope. 🎤✨
Built With
- css
- eleven-labs
- elevenlabs-python-sdk
- firebase
- git
- github
- google-ai-python-sdk
- google-cloud
- html
- javascript
- python
- railway
- react
- vs-code