Project Story

Inspiration

Public speaking anxiety isn't just a statistic—it's a silent barrier holding back dreams. We've seen brilliant minds freeze on stage, watched innovative ideas stumble over trembling words, and felt our own hearts race before a microphone. This isn't about perfecting speeches; it's about unlocking voices.

I built Vocal Coach AI because everyone deserves to be heard—clearly, confidently, and authentically. The fear of "um"s and shaky delivery shouldn't silence potential. When I saw the ElevenLabs Challenge focusing on conversational, voice-driven AI, I knew this was the opportunity to create not just another analysis tool, but a patient, always-available coach that listens first and teaches second.

What it does

Vocal Coach AI transforms public speaking practice through natural conversation. Forget clicking buttons and reading scores—you simply talk, and your AI coach, Alex, talks back.

🎤 Voice Practice: Record any speech and receive instant AI analysis on clarity, confidence, pacing, and filler words.

🗣️ Conversational Coaching: Talk naturally with Coach Alex about your speaking challenges. Say "I'm nervous about presentations" and receive breathing exercises. Mention "filler words" and get targeted pause-practice techniques.

💼 Virtual Meeting Prep: Practice specifically for Zoom interviews or Teams presentations with platform-specific advice.

🏆 Compare with Pros: See how your delivery compares to iconic speakers like Steve Jobs or Brené Brown, with actionable steps to bridge the gap.

📈 Progress Tracking: Watch your improvement unfold through detailed analytics and session history.

The magic happens when you realize: you're not talking to a machine, you're practicing with a coach.

How we built it

We built Vocal Coach AI from three layers, each with a distinct role:

The Brain (Google Gemini AI)

# Gemini powers both analysis and conversation
analysis = await gemini_service.analyze_speech(user_speech)
# Clarity: 8/10, Confidence: 7/10, Filler words: 3
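
In this project the scores come from Gemini, and how it derives them is not described here. As a rough illustration of two of the reported metrics, the sketch below counts filler words and estimates pace from a transcript. The function name `basic_speech_metrics`, the filler list, and the `duration_seconds` parameter are assumptions made for this example, not part of the app.

```python
import re

# Hypothetical filler list; words such as "like" also have legitimate uses,
# so this simple count will overestimate in some sentences.
FILLERS = {"um", "uh", "like", "basically", "actually"}


def basic_speech_metrics(transcript: str, duration_seconds: float) -> dict:
    """Count filler words and estimate speaking pace in words per minute."""
    words = re.findall(r"[a-z']+", transcript.lower())
    filler_count = sum(1 for word in words if word in FILLERS)
    minutes = duration_seconds / 60
    words_per_minute = round(len(words) / minutes) if minutes > 0 else 0
    return {
        "word_count": len(words),
        "filler_words": filler_count,
        "words_per_minute": words_per_minute,
    }


metrics = basic_speech_metrics("Um, so I think, uh, this is like a good idea", 6.0)
print(metrics)
```

A rule-based count like this could serve as a sanity check next to the model's scores, since it is deterministic and cheap to run.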

The Voice (ElevenLabs + Browser TTS)

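// Voice synthesis for natural feedback (browser TTS via the Web Speech API)
const utterance = new SpeechSynthesisUtterance(
  "Great job! Your clarity improved by 20% this week."
);
window.speechSynthesis.speak(utterance); // queue the utterance for playback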

The Foundation

Frontend: React.js with Material-UI, deployed on Firebase Hosting
Backend: FastAPI (Python) running on Railway
Database: Firebase Firestore for session storage
Architecture: Microservices communicating via REST APIs

The stack pairs Google Cloud's AI for analysis and conversation with ElevenLabs for voice, all delivered through standard web technologies.
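
Stored sessions are what make feedback such as "your clarity improved by 20% this week" possible. The following is a minimal sketch of that calculation, assuming each session records a week number and a clarity score out of 10. The `Session` shape and the `weekly_improvement` helper are hypothetical and do not reflect the app's actual Firestore schema.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Session:
    """One stored practice session (hypothetical shape, not the app's schema)."""
    week: int
    clarity: float  # score on a 0-10 scale


def weekly_improvement(sessions: List[Session], week: int) -> Optional[float]:
    """Percent change in mean clarity from the previous week to `week`."""
    current = [s.clarity for s in sessions if s.week == week]
    previous = [s.clarity for s in sessions if s.week == week - 1]
    if not current or not previous or mean(previous) == 0:
        return None  # not enough data to compare
    return (mean(current) - mean(previous)) / mean(previous) * 100


history = [Session(1, 5.0), Session(1, 5.0), Session(2, 6.0), Session(2, 6.0)]
change = weekly_improvement(history, week=2)
print(change)
```

Returning `None` when a week has no sessions lets the coach skip the comparison instead of reporting a misleading number.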

Challenges we ran into

The Wall of Silence: Early in development, our conversational AI gave the same generic response to everything. "I'm nervous" and "Hello" received identical feedback. We discovered our intent detection system was oversimplified. The solution? Layered logic:

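def route_message(user_message):  # wrapper name is illustrative
    # Keyword rules run first; anything unmatched falls through to Gemini.
    message = user_message.lower()  # match "Nervous" as well as "nervous"
    if "nervous" in message:
        return breathing_exercises()
    elif "filler" in message:
        return pause_practice()
    elif "fast" in message:
        return pacing_drills()
    else:
        return gemini_contextual_response()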
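
Plain substring checks also fire inside longer words: testing for "fast" matches "breakfast". The sketch below is one possible refinement, not the app's implementation. It keeps the same keyword-first, Gemini-fallback layering but only matches a keyword at the start of a word. `INTENT_RULES` and `detect_intent` are hypothetical names, and the handlers are returned as strings so the example runs on its own.

```python
import re

# Keyword -> handler name, in priority order. Handler names mirror the
# snippet above; the table itself is an assumption for illustration.
INTENT_RULES = [
    ("nervous", "breathing_exercises"),
    ("filler", "pause_practice"),
    ("fast", "pacing_drills"),
]
FALLBACK = "gemini_contextual_response"


def detect_intent(user_message: str) -> str:
    """Return the first handler whose keyword starts a word in the message."""
    text = user_message.lower()
    for keyword, handler in INTENT_RULES:
        # \b anchors the keyword to a word start, so "faster" matches
        # but "breakfast" does not.
        if re.search(rf"\b{re.escape(keyword)}", text):
            return handler
    return FALLBACK


print(detect_intent("I'm NERVOUS about presentations"))
print(detect_intent("I had breakfast before my talk"))
```

Keeping the rules in a table means a new intent is one added row rather than another `elif` branch.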

The Deployment Dilemma: Google Cloud Run required billing setup we couldn't complete. Rather than abandon ship, we pivoted to Railway, learning that constraints often breed creativity. Our backend found a new home, proving that resilience matters more than perfect infrastructure.

The Voice Barrier: ElevenLabs' free tier limitations forced us to implement intelligent fallbacks. We created a system that prioritizes user experience over perfect technology:

$$ \text{User Experience} = \frac{\text{Functionality} \times \text{Reliability}}{\text{Complexity}} $$

This formula reminded us: sometimes browser TTS with perfect reliability beats a cutting-edge API with sporadic access.
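
The fallback idea can be sketched as follows, assuming the backend tries a cloud voice first and, on any failure, tells the client to speak the text with browser TTS. `synthesize_with_fallback`, the response fields, and the `quota_exceeded` stand-in are hypothetical. The cloud call is passed in as a plain function so that no real ElevenLabs API is assumed.

```python
from typing import Callable


def synthesize_with_fallback(text: str, cloud_tts: Callable[[str], bytes]) -> dict:
    """Try the cloud voice first; on failure, ask the client to use browser TTS."""
    try:
        audio = cloud_tts(text)
        return {"mode": "cloud", "audio": audio, "text": text}
    except Exception:
        # Quota exhausted, network error, and similar: degrade gracefully
        # rather than leaving the user without spoken feedback.
        return {"mode": "browser", "audio": None, "text": text}


def quota_exceeded(_text: str) -> bytes:
    """Stand-in for a cloud TTS call that fails (hypothetical)."""
    raise RuntimeError("free-tier quota exceeded")


reply = synthesize_with_fallback("Great job!", quota_exceeded)
print(reply["mode"])
```

Because the text is always returned, the frontend can hand it to `SpeechSynthesisUtterance` whenever `mode` is `"browser"`.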

Accomplishments that we're proud of

Creating Real Conversation: Building an AI that doesn't just analyze but understands context. When a user says "I tried that breathing exercise but still feel anxious," Coach Alex remembers and suggests progressive muscle relaxation instead.
Full Production Deployment: Taking the project from localhost to fully live at gen-lang-client-0181311027.web.app with zero infrastructure costs.
The "Aha!" Moment: Watching test users' faces light up when they realized they could just talk naturally instead of navigating complex interfaces.
Technical Integration Mastery: Successfully weaving together four major platforms (Google Cloud, ElevenLabs, Firebase, Railway) into a seamless experience.
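
How Coach Alex keeps track of earlier suggestions is not detailed here. One simple way to get the behavior described under "Creating Real Conversation" is to remember which techniques were already offered and move to the next one. The sketch below assumes that approach. The `CoachMemory` class and the ordering of techniques are hypothetical.

```python
# Escalation ladder for anxiety-related messages. The two techniques come
# from the writeup; their ordering as a ladder is an assumption.
ANXIETY_TECHNIQUES = [
    "breathing exercise",
    "progressive muscle relaxation",
]


class CoachMemory:
    """Remembers which techniques were already suggested in a conversation."""

    def __init__(self) -> None:
        self.suggested = []

    def next_technique(self) -> str:
        """Return the first technique not yet suggested, or repeat the last."""
        for technique in ANXIETY_TECHNIQUES:
            if technique not in self.suggested:
                self.suggested.append(technique)
                return technique
        return ANXIETY_TECHNIQUES[-1]


memory = CoachMemory()
first = memory.next_technique()
second = memory.next_technique()
print(first, "->", second)
```

In a Gemini-backed coach the same effect might instead come from passing the conversation history with each request; the explicit list here only makes the idea easy to test.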

What we learned

Technology teaches humility. No matter how advanced your AI, if users can't figure out the microphone button, it's useless. We learned to prioritize simplicity over sophistication.

Constraints spark innovation. Being unable to use certain services forced us to discover better alternatives. Railway's deployment simplicity might become our new standard.

Voice is personal. People don't just want analysis—they want empathy, encouragement, and patience. Our biggest technical lesson? The response.tone parameter matters as much as the response.content.

The equation for effective learning tools: $$ \text{Effective Learning} = \text{Technical Accuracy} \times \text{Emotional Safety}^2 $$

When people feel safe to fail, they learn far faster: because safety is squared in this informal formula, doubling it quadruples the result.

What's next for Vocal Coach AI

Immediate Horizon (Next 3 months):

Real ElevenLabs STT Integration: Replace mock transcription with actual speech-to-text
Multi-language Support: Help non-native English speakers gain confidence
Mobile App: Take practice anywhere with a dedicated iOS/Android application

Future Vision (Next year):

Group Practice Sessions: Virtual rooms where users can practice together with AI moderation
Accent Appreciation Module: Celebrate linguistic diversity while improving clarity
AR Teleprompter: Practice with real-time feedback overlaid on your environment
Emotion Detection: Camera-based analysis of facial expressions and body language

The Dream: We envision a world where no idea goes unshared because of speaking anxiety. Where every student, professional, and community leader has a patient AI coach in their pocket. Where the question shifts from "Can I say this?" to "How can I say this most effectively?"

Vocal Coach AI isn't just an app—it's the beginning of democratizing eloquence. Because when we find our voice, we find our power.

Built with passion, coded with purpose, and shared with hope. 🎤✨

Built With

css
eleven-labs
elevenlabs-python-sdk
firebase
git
github
google-ai-python-sdk
google-cloud
html
javascript
python
railway
react
vs-code

Submitted to

AI Partner Catalyst: Accelerate Innovation

Created by

I conceptualized, architected, and built Vocal Coach AI from the ground up as a solo developer. I designed the full-stack application, integrating Google Gemini AI for intelligent speech analysis and conversation, and implemented the ElevenLabs voice API to create a natural, conversational AI coach. I deployed the backend on Railway and the frontend on Firebase, ensuring a fully functional, live application that meets the core challenge of enabling users to interact entirely through speech.

Temoor Hussain
Software Engineer specializing in React/React Native, with skills in project management and data science

Updates

Temoor Hussain started this project — Dec 29, 2025 11:23 AM EST
