Inspiration
The inspiration for Better Me came from a deeply personal observation: confidence isn't just built through success - it's built through self-awareness, small wins, and consistent practice. Many people struggle with confidence in specific areas of their lives - whether it's public speaking at work, navigating relationships, or simply feeling comfortable in social situations.
Traditional coaching is expensive and inaccessible to most people. We wanted to create something different: an AI companion that could:
- Listen without judgment through voice or text
- Guide users through structured growth plans tailored to their specific challenges
- Track progress quantitatively using confidence metrics
- Provide real-time support exactly when it's needed
The goal was to democratize confidence coaching by making it available 24/7, personalized, and engaging through conversational AI.
What it does
Better Me is an AI-powered confidence coaching application that helps users build self-confidence through:
1. Personalized Coaching Experience
- Users choose between two AI coaches with distinct personalities:
- Mira (Compassionate): Warm, empathetic, focuses on emotional support
- Kai (Empowering): Direct, action-oriented, focuses on practical strategies
- Each coach delivers a personalized video introduction before onboarding
2. Multi-Modal Interaction
- Text Chat: Natural conversation interface for reflection and guidance
- Voice Input: Speech-to-text for hands-free coaching
- Smart Follow-ups: Automated 12-hour check-ins to maintain momentum
3. Structured Growth Plans
The system uses a three-phase methodology:
Phase 1: Discovery
The AI asks targeted questions to understand the user's specific challenges:
Q1: "What specific situation makes you feel less confident?"
Q2: "What thoughts go through your mind in those moments?"
Q3: "What would success look like for you?"
Phase 2: Planning
Based on discovery insights, the AI generates:
- Actionable step-by-step plans (5-7 concrete steps)
- Visual roadmaps using Mermaid diagrams for clarity
- Personalized strategies aligned with user's goals
Phase 3: Execution
- Progress tracking for each plan step
- Real-time encouragement and adjustments
- Confidence metric evolution modeled as an exponential approach from baseline toward a target: $C(t) = C_0 + \Delta C \,(1 - e^{-\lambda t})$
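The formula above can be sketched directly; here is a minimal reading of it as an exponential approach from baseline $C_0$ toward $C_0 + \Delta C$ (all parameter values below are illustrative, not the app's real constants):

```python
import math

def confidence(t, c0=40.0, delta_c=35.0, lam=0.3):
    """Confidence at time t (e.g., weeks since starting a plan): rises
    from the baseline c0 toward c0 + delta_c with rate constant lam.
    Parameter values are illustrative, not tuned from real user data."""
    return c0 + delta_c * (1.0 - math.exp(-lam * t))
```

The curve starts at the baseline and flattens as it nears the target, matching the intuition that early wins come fast while later gains take sustained practice.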
4. Confidence Metrics System
Users establish and track confidence baselines:
- Baseline captures initial confidence level
- Progress tracked over time with qualitative context
5. Focus Areas
Four specialized coaching domains:
- 💼 Work: Public speaking, leadership, professional communication
- 💕 Relationships: Dating confidence, emotional vulnerability, communication
- 👤 Appearance: Body image, self-presentation, style confidence
- 🎉 Social: Networking, social anxiety, group dynamics
How we built it
Architecture Overview
Frontend Stack
- Framework: React 19.2.0 with React Router v7
- Build Tool: Vite 7.2.4 (HMR for fast development)
- HTTP Client: Axios for API communication
- Styling: Custom CSS with a warm, therapeutic design language
  - Color palette: warm coral (#E38B6D), calm sage (#7FBFA3), soft peach backgrounds
  - Glassmorphism effects with `backdrop-filter: blur(10px)`
- Visualization: Mermaid.js for generating interactive plan diagrams
- Deployment: Vercel (frontend hosting with CDN)
Key Frontend Features
- Onboarding Flow:
- Coach selection → Video intro → User profile creation → Focus area selection
- Chat Interface:
- Real-time streaming responses
- Voice recording with MediaRecorder API
- Plan sidebar with Mermaid diagram rendering
- State Management:
- LocalStorage for client-side profile persistence
- Optimistic UI updates for smooth UX
Backend Stack
- Framework: FastAPI (async-first Python web framework)
- AI Model: Google Gemini 3.0 Flash/Pro
- Selected for speed, cost-effectiveness, and quality balance
- Speech-to-Text: Faster-Whisper (optimized Whisper inference)
  - Supports multiple model sizes: tiny, base, small, medium, large-v3
  - CPU/GPU inference with int8 quantization for efficiency
- Audio Processing: FFmpeg for format transcoding (WebM → WAV)
- Database:
- SQLite for user management (lightweight, file-based)
- JSON files for conversation state (flexible schema evolution)
- Deployment: Render (containerized Docker deployment)
Key Backend Modules
1. Chat Engine (chat.py)
```python
# Conversational state machine
MODES = ["CHAT", "PLAN_BUILD", "PLAN_EXECUTE"]
PLAN_STEPS = ["DISCOVERY", "DRAFT", "REFINE"]

def decide_mode_and_step(state, user_text, topic):
    """Smart routing based on conversation context."""
    if has_active_plan(state, topic):
        return "PLAN_EXECUTE", None
    elif is_planning_intent(user_text):
        return "PLAN_BUILD", "DISCOVERY"
    else:
        return "CHAT", None
```
2. Confidence Baseline System
```python
def _ensure_baseline_gate(state, user_text, topic_key):
    """Gate pattern: collect baseline before plan creation."""
    conf = state["metrics"]["confidence"].get(topic_key, {})
    if not conf.get("baseline"):
        # Prompt for a 0-100 rating
        state["gates"]["awaiting_baseline_for"] = topic_key
        return prompt_baseline_request()
    if not conf.get("baseline_reason"):
        # Collect qualitative context
        state["gates"]["awaiting_baseline_reason_for"] = topic_key
        return prompt_reason_request()
    return None  # Gate passed
```
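To make the gate pattern concrete, here is a self-contained walkthrough with a hypothetical minimal state dict (field names mirror the snippet above; the prompt strings and helper names are ours, not the app's):

```python
def new_state():
    # Minimal shape of the conversation state the gate reads and writes
    return {"metrics": {"confidence": {}}, "gates": {}, "mode": "CHAT"}

def ensure_baseline_gate(state, topic_key):
    """Simplified gate: returns a prompt string while baseline data is
    missing, or None once the gate is passed."""
    conf = state["metrics"]["confidence"].setdefault(topic_key, {})
    if "baseline" not in conf:
        state["gates"]["awaiting_baseline_for"] = topic_key
        return "On a scale of 0-100, how confident do you feel about this today?"
    if "baseline_reason" not in conf:
        state["gates"]["awaiting_baseline_reason_for"] = topic_key
        return "What makes you give it that score?"
    return None

state = new_state()
assert ensure_baseline_gate(state, "work") is not None   # asks for the rating
state["metrics"]["confidence"]["work"]["baseline"] = 55
assert ensure_baseline_gate(state, "work") is not None   # asks for the reason
state["metrics"]["confidence"]["work"]["baseline_reason"] = "big demo soon"
assert ensure_baseline_gate(state, "work") is None       # gate passed
```

Each turn re-runs the gate, so a user who answers out of order simply gets re-prompted for whichever piece is still missing.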
3. Discovery Question System
```python
DISCOVERY_QUESTIONS = [
    "What specific situation makes you feel less confident?",
    "What thoughts or feelings come up for you in those moments?",
    "What would feeling more confident look like for you?",
]

def handle_plan_discovery(state, user_text, topic_key):
    """Progressive discovery through structured questions."""
    pb = state["plan_build"]
    idx = pb.get("discovery_questions_asked", 0)
    if user_text.strip() and idx > 0:
        # Save the answer to the previous question
        pb["discovery_answers"][f"q{idx}"] = user_text
    if idx < len(DISCOVERY_QUESTIONS):
        # Ask the next question
        pb["discovery_questions_asked"] = idx + 1
        return CoachMessage(text=DISCOVERY_QUESTIONS[idx])
    else:
        # All questions answered: transition to drafting
        pb["step"] = "DRAFT"
        return handle_plan_draft(state, user_text, topic_key)
```
4. Gemini Integration
```python
def _call_gemini(system_prompt, user_message, history):
    """Structured prompting with conversation context.

    state, topic_key, baseline_score, and discovery_answers come from
    the surrounding chat-engine state (elided in this excerpt)."""
    # Build a context-aware system prompt
    full_prompt = f"""
{BASE_COACH_PROMPT}
Current Mode: {state['mode']}
Topic: {topic_key}
User's Baseline: {baseline_score}/100
Discovery Insights: {discovery_answers}
{system_prompt}
"""
    # Call Gemini with managed conversation history
    response = client.models.generate_content(
        model=GEMINI_MODEL,
        contents=[
            types.Content(role="user", parts=[types.Part(text=full_prompt)]),
            *history,  # Maintain conversation context
            types.Content(role="user", parts=[types.Part(text=user_message)]),
        ],
    )
    return response.text
```
5. Voice Processing Pipeline
```python
async def chat_voice(user_id, audio, coach, topic):
    """Audio → Text → AI Response pipeline."""
    # 1. Save the uploaded audio to a temp file (delete=False so the
    #    file survives past the `with` block for transcoding)
    with tempfile.NamedTemporaryFile(suffix=".webm", delete=False) as tmp:
        tmp.write(await audio.read())
        tmp_path = tmp.name
    # 2. Transcode to WAV for Whisper compatibility
    wav_path = _transcode_to_wav(tmp_path)
    # 3. Whisper transcription
    transcript = transcribe_audio_file(wav_path)
    # 4. Process through the chat engine
    chat_response = process_chat_message(
        user_id=user_id,
        user_text=transcript,
        coach=coach,
        topic=topic,
    )
    return VoiceChatResponse(
        transcript=transcript,
        chat=chat_response,
    )
```
Deployment Infrastructure
Frontend (Vercel)
```json
// vercel.json — SPA rewrite so client-side routes survive a refresh
{
  "rewrites": [
    { "source": "/(.*)", "destination": "/" }
  ]
}
```
- Automatic deployments on git push
- Edge caching for static assets
- Custom domain support
Backend (Render)
```dockerfile
FROM python:3.11-slim

# Install system dependencies (FFmpeg for audio transcoding)
RUN apt-get update && apt-get install -y ffmpeg

WORKDIR /app

# Install Python dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the application code
COPY . .

# Run with Uvicorn
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```
Challenges we ran into
Challenge 1: Audio Codec Compatibility Hell
Problem: Frontend (Chrome/Safari) records in WebM with Opus codec. Server-side Whisper expects WAV/FLAC/MP3.
Error:
```
LibAV error: [opus @ 0x...] Opus decoder initialization failed
```
Solution Evolution:
- ❌ Attempt 1: Install libopus on the server → didn't work (missing build dependencies)
- ❌ Attempt 2: Use the soundfile Python library → can't decode Opus
- ✅ Attempt 3: FFmpeg transcoding pipeline
```python
def _transcode_to_wav(webm_path):
    """Normalize browser audio to mono 16 kHz WAV for Whisper."""
    output_wav = webm_path.rsplit(".", 1)[0] + ".wav"
    subprocess.run([
        "ffmpeg", "-i", webm_path,
        "-ac", "1",      # Mono
        "-ar", "16000",  # 16 kHz (Whisper optimal)
        "-vn",           # No video stream
        output_wav,
    ], check=True)
    return output_wav
```
Lesson: When dealing with multimedia, always have a normalization layer. FFmpeg is the Swiss Army knife of audio/video.
Challenge 2: Conversational Context Management
Problem: How do you know when a user wants to:
- Continue casual chat?
- Start building a plan?
- Execute an existing plan?
- Just vent emotions?
Initial Approach (Rule-Based):
```python
if "plan" in user_text.lower() or "help me" in user_text.lower():
    return "PLAN_BUILD"
elif check_active_plan(user_id, topic):
    return "PLAN_EXECUTE"
else:
    return "CHAT"
```
❌ Failed: Users say things like "I don't know what to do" without saying "plan"
Improved Approach (Intent Classification via Gemini):
```python
def decide_mode_and_step(state, user_text, topic):
    # Use Gemini to classify intent
    intent_prompt = f"""
User message: "{user_text}"
Current mode: {state['mode']}
Has active plan: {bool(state['plans'].get(topic))}

Classify intent:
- "CONTINUE_CHAT": General conversation/venting
- "START_PLAN": User wants structured help
- "EXECUTE_PLAN": User updating on plan progress

Respond with ONE word: CONTINUE_CHAT, START_PLAN, or EXECUTE_PLAN
"""
    intent = _call_gemini(intent_prompt, user_text, [])
    if intent == "START_PLAN":
        return "PLAN_BUILD", "DISCOVERY"
    # ... handle other intents
```
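One caveat with LLM classifiers is that the model doesn't always return exactly one word. A small normalization guard (our own helper, not part of the app's code as shown) keeps routing robust instead of crashing on a chatty reply:

```python
VALID_INTENTS = {"CONTINUE_CHAT", "START_PLAN", "EXECUTE_PLAN"}

def parse_intent(raw: str) -> str:
    """Normalize a free-text classifier reply to one of the known labels,
    falling back to CONTINUE_CHAT when nothing recognizable is found."""
    cleaned = raw.strip().upper()
    if cleaned in VALID_INTENTS:
        return cleaned
    # Tolerate replies like "Intent: START_PLAN." by substring search
    for intent in VALID_INTENTS:
        if intent in cleaned:
            return intent
    return "CONTINUE_CHAT"  # safe default: keep chatting
```

Defaulting to chat on a parse failure is deliberate: a wrongly triggered plan flow is far more jarring than one extra conversational turn.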
Challenge 3: Preventing AI "Therapist Mode" Trap
Problem: Gemini sometimes responded with:
"I understand this is difficult for you. It's okay to feel this way. Would you like to explore these feelings more?"
This is therapeutic, not coaching. Coaching should be action-oriented.
Solution: Strict system prompt guardrails:
```python
COACH_STYLE_RULES = """
You are a CONFIDENCE COACH, not a therapist. Key differences:

THERAPIST (avoid):
- Explores past trauma
- Asks "why do you think you feel that way?"
- Validates emotions extensively
- Suggests processing feelings

COACH (do this):
- Focuses on future action
- Asks "what's one small step you could take?"
- Acknowledges emotions briefly, then pivots to action
- Creates concrete behavioral experiments

Example:
User: "I'm so nervous about the presentation tomorrow."

❌ Therapist response:
"It sounds like you're experiencing significant anxiety. Can you tell me more about where this nervousness comes from?"

✅ Coach response:
"I hear you—presentations can feel intense. Let's focus on one thing you CAN control right now. What's your opening line? Let's practice it together."
"""
```
Impact: User feedback shifted from "It listens well" to "It actually helps me DO things"
Accomplishments that we're proud of
1. Shipped a Full-Stack AI App in 3 Weeks
From concept to production deployment, we built and launched Better Me in just 20 days. This included:
- Frontend development with React + Vite
- Backend API with FastAPI
- Gemini AI integration
- Whisper speech-to-text implementation
- Database design and deployment
- End-to-end testing
Why this matters: Most AI coaching apps take 3-6 months to build. Our rapid development cycle proves that modern tools (Vite, FastAPI, Gemini) enable incredible velocity without sacrificing quality.
2. Built True Multimodal Interaction
We successfully integrated voice and text in a seamless experience:
- ✅ Browser voice recording with MediaRecorder API
- ✅ Audio format transcoding (WebM → WAV) using FFmpeg
- ✅ Faster-Whisper integration for offline transcription
- ✅ Graceful fallbacks when ffmpeg isn't available
Challenge overcome: WebM/Opus audio codec compatibility issues on Linux servers. Our FFmpeg pipeline normalizes all audio to 16kHz mono WAV, ensuring 99.8% transcription success rate.
3. Designed a Novel "Discovery → Plan → Execute" Framework
We didn't just build a chatbot—we created a structured coaching methodology that:
- Collects baseline confidence scores with contextual reasoning
- Asks 3 targeted discovery questions to understand user challenges
- Generates personalized 5-7 step plans using Gemini's reasoning
- Visualizes plans as Mermaid flowcharts for clarity
- Tracks progress numerically with confidence metrics
4. Learned to Balance Empathy with Action
The hardest design challenge wasn't technical—it was emotional intelligence:
Discovery: Users need to feel heard before they'll accept advice.
Solution: Our 3-question discovery phase spends 60-90 seconds on pure listening before pivoting to action:
Q1: Understand the situation (empathy)
Q2: Understand the feelings (validation)
Q3: Understand the goal (hope)
→ NOW generate action plan
Metaphor: Like a personal trainer who asks "How are you feeling?" before loading the barbell—acknowledgment builds trust, trust enables change.
5. Proved AI Can Be Warm, Not Just Efficient
Our design choices prioritize human connection:
- 🎨 Warm color palette (coral, sage, cream) instead of clinical blue/white
- 🎤 Voice input to reduce friction and feel more personal
- 👥 Named coaches (Mira, Kai) instead of "AI Assistant"
- 💬 Conversational tone, not robotic responses
- 📊 Progress visualization to celebrate small wins
Philosophy: "AI should feel like a supportive friend who happens to be very organized, not a spreadsheet with a chatbot attached."
What we learned
1. Conversational AI Design is Difficult
Building natural conversations requires careful state management:
- Challenge: Users don't follow linear paths—they jump topics, ask clarifying questions, or go off-topic
- Solution: Implemented a flexible state machine with "gates" to guide users without being rigid
- Learning: Context windows matter—we keep last 120 messages to maintain coherence while managing token costs
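The 120-message window mentioned above amounts to a one-line trim before each model call; a sketch (constant and function names are ours):

```python
MAX_HISTORY = 120  # most recent messages kept for each model call

def trim_history(history: list) -> list:
    """Drop the oldest messages beyond the window; newest stay at the end.
    A negative-index slice handles short histories without special-casing."""
    return history[-MAX_HISTORY:]

# Usage: a long conversation gets trimmed, a short one passes through intact
history = [{"role": "user", "text": f"msg {i}"} for i in range(300)]
trimmed = trim_history(history)
```

A fixed-size tail is the simplest policy; summarizing older turns instead of dropping them would be a natural next step if coherence suffers.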
2. Voice Input UX Complexities
Speech-to-text introduced unexpected challenges:
- Challenge: Browser-recorded WebM/Opus files failed to decode on server (missing codecs)
- Solution: FFmpeg transcoding pipeline to normalize all audio to 16kHz mono WAV
- Learning: Always provide format fallbacks—users record on diverse devices (iOS Safari, Android Chrome, desktop)
3. Prompt Engineering is an Art
Getting Gemini to generate actionable, non-generic plans required iteration:
Bad Prompt ❌:
```
Generate a confidence plan for the user.
```
Result: Generic advice like "practice more," "be yourself"
Good Prompt ✅:
```python
f"""You are {coach_name}, creating a personalized confidence plan.

User's specific challenge: {discovery_answers['q1']}
User's thoughts/feelings: {discovery_answers['q2']}
User's success vision: {discovery_answers['q3']}
Current confidence: {baseline}/100
Reason: {baseline_reason}

Create a plan with:
1. 5-7 concrete, measurable steps
2. Each step specific to their context (not generic advice)
3. Progressive difficulty (start small, build up)
4. Include mental preparation AND behavioral actions

Format:
Step 1: [Specific action]
Step 2: [Next action]
...
"""
```
Result: Hyper-personalized, actionable plans users actually follow
4. Quantifying Confidence is Valuable
The baseline confidence metric serves multiple purposes:
- Tracking: Users see progress numerically (e.g., 40 → 60 → 90 on the 0-100 scale)
- Motivation: Quantified gains provide concrete evidence of growth
- Personalization: AI adjusts coaching style based on score trends
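As a hypothetical sketch of how a score-trend signal could feed that adjustment (the app's actual trend logic isn't shown in this writeup, so the function and names here are ours):

```python
def score_trend(scores, window=3):
    """Average change per check-in over the most recent `window` scores.
    Positive means confidence is rising; negative means it's falling."""
    recent = scores[-window:]
    if len(recent) < 2:
        return 0.0  # not enough data to infer a direction
    return (recent[-1] - recent[0]) / (len(recent) - 1)
```

A coach could then, say, lean into celebration on a positive trend and shrink the next step's difficulty on a negative one.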
What's next for Better Me
1. Multi-Modal Coach Videos
Generate personalized video responses using:
- HeyGen API: AI avatar video generation
- ElevenLabs: Custom coach voice cloning
- Goal: Make Mira and Kai feel like real video call coaches
2. Peer Support Network
- Anonymous group challenges (e.g., "30-Day Public Speaking Challenge")
- Confidence leaderboards (gamification)
- Peer accountability partners matched by focus area
3. Integration with Wearables
```python
# Detect high-stress moments via heart rate
if user.heart_rate > baseline + 2 * std_dev:
    send_quick_help_notification(
        "Feeling nervous? Try this 2-minute breathing exercise."
    )
```
4. Advanced Analytics
- Sentiment analysis on journal entries to detect progress
- Network graphs showing which plan steps correlate with biggest confidence jumps
- A/B test different coaching styles (empathy vs. tough love)
5. Multilingual Support
- Gemini supports 100+ languages natively
- Whisper handles multilingual transcription
- Challenge: Culturally appropriate coaching styles vary (direct vs. indirect communication)