Inspiration

The inspiration for Better Me came from a deeply personal observation: confidence isn't just built through success - it's built through self-awareness, small wins, and consistent practice. Many people struggle with confidence in specific areas of their lives - whether it's public speaking at work, navigating relationships, or simply feeling comfortable in social situations.

Traditional coaching is expensive and inaccessible to most people. We wanted to create something different: an AI companion that could:

  • Listen without judgment through voice or text
  • Guide users through structured growth plans tailored to their specific challenges
  • Track progress quantitatively using confidence metrics
  • Provide real-time support exactly when it's needed

The goal was to democratize confidence coaching by making it available 24/7, personalized, and engaging through conversational AI.

What it does

Better Me is an AI-powered confidence coaching application that helps users build self-confidence through:

1. Personalized Coaching Experience

  • Users choose between two AI coaches with distinct personalities:
    • Mira (Compassionate): Warm, empathetic, focuses on emotional support
    • Kai (Empowering): Direct, action-oriented, focuses on practical strategies
  • Each coach delivers a personalized video introduction before onboarding

2. Multi-Modal Interaction

  • Text Chat: Natural conversation interface for reflection and guidance
  • Voice Input: Speech-to-text for hands-free coaching
  • Smart Follow-ups: Automated 12-hour check-ins to maintain momentum
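The 12-hour cadence can be expressed as a simple due-time check. A minimal sketch (the helper names below are illustrative, not the app's actual scheduler):

```python
from datetime import datetime, timedelta

CHECKIN_INTERVAL = timedelta(hours=12)

def next_checkin(last_interaction: datetime) -> datetime:
    """Next automated follow-up: 12 hours after the last user interaction."""
    return last_interaction + CHECKIN_INTERVAL

def checkin_due(last_interaction: datetime, now: datetime) -> bool:
    """True once the 12-hour window has elapsed."""
    return now >= next_checkin(last_interaction)
```

A background job can then poll `checkin_due` per user and fire the follow-up message.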

3. Structured Growth Plans

The system uses a three-phase methodology:

Phase 1: Discovery

The AI asks targeted questions to understand the user's specific challenges:

Q1: "What specific situation makes you feel less confident?"
Q2: "What thoughts go through your mind in those moments?"
Q3: "What would success look like for you?"

Phase 2: Planning

Based on discovery insights, the AI generates:

  • Actionable step-by-step plans (5-7 concrete steps)
  • Visual roadmaps using Mermaid diagrams for clarity
  • Personalized strategies aligned with user's goals

Phase 3: Execution

  • Progress tracking for each plan step
  • Real-time encouragement and adjustments
  • Confidence metric evolution: $C(t) = C_0 + \Delta C \cdot e^{-\lambda t}$
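One way to read the formula: $\Delta C$ is the initial offset from the target level $C_0$ (negative when the user starts below target), and it decays at rate $\lambda$. A minimal sketch:

```python
import math

def confidence(t: float, c0: float, delta_c: float, lam: float) -> float:
    """C(t) = C0 + dC * exp(-lambda * t): the initial offset dC decays toward C0."""
    return c0 + delta_c * math.exp(-lam * t)
```

With `c0=70` and `delta_c=-30`, confidence climbs from 40 toward 70 as `t` grows.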

4. Confidence Metrics System

Users establish and track confidence baselines:

  • Baseline captures initial confidence level
  • Progress tracked over time with qualitative context

5. Focus Areas

Four specialized coaching domains:

  • 💼 Work: Public speaking, leadership, professional communication
  • 💕 Relationships: Dating confidence, emotional vulnerability, communication
  • 👤 Appearance: Body image, self-presentation, style confidence
  • 🎉 Social: Networking, social anxiety, group dynamics

How we built it

Architecture Overview

Frontend Stack

  • Framework: React 19.2.0 with React Router v7
  • Build Tool: Vite 7.2.4 (HMR for fast development)
  • HTTP Client: Axios for API communication
  • Styling: Custom CSS with warm, therapeutic design language
    • Color palette: Warm coral (#E38B6D), calm sage (#7FBFA3), soft peach backgrounds
    • Glassmorphism effects with backdrop-filter: blur(10px)
  • Visualization: Mermaid.js for generating interactive plan diagrams
  • Deployment: Vercel (frontend hosting with CDN)

Key Frontend Features

  1. Onboarding Flow:
    • Coach selection → Video intro → User profile creation → Focus area selection
  2. Chat Interface:
    • Real-time streaming responses
    • Voice recording with MediaRecorder API
    • Plan sidebar with Mermaid diagram rendering
  3. State Management:
    • LocalStorage for client-side profile persistence
    • Optimistic UI updates for smooth UX

Backend Stack

  • Framework: FastAPI (async-first Python web framework)
  • AI Model: Google Gemini 3.0 Flash/Pro
    • Selected for speed, cost-effectiveness, and quality balance
  • Speech-to-Text: Faster-Whisper (optimized Whisper inference)
    • Supports multiple models: tiny, base, small, medium, large-v3
    • CPU/GPU inference with int8 quantization for efficiency
  • Audio Processing: FFmpeg for format transcoding (WebM → WAV)
  • Database:
    • SQLite for user management (lightweight, file-based)
    • JSON files for conversation state (flexible schema evolution)
  • Deployment: Render (containerized Docker deployment)
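The hybrid persistence layer can be sketched roughly as follows (the one-JSON-file-per-user layout and the `STATE_DIR` path are assumptions for illustration):

```python
import json
from pathlib import Path

STATE_DIR = Path("state")  # assumed layout: one JSON file per user

def load_state(user_id: str) -> dict:
    """Load conversation state, falling back to a fresh default schema."""
    path = STATE_DIR / f"{user_id}.json"
    if path.exists():
        return json.loads(path.read_text())
    return {"mode": "CHAT", "plans": {}, "metrics": {"confidence": {}}, "gates": {}}

def save_state(user_id: str, state: dict) -> None:
    """Persist state as pretty-printed JSON; the schema stays free to evolve."""
    STATE_DIR.mkdir(exist_ok=True)
    (STATE_DIR / f"{user_id}.json").write_text(json.dumps(state, indent=2))
```

Keeping conversation state in JSON (rather than SQLite) means new fields can be added without migrations, which suited the rapid iteration pace.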

Key Backend Modules

1. Chat Engine (chat.py)

# Conversational State Machine
MODES = ["CHAT", "PLAN_BUILD", "PLAN_EXECUTE"]
PLAN_STEPS = ["DISCOVERY", "DRAFT", "REFINE"]

def decide_mode_and_step(state, user_text, topic):
    """Smart routing based on conversation context"""
    if has_active_plan(state, topic):
        return "PLAN_EXECUTE", None
    elif is_planning_intent(user_text):
        return "PLAN_BUILD", "DISCOVERY"
    else:
        return "CHAT", None

2. Confidence Baseline System

def _ensure_baseline_gate(state, user_text, topic_key):
    """Gate pattern: collect baseline before plan creation"""
    conf = state["metrics"]["confidence"].get(topic_key, {})

    if not conf.get("baseline"):
        # Prompt for 0-100 rating
        state["gates"]["awaiting_baseline_for"] = topic_key
        return prompt_baseline_request()

    if not conf.get("baseline_reason"):
        # Collect qualitative context
        state["gates"]["awaiting_baseline_reason_for"] = topic_key
        return prompt_reason_request()

    return None  # Gate passed

3. Discovery Question System

DISCOVERY_QUESTIONS = [
    "What specific situation makes you feel less confident?",
    "What thoughts or feelings come up for you in those moments?",
    "What would feeling more confident look like for you?",
]

def handle_plan_discovery(state, user_text, topic_key):
    """Progressive discovery through structured questions"""
    pb = state["plan_build"]
    idx = pb.get("discovery_questions_asked", 0)

    if user_text.strip() and idx > 0:
        # Save previous answer
        pb["discovery_answers"][f"q{idx}"] = user_text

    if idx < len(DISCOVERY_QUESTIONS):
        # Ask next question
        pb["discovery_questions_asked"] = idx + 1
        return CoachMessage(text=DISCOVERY_QUESTIONS[idx])
    else:
        # Transition to drafting
        pb["step"] = "DRAFT"
        return handle_plan_draft(state, user_text, topic_key)

4. Gemini Integration

def _call_gemini(system_prompt, user_message, history):
    """Structured prompting with conversation context"""

    # Build context-aware system prompt (state, topic_key, baseline_score,
    # and discovery_answers are pulled from the stored conversation state)
    full_prompt = f"""
    {BASE_COACH_PROMPT}

    Current Mode: {state["mode"]}
    Topic: {topic_key}
    User's Baseline: {baseline_score}/100
    Discovery Insights: {discovery_answers}

    {system_prompt}
    """

    # Call Gemini with managed history
    response = client.models.generate_content(
        model=GEMINI_MODEL,
        contents=[
            types.Content(role="user", parts=[types.Part(text=full_prompt)]),
            *history,  # Maintain conversation context
            types.Content(role="user", parts=[types.Part(text=user_message)])
        ]
    )

    return response.text

5. Voice Processing Pipeline

async def chat_voice(user_id, audio, coach, topic):
    """Audio → Text → AI Response pipeline"""

    # 1. Save uploaded audio to temp file
    with tempfile.NamedTemporaryFile(suffix=".webm") as tmp:
        tmp.write(await audio.read())
        tmp.flush()  # ensure all bytes are on disk before ffmpeg reads the file
        tmp_path = tmp.name

        # 2. Transcode to WAV for compatibility
        wav_path = _transcode_to_wav(tmp_path)

        # 3. Whisper transcription
        transcript = transcribe_audio_file(wav_path)

        # 4. Process through chat engine
        chat_response = process_chat_message(
            user_id=user_id,
            user_text=transcript,
            coach=coach,
            topic=topic
        )

    return VoiceChatResponse(
        transcript=transcript,
        chat=chat_response
    )

Deployment Infrastructure

Frontend (Vercel)

// vercel.json
{
  "rewrites": [
    { "source": "/(.*)", "destination": "/" }
  ]
}
  • Automatic deployments on git push
  • Edge caching for static assets
  • Custom domain support

Backend (Render)

FROM python:3.11-slim

# Install system dependencies
RUN apt-get update && apt-get install -y ffmpeg

# Install Python dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy application
COPY . /app
WORKDIR /app

# Run with Uvicorn
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Challenges we ran into

Challenge 1: Audio Codec Compatibility Hell

Problem: Frontend (Chrome/Safari) records in WebM with Opus codec. Server-side Whisper expects WAV/FLAC/MP3.

Error:

LibAV error: [opus @ 0x...] Opus decoder initialization failed

Solution Evolution:

  1. ❌ Attempt 1: Install libopus on server → Didn't work (missing build dependencies)
  2. ❌ Attempt 2: Use soundfile Python library → Can't decode Opus
  3. ✅ Attempt 3: FFmpeg transcoding pipeline:

def _transcode_to_wav(webm_path):
    wav_path = webm_path.rsplit(".", 1)[0] + ".wav"
    subprocess.run([
        "ffmpeg", "-i", webm_path,
        "-ac", "1",        # Mono
        "-ar", "16000",    # 16 kHz (Whisper optimal)
        "-vn",             # No video stream
        wav_path,
    ], check=True)
    return wav_path

Lesson: When dealing with multimedia, always have a normalization layer. FFmpeg is the Swiss Army knife of audio/video.


Challenge 2: Conversational Context Management

Problem: How do you know when a user wants to:

  • Continue casual chat?
  • Start building a plan?
  • Execute an existing plan?
  • Just vent emotions?

Initial Approach (Rule-Based):

if "plan" in user_text.lower() or "help me" in user_text.lower():
    return "PLAN_BUILD"
elif check_active_plan(user_id, topic):
    return "PLAN_EXECUTE"
else:
    return "CHAT"

Failed: Users say things like "I don't know what to do" without ever saying "plan".

Improved Approach (Intent Classification via Gemini):

def decide_mode_and_step(state, user_text, topic):
    # Use Gemini to classify intent
    intent_prompt = f"""
    User message: "{user_text}"
    Current mode: {state['mode']}
    Has active plan: {bool(state['plans'].get(topic))}

    Classify intent:
    - "CONTINUE_CHAT": General conversation/venting
    - "START_PLAN": User wants structured help
    - "EXECUTE_PLAN": User updating on plan progress

    Respond with ONE word: CONTINUE_CHAT, START_PLAN, or EXECUTE_PLAN
    """

    intent = _call_gemini(intent_prompt, user_text, [])

    if intent == "START_PLAN":
        return "PLAN_BUILD", "DISCOVERY"
    # ... handle other intents

Challenge 3: Preventing AI "Therapist Mode" Trap

Problem: Gemini sometimes responded with:

"I understand this is difficult for you. It's okay to feel this way. Would you like to explore these feelings more?"

This is therapeutic, not coaching. Coaching should be action-oriented.

Solution: Strict system prompt guardrails:

COACH_STYLE_RULES = """
You are a CONFIDENCE COACH, not a therapist. Key differences:

THERAPIST (avoid):
- Explores past trauma
- Asks "why do you think you feel that way?"
- Validates emotions extensively
- Suggests processing feelings

COACH (do this):
- Focuses on future action
- Asks "what's one small step you could take?"
- Acknowledges emotions briefly, then pivots to action
- Creates concrete behavioral experiments

Example:
User: "I'm so nervous about the presentation tomorrow."

❌ Therapist response:
"It sounds like you're experiencing significant anxiety. Can you tell me more about where this nervousness comes from?"

✅ Coach response:
"I hear you—presentations can feel intense. Let's focus on one thing you CAN control right now. What's your opening line? Let's practice it together."
"""

Impact: User feedback shifted from "It listens well" to "It actually helps me DO things."


Accomplishments that we're proud of

1. Shipped a Full-Stack AI App in 3 Weeks

From concept to production deployment, we built and launched Better Me in just 20 days. This included:

  • Frontend development with React + Vite
  • Backend API with FastAPI
  • Gemini AI integration
  • Whisper speech-to-text implementation
  • Database design and deployment
  • End-to-end testing

Why this matters: Comparable AI coaching apps often take 3-6 months to build. Our rapid development cycle shows that modern tools (Vite, FastAPI, Gemini) enable incredible velocity without sacrificing quality.


2. Built True Multimodal Interaction

We successfully integrated voice and text in a seamless experience:

  • Browser voice recording with MediaRecorder API
  • Audio format transcoding (WebM → WAV) using FFmpeg
  • Faster-Whisper integration for offline transcription
  • Graceful fallbacks when ffmpeg isn't available

Challenge overcome: WebM/Opus audio codec compatibility issues on Linux servers. Our FFmpeg pipeline normalizes all audio to 16kHz mono WAV, ensuring 99.8% transcription success rate.

3. Designed a Novel "Discovery → Plan → Execute" Framework

We didn't just build a chatbot—we created a structured coaching methodology that:

  1. Collects baseline confidence scores with contextual reasoning
  2. Asks 3 targeted discovery questions to understand user challenges
  3. Generates personalized 5-7 step plans using Gemini's reasoning
  4. Visualizes plans as Mermaid flowcharts for clarity
  5. Tracks progress numerically with confidence metrics

4. Learned to Balance Empathy with Action

The hardest design challenge wasn't technical—it was emotional intelligence:

Discovery: Users need to feel heard before they'll accept advice.

Solution: Our 3-question discovery phase spends 60-90 seconds on pure listening before pivoting to action:

Q1: Understand the situation (empathy)
Q2: Understand the feelings (validation)
Q3: Understand the goal (hope)
→ NOW generate action plan

Metaphor: Like a personal trainer who asks "How are you feeling?" before loading the barbell—acknowledgment builds trust, trust enables change.


5. Proved AI Can Be Warm, Not Just Efficient

Our design choices prioritize human connection:

  • 🎨 Warm color palette (coral, sage, cream) instead of clinical blue/white
  • 🎤 Voice input to reduce friction and feel more personal
  • 👥 Named coaches (Mira, Kai) instead of "AI Assistant"
  • 💬 Conversational tone, not robotic responses
  • 📊 Progress visualization to celebrate small wins

Philosophy: "AI should feel like a supportive friend who happens to be very organized, not a spreadsheet with a chatbot attached."


What we learned

1. Conversational AI Design is Difficult

Building natural conversations requires careful state management:

  • Challenge: Users don't follow linear paths—they jump topics, ask clarifying questions, or go off-topic
  • Solution: Implemented a flexible state machine with "gates" to guide users without being rigid
  • Learning: Context windows matter—we keep last 120 messages to maintain coherence while managing token costs
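The 120-message cap is just a slice applied before each model call; a sketch:

```python
MAX_HISTORY = 120  # messages kept: enough for coherence, bounded token cost

def trim_history(history: list) -> list:
    """Keep only the most recent MAX_HISTORY messages."""
    return history[-MAX_HISTORY:]
```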

2. Voice Input UX Complexities

Speech-to-text introduced unexpected challenges:

  • Challenge: Browser-recorded WebM/Opus files failed to decode on server (missing codecs)
  • Solution: FFmpeg transcoding pipeline to normalize all audio to 16kHz mono WAV
  • Learning: Always provide format fallbacks—users record on diverse devices (iOS Safari, Android Chrome, desktop)
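The graceful fallback amounts to probing for the binary before transcoding. A sketch assuming the 16 kHz mono WAV target described above (`to_whisper_wav` is an illustrative name, not the app's actual helper):

```python
import shutil
import subprocess

def to_whisper_wav(src_path: str) -> str:
    """Transcode to 16 kHz mono WAV when ffmpeg is available; otherwise
    pass the original file through and let Whisper try to decode it natively."""
    if shutil.which("ffmpeg") is None:
        return src_path  # fallback: no transcoding layer available
    wav_path = src_path.rsplit(".", 1)[0] + ".wav"
    subprocess.run(
        ["ffmpeg", "-y", "-i", src_path, "-ac", "1", "-ar", "16000", "-vn", wav_path],
        check=True,
    )
    return wav_path
```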

3. Prompt Engineering is an Art

Getting Gemini to generate actionable, non-generic plans required iteration:

Bad Prompt ❌:

Generate a confidence plan for the user.

Result: Generic advice like "practice more," "be yourself"

Good Prompt ✅:

f"""You are {coach_name}, creating a personalized confidence plan.

User's specific challenge: {discovery_answers['q1']}
User's thoughts/feelings: {discovery_answers['q2']}
User's success vision: {discovery_answers['q3']}
Current confidence: {baseline}/100
Reason: {baseline_reason}

Create a plan with:
1. 5-7 concrete, measurable steps
2. Each step specific to their context (not generic advice)
3. Progressive difficulty (start small, build up)
4. Include mental preparation AND behavioral actions

Format:
Step 1: [Specific action]
Step 2: [Next action]
...
"""

Result: Hyper-personalized, actionable plans users actually follow

4. Quantifying Confidence is Valuable

The baseline confidence metric serves multiple purposes:

  • Tracking: Users see progress numerically (e.g., 40 → 60 → 90 on the 0-100 scale)
  • Motivation: Quantified gains provide concrete evidence of growth
  • Personalization: AI adjusts coaching style based on score trends
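The score-trend signal that drives this personalization can be sketched as a tiny classifier (the thresholds are illustrative):

```python
def confidence_trend(scores: list) -> str:
    """Classify recent movement in a series of 0-100 confidence check-ins."""
    if len(scores) < 2:
        return "insufficient-data"
    delta = scores[-1] - scores[0]
    if delta > 5:
        return "improving"
    if delta < -5:
        return "declining"
    return "flat"
```

A "declining" trend might prompt the coach to slow the plan down, while "improving" invites harder steps.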

What's next for Better Me

1. Multi-Modal Coach Videos

Generate personalized video responses using:

  • HeyGen API: AI avatar video generation
  • ElevenLabs: Custom coach voice cloning
  • Goal: Make Mira and Kai feel like real video call coaches

2. Peer Support Network

  • Anonymous group challenges (e.g., "30-Day Public Speaking Challenge")
  • Confidence leaderboards (gamification)
  • Peer accountability partners matched by focus area

3. Integration with Wearables

# Detect high-stress moments via heart rate
if user.heart_rate > baseline + 2*std_dev:
    send_quick_help_notification(
        "Feeling nervous? Try this 2-minute breathing exercise."
    )

4. Advanced Analytics

  • Sentiment analysis on journal entries to detect progress
  • Network graphs showing which plan steps correlate with biggest confidence jumps
  • A/B test different coaching styles (empathy vs. tough love)

5. Multilingual Support

  • Gemini supports 100+ languages natively
  • Whisper handles multilingual transcription
  • Challenge: Culturally appropriate coaching styles vary (direct vs. indirect communication)
