🚀 Inspiration
High-stakes communication like job interviews, investor pitches, or even thesis defenses is notoriously difficult to prepare for.
🔹 Traditional methods like reading notes, rehearsing alone, or asking friends for feedback often fail because:
- They don’t simulate real pressure or dynamic questioning.
- Feedback is subjective, shallow, or inconsistent.
- There’s no way to measure progress over time.
😰 As a result, many candidates walk into these moments underprepared, deliver weak responses, and miss career-changing opportunities.
We were inspired to build Sozo Pitch Helper, a tool that acts as a personal, AI-powered sparring partner. It creates a realistic practice environment where users don’t just rehearse; they get objective, data-driven coaching that compounds over time.
🤖 What it does
Sozo Pitch Helper is an AI training platform that transforms preparation into a science.
- 📄 Context Ingestion: Users upload a job description, pitch deck, or research paper.
- 🧠 Contextual Awareness: The backend uses Gemini to classify the task (interview, pitch, or defense), extract key points, and generate a short description. This context is prepared so the AI panel asks questions directly relevant to the user’s scenario.
- 🎙️ Dynamic Simulation: Users engage in a real-time session with a multi-voice AI panel (Eric, Daniel, and Rachel). Each persona represents a distinct style of questioning, adapting to answers and pressing for detail, just like a real hiring manager, venture capitalist, or examiner would.
- 📊 Data-Driven Feedback: After each session, users receive a detailed performance analysis scored on four metrics:
  - 🗣️ Communication: Clarity, confidence, and conciseness of speech, because how you say something is as important as what you say.
  - 🧠 Content Mastery: Subject knowledge and the ability to logically support claims with evidence, proof that you truly know your material under pressure.
  - 🤝 Engagement & Delivery: Tone, pacing, and audience awareness, measuring whether you persuade rather than just recite facts.
  - 💪 Resilience: Composure when challenged, tracking how well you think clearly under pressure. This is often the deciding factor in high-stakes settings.
✅ The result is not just practice, it’s targeted, intelligent coaching that builds confidence, clarity, and measurable improvement.
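The four metrics above could be carried around as one small record per session. The sketch below is illustrative only: the field names, the 0 to 100 scale, and the unweighted average are assumptions, not details taken from the project.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SessionScores:
    """One session's scores on the four metrics (the 0-100 scale is an assumption)."""
    communication: int
    content_mastery: int
    engagement_delivery: int
    resilience: int

    def __post_init__(self) -> None:
        # Reject out-of-range values early so bad model output never reaches storage.
        for name, value in asdict(self).items():
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be between 0 and 100, got {value}")

    def overall(self) -> float:
        """Unweighted mean of the four metrics (hypothetical aggregation)."""
        values = list(asdict(self).values())
        return sum(values) / len(values)

    def weakest(self) -> str:
        """Name of the lowest-scoring metric; ties go to the first field declared."""
        scores = asdict(self)
        return min(scores, key=scores.get)
```

Keeping the record immutable and validated makes it safer to store in session history and compare across sessions.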
🛠️ How we built it
We chose a modern, decoupled architecture for security and scalability.
- Frontend: A responsive SPA built with React, Vite, and Tailwind CSS.
- Backend: A stateless API server in Python (Flask) hosted on Hugging Face Spaces.
- Database & Auth: Firebase Authentication for secure sign-ins and Firebase Realtime Database for user data, credits, and history.
- Conversational AI: Powered by the ElevenLabs Agents API, enabling natural multi-voice conversations with Eric, Daniel, and Rachel.
- Analytical AI: The backend uses the Google Gemini API for summarization, task classification, feedback scoring, and memory.
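As a rough illustration of the task classification step, the sketch below builds a prompt and validates the model's reply before the backend trusts it. The JSON keys, the function names, and the `ask_model` callable (a stand-in for whatever wraps the Gemini call) are hypothetical; only the three task types come from the description above.

```python
import json
from typing import Callable

ALLOWED_TASKS = ("interview", "pitch", "defense")


def build_context_prompt(document_text: str) -> str:
    """Ask the model for a strict JSON object describing the uploaded document."""
    return (
        "Classify the document below as one of: "
        + ", ".join(ALLOWED_TASKS)
        + '. Reply with JSON only, using the keys "task", '
        '"key_points" (list of strings) and "description" (short text).\n\n'
        + document_text
    )


def parse_context(raw_reply: str) -> dict:
    """Validate the model reply; raise ValueError instead of trusting it blindly."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise ValueError("model reply was not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("model reply must be a JSON object")
    task = str(data.get("task", "")).strip().lower()
    if task not in ALLOWED_TASKS:
        raise ValueError(f"unexpected task type: {task!r}")
    points = data.get("key_points", [])
    if not isinstance(points, list):
        raise ValueError("key_points must be a list")
    return {
        "task": task,
        "key_points": [str(p).strip() for p in points if str(p).strip()],
        "description": str(data.get("description", "")).strip(),
    }


def analyze_document(document_text: str, ask_model: Callable[[str], str]) -> dict:
    """Run one classification round trip; ask_model is injected so it can be faked."""
    return parse_context(ask_model(build_context_prompt(document_text)))
```

Passing the model call in as a parameter keeps the validation logic testable without network access.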
🏗️ Architecture
🤯 Challenges we ran into
- Fair AI Scoring: Early prompts gave perfect scores for trivial answers, then became too strict. We solved this with a "Grader on a Curve" rubric that rewards effort while keeping feedback accurate.
- User Identification: The AI confused names in role-play (like “Eric” or “Daniel”) with the actual user. Explicitly passing the user’s name into the prompt fixed this.
- Gemini SDK Integration: Incorrect assumptions about the google-genai library caused repeated crashes. The fix was to strictly follow the official documentation.
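The user identification fix can be pictured as a prompt builder that states outright who is being evaluated. This is a minimal sketch: the wording of the prompt and the function name are assumptions, and only the three panel names come from the project description.

```python
PANEL_NAMES = ("Eric", "Daniel", "Rachel")


def build_feedback_prompt(user_name: str, transcript: str) -> str:
    """Prefix the transcript with an explicit statement of who is being graded."""
    cleaned = user_name.strip()
    if not cleaned:
        raise ValueError("user_name must not be empty")
    panel = ", ".join(PANEL_NAMES)
    return (
        f"The person being evaluated is named {cleaned}.\n"
        f"{panel} are AI panel members, not the user. Do not score them.\n"
        f"Address all feedback to {cleaned}.\n\n"
        f"Transcript:\n{transcript}"
    )
```

Stating the roles up front means the model does not have to infer from the transcript which speaker is the candidate.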
🎙️ Conversational AI Integration
- Getting AI agents to respond naturally in real-time was harder than expected.
- Handling interruptions, context-switching, and smooth back-and-forth required careful orchestration.
- Multi-voice turn-taking had to be coordinated through the system prompt in ElevenLabs.
🌐 Browser & Mic Permissions
- Different browsers handle microphone access in slightly different ways.
- Ensuring a smooth, one-click setup without scaring users with security popups was a delicate balance.
⏹️ Session Termination & Control
- Users needed the ability to end sessions instantly.
- Managing cleanup of active streams, freeing resources, and properly logging transcripts was more complex than anticipated.
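One way to think about the termination problem is a session object that registers every cleanup step as it acquires a resource and then runs them all when the session ends. The sketch below shows that pattern with Python's `contextlib.ExitStack`; the class and method names are hypothetical, and the real cleanup may well live in the browser rather than on the server.

```python
from contextlib import ExitStack
from typing import Callable, List


class PracticeSession:
    """Tracks cleanup steps so that ending a session always runs all of them."""

    def __init__(self) -> None:
        self._cleanup = ExitStack()
        self.transcript: List[str] = []
        self.closed = False

    def add_resource(self, release: Callable[[], None]) -> None:
        # Callbacks run in reverse registration order when the session ends.
        self._cleanup.callback(release)

    def record(self, line: str) -> None:
        if self.closed:
            raise RuntimeError("session already ended")
        self.transcript.append(line)

    def end(self, save_transcript: Callable[[List[str]], None]) -> None:
        """Idempotent: a second call (for example a double click) does nothing."""
        if self.closed:
            return
        self.closed = True
        try:
            save_transcript(list(self.transcript))
        finally:
            # Release resources even if saving the transcript failed.
            self._cleanup.close()
```

Making `end` idempotent and putting the release step in `finally` covers two common edge cases: repeated end requests and a failed transcript write.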
⚙️ Scalability & Tracking
- Tuning the “Grader on a Curve” scoring system took many iterations.
- Every call session required credit tracking, transcripts, and performance logging in Firebase.
- Keeping this accurate while minimizing server load introduced tricky edge cases.
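A typical edge case in credit tracking is a retried request charging the same session twice. The sketch below shows an idempotent update written as a pure function, the kind of shape a database transaction could apply. The account layout (`credits`, `charged_sessions`) and the one-credit cost are assumptions, not the project's actual schema.

```python
import copy


def charge_session(account: dict, session_id: str, cost: int = 1) -> dict:
    """Return a new account state with the given session charged at most once."""
    updated = copy.deepcopy(account)  # never mutate the caller's copy
    charged = updated.setdefault("charged_sessions", {})
    if session_id in charged:
        return updated  # already billed, so a retry is a no-op
    balance = updated.get("credits", 0)
    if balance < cost:
        raise ValueError("not enough credits")
    updated["credits"] = balance - cost
    charged[session_id] = cost
    return updated
```

Because the function takes the current state and returns the next one without side effects, it can be re-run safely if a write has to be retried.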
Despite these hurdles, each challenge shaped the product into something more robust and user-friendly, ensuring a smoother, more realistic experience for users preparing for their big moments. 🚀
🏆 Accomplishments that we're proud of
- The AI Memory Engine: Our breakthrough feature. Before each session, Gemini analyzes the user’s past performance, identifies weaknesses, and generates a one-sentence directive for the AI panel. This ensures continuity and coaching that actively targets weaknesses: if you struggled with financial projections last time, you’ll be pressed on them again.
- Multi-Step AI Orchestration: One AI model (Gemini) powers a full pipeline of tasks: document summarization, scenario classification, dynamic AI panel briefing, and performance analysis.
- Building a Secure & Modern Stack: Delivered a full, production-ready application with real-time data, authentication, and robust AI integrations.
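To make the memory engine idea concrete, here is a minimal sketch of how past scores might be condensed before asking the model for a directive. In the project Gemini itself performs the analysis; the local averaging, the metric keys, and the prompt wording below are assumptions added for illustration.

```python
from statistics import mean
from typing import Dict, List, Optional

METRICS = ("communication", "content_mastery", "engagement_delivery", "resilience")


def weakest_metric(history: List[Dict[str, float]]) -> Optional[str]:
    """Metric with the lowest average across past sessions, or None without data."""
    averages = {}
    for metric in METRICS:
        values = [session[metric] for session in history if metric in session]
        if values:
            averages[metric] = mean(values)
    if not averages:
        return None
    return min(averages, key=averages.get)


def build_directive_prompt(history: List[Dict[str, float]]) -> str:
    """Prompt asking the model for a one-sentence directive for the AI panel."""
    focus = weakest_metric(history)
    if focus is None:
        return (
            "This is the user's first session. "
            "Write one sentence telling the panel to cover all areas evenly."
        )
    return (
        f"Across {len(history)} past sessions the user's weakest area was "
        f"{focus.replace('_', ' ')}. "
        "Write one sentence directing the panel to press on that area."
    )
```

The returned prompt would then be sent to the model, and its one-sentence reply handed to the panel before the session starts.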
🧠 What we learned
- Prompt Engineering is Iterative: Scoring and coaching quality improved only through multiple cycles of testing and refinement.
- Documentation is King: Our toughest bugs were solved by carefully revisiting official SDK documentation.
- Decoupling is Power: Keeping Gemini on the backend means we can upgrade prompts, scoring logic, and the memory engine without touching the frontend.
🔮 What's next for Sozo Pitch Helper
- Contextual Research Feature: Use the project’s short description to fetch web context, enriching AI panel knowledge.
- Visual Progress Tracking: Build a dashboard to graph performance across the four metrics over time.
- Custom AI Video Personas: Let users choose interviewer styles (e.g., Friendly & Encouraging vs Skeptical & Direct) to broaden practice scenarios.
