Inspiration

Most speaking tools are generic, text-only, or limited to post-session feedback. We wanted a live coach that can see, hear, and intervene in real time the way a strong human coach would, but backed by measurable, objective signals.

What it does

PitchMirror is a real-time multimodal speaking coach. It listens to microphone audio, watches webcam frames, optionally analyzes screen share and uploaded slide decks, and interrupts with concise feedback on fillers, pace, eye contact, clarity, and slide quality. After each session, it generates a scorecard, coaching report, evidence-backed tips, and visual improvement assets.

How we built it

We built a FastAPI backend on Cloud Run with a binary WebSocket media pipeline for low latency. The live coaching agent runs on Google ADK with Gemini Live (gemini-2.5-flash-native-audio-preview-12-2025) and session-bound tools for grounded interventions. We use Firestore for session persistence, Secret Manager for API key handling, and Terraform for reproducible infrastructure. The frontend uses the Web Audio API and the browser's camera and screen capture APIs for real-time streaming and playback.
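As a rough illustration of what a binary WebSocket media frame could look like, here is a minimal sketch. The header layout (one byte for the media kind, a 4-byte sequence number, an 8-byte capture timestamp) and the names `MediaFrame`, `encode_frame`, and `decode_frame` are assumptions made for this example, not PitchMirror's actual wire format.

```python
import struct
from typing import NamedTuple

# Hypothetical header: kind (1 byte), sequence number (4 bytes),
# capture timestamp in ms (8 bytes), all in network byte order.
HEADER = struct.Struct("!BIQ")

KIND_AUDIO = 1
KIND_CAMERA = 2
KIND_SCREEN = 3


class MediaFrame(NamedTuple):
    kind: int
    seq: int
    timestamp_ms: int
    payload: bytes


def encode_frame(frame: MediaFrame) -> bytes:
    """Prefix the raw payload with the fixed-size header."""
    return HEADER.pack(frame.kind, frame.seq, frame.timestamp_ms) + frame.payload


def decode_frame(data: bytes) -> MediaFrame:
    """Split a received binary message into header fields and payload."""
    if len(data) < HEADER.size:
        raise ValueError("frame shorter than header")
    kind, seq, timestamp_ms = HEADER.unpack_from(data)
    return MediaFrame(kind, seq, timestamp_ms, data[HEADER.size:])
```

A fixed-size binary header like this avoids base64 and JSON overhead on every audio chunk, and the sequence number gives the server something to order or drop frames by.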

Challenges we ran into

The hardest issues were keeping low-latency duplex audio reliable, deduplicating transcript chunks, and preventing double coach replies in the same turn. We also had to harden against dependency and model compatibility changes, process slides and PDFs without blocking the event loop, and keep long sessions stable under API and time limits.
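One common way to keep blocking document work off the event loop is to push it to a worker thread with `asyncio.to_thread`. The sketch below assumes that approach; `extract_slide_text` is a hypothetical stand-in for real PDF parsing, and the heartbeat task exists only to show that the loop keeps running while the parse is in progress.

```python
import asyncio
import time


def extract_slide_text(pdf_bytes: bytes) -> list[str]:
    """Hypothetical stand-in for blocking PDF parsing."""
    time.sleep(0.2)  # simulates slow, synchronous work
    return [f"slide text ({len(pdf_bytes)} bytes)"]


async def process_deck(pdf_bytes: bytes) -> list[str]:
    # Run the blocking call in a worker thread so the event loop
    # can keep servicing WebSocket traffic in the meantime.
    return await asyncio.to_thread(extract_slide_text, pdf_bytes)


async def demo() -> tuple[list[str], int]:
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    hb = asyncio.create_task(heartbeat())
    slides = await process_deck(b"%PDF-1.7 ...")
    hb.cancel()
    return slides, ticks
```

Running `asyncio.run(demo())` should return the parsed slides together with a non-zero tick count, which indicates the loop stayed responsive during the parse.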

Accomplishments that we're proud of

We shipped a production-grade live agent flow with measurable interruption logic, cooldown safeguards, and grounded tool calls. We added screen-aware and slide-aware coaching, plus a multi-agent post-session pipeline for deep analysis. We also implemented secure deployment on Google Cloud, rate limits, and reproducible IaC.

What we learned

Prompting alone is not enough for reliable live agents; server-side control gates are essential. State design and event boundaries matter as much as model choice in multimodal systems. Strong observability and failure handling are critical for demo-ready AI products.
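To make the idea of a server-side control gate concrete, here is a minimal sketch that enforces a cooldown between interventions and at most one coach reply per turn. The class name `InterventionGate`, the `turn_id` concept, and the 8-second default are assumptions for illustration; the actual gates in PitchMirror may be structured differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InterventionGate:
    """Decides on the server whether the coach may speak right now."""

    cooldown_s: float = 8.0  # assumed value, tune per product
    _last_spoken_at: Optional[float] = field(default=None, init=False)
    _replied_turns: set = field(default_factory=set, init=False)

    def allow(self, turn_id: int, now: float) -> bool:
        # At most one coach reply per user turn.
        if turn_id in self._replied_turns:
            return False
        # Enforce a minimum gap between consecutive interventions.
        if (
            self._last_spoken_at is not None
            and now - self._last_spoken_at < self.cooldown_s
        ):
            return False
        self._replied_turns.add(turn_id)
        self._last_spoken_at = now
        return True
```

Because the check runs before any model output is forwarded to the client, a duplicate or badly timed reply is dropped regardless of what the prompt asked the model to do.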

What's next for PitchMirror

We plan to add richer live interleaved output, longitudinal progress tracking across sessions, multilingual coaching, and deeper slide redesign workflows. We also want stronger personalization by scenario (interviews, demos, talks) and better analytics for measurable improvement over time.
