Inspiration

I have an insane fear of public speaking. Always have. The kind where your vision goes blurry, your voice disappears, and your brain forgets every word you ever knew. I've always imagined what it would feel like to have a private coach in my corner — someone who could actually see me, hear me, and tell me the truth. Not a YouTube video. Not a book. A real feedback loop.

The problem is that private coaches are expensive. Access is unequal. And most tools only work with text. I wanted something that would let me stumble through a presentation and help me turn it into something I'm proud of. An AI agent that sees and hears the full picture — the affordable, always-available version of the coach I never had.

What it does

Ate the Mic is a live AI agent that watches you speak — literally. It uses simultaneous video and audio input to analyze not just what you say, but how you say it. Pacing. Filler words. Eye contact. Body language. Vocal confidence.

The agent speaks back to you in real-time using a conversational voice — it's a two-way coaching loop, not a dashboard of metrics. You also get live feedback overlays on your video feed and a post-session transcript of every insight. Whether you're prepping for a job interview, a wedding toast, a boardroom pitch, or just trying to stop saying "um" every five seconds — this coach meets you where you are and helps you level up.

How we built it

Built solo on the Gemini Live API (gemini-2.5-flash-native-audio-preview), leveraging its native multimodal capability to process live video and audio streams simultaneously. The frontend is React + TypeScript + Vite. The backend is a lightweight Express server that handles token-based auth and API proxying.
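The token gate can be sketched as a pure check that the Express middleware wraps. This is a minimal sketch under assumptions: the `Invite` shape and the `x-invite-token` header name are illustrative, not the app's actual schema.

```typescript
// Hypothetical invite record -- the real server-side schema isn't shown
// in this write-up.
type Invite = { email: string; token: string };

// Pure check, so the logic is testable without spinning up Express.
function isAuthorized(token: string | undefined, invites: Invite[]): boolean {
  if (!token) return false;
  return invites.some((invite) => invite.token === token);
}

// In the Express layer this would sit in front of the Gemini proxy route,
// rejecting requests before they ever reach the API, e.g.:
//
//   app.use("/api", (req, res, next) =>
//     isAuthorized(req.header("x-invite-token"), invites)
//       ? next()
//       : res.sendStatus(401));
```

Keeping the check pure and the proxying separate also keeps the API key server-side, which is the whole point of the proxy.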

The agent is designed around a coaching framework — it doesn't just flag problems, it guides you through them. Session structure, feedback loops, and encouragement are all baked into the prompt architecture. No external speech analyzers. No bolt-on vision models. Gemini Live does the heavy lifting natively.
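As an illustration of what "baked into the prompt architecture" can look like, here is a hypothetical system instruction in that spirit. The actual prompt is not published; this is a sketch of the idea, not the app's prompt.

```typescript
// Illustrative only: one way session structure, feedback loops, and
// encouragement can live in a single system instruction for a live agent.
const COACH_SYSTEM_PROMPT = `
You are a live public-speaking coach, not a critic.

Session structure:
1. Warm-up: ask what the speaker is preparing for and how they're feeling.
2. Practice: watch and listen; track pacing, filler words, eye contact, posture.
3. Feedback loop: after each run, name one genuine strength first, then one
   specific, fixable habit. Keep interventions short so you never break flow.

Always close an observation with encouragement or a concrete next step.
`.trim();
```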

Access is controlled via an admin panel — invite-only via email and token — so the demo environment stays clean and intentional.

Challenges we ran into

Getting the agent to feel like a coach and not a critic was harder than expected. Early versions read like a performance review — cold, clinical, discouraging. Tuning the tone to be honest but human took a lot of iteration.

Latency in the live feedback loop was also a challenge — too fast, and it interrupts your flow; too slow, and the moment is gone. Finding that timing sweet spot was its own puzzle.

Managing state across a live multimodal session introduced subtle bugs — video frames capturing stale session state via closures, audio contexts persisting after disconnect. The kind of issues you only find by actually running sessions on yourself.
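The stale-closure failure mode is easy to reproduce outside React. This framework-free sketch (names are illustrative) shows why the fix is closing over a mutable ref container rather than a captured value:

```typescript
// A callback wired up once (e.g. a frame-capture handler) copies the value
// it closes over at creation time and never sees later updates:
let frameCount = 0;

function makeStaleReporter(): () => number {
  const captured = frameCount; // snapshot taken here
  return () => captured;       // forever returns the snapshot
}

// The fix mirrors React's useRef: close over a mutable container instead,
// so every call reads the current value.
const frameRef = { current: 0 };

function makeLiveReporter(): () => number {
  return () => frameRef.current;
}

const stale = makeStaleReporter();
const live = makeLiveReporter();

frameCount = 5;
frameRef.current = 5;

stale(); // still 0 -- the bug
live();  // 5 -- the fix
```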

And honestly — demoing a public speaking tool is public speaking. Dogfooding this one hit different.

Accomplishments we're proud of

Shipping a working multimodal live agent as a solo entry. That alone felt like a win.

But the real one: it actually helps. Ran sessions on myself. Watched the feedback change how I was standing, how I was breathing, and where I was looking. The loop works. That's the thing I'm most proud of — it's not just a demo, it's a tool I'd actually use.

What we learned

Multimodal is a different kind of product design problem. You're not designing for text input — you're designing for a human in front of a camera, probably nervous, probably self-conscious. The UX has to account for that emotional state. The agent isn't just processing signals; it's holding space.

Building with a live bidirectional API also means rethinking how you manage state. Closures that work fine in a static app become bugs in a streaming session. Refs over state. Cleanup matters.
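"Cleanup matters" can be made concrete with a stub resource standing in for an AudioContext. The pattern assumed here (not taken from the app's code) is to register everything opened at connect time and close it all together on disconnect:

```typescript
// Teardown discipline sketch: a stub resource stands in for AudioContext /
// media tracks so the pattern is runnable anywhere.
interface Closable {
  closed: boolean;
  close(): void;
}

function startSession() {
  // Everything opened at connect time gets registered here...
  const resources: Closable[] = [];
  const audioContext: Closable = {
    closed: false,
    close() { this.closed = true; },
  };
  resources.push(audioContext);

  return {
    resources,
    // ...and closed together on disconnect, so no audio context
    // outlives the session.
    stop() { resources.forEach((r) => r.close()); },
  };
}
```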

I also learned that the fear doesn't go away. But with the right feedback, you stop being paralyzed by it. That shift — from dread to agency — that's the whole product.

What's next for Ate the Mic

  • Structured programs — multi-session arcs for specific goals: interviews, TEDx-style talks, sales pitches, difficult conversations
  • Progress tracking — session-over-session improvement metrics so you can see the arc, not just the snapshot
  • Scenario mode — drop into simulated high-stakes environments: hostile Q&A, panel interviews, impromptu speaking drills
  • Community layer — optional shared replays and peer coaching for accountability
  • Reach — getting this in front of people who need it most: students, first-gen professionals, anyone who's ever gone silent in a room when they had something worth saying

https://ate-the-mic-test-316301290609.us-west1.run.app/

Built With

React, TypeScript, Vite, Express, Gemini Live API