Shatter Glass — Face Your Truth
Built for the #GeminiLiveAgentChallenge by Daivik Karbhari (Developer) & Pratima Karbhari (Product & Marketing)
Inspiration
Pitching is one of the highest-stakes moments in anyone's life — and we face versions of it every single day. Whether it's a VC meeting, a job interview, or a board presentation, how you communicate can make or break the outcome.
Yet the traditional approach to preparation hasn't changed: practice in front of a mirror, run it by a friend, or memorize a script you'll forget the moment nerves kick in. None of that prepares you for what a real investor, hiring manager, or executive will actually do — interrupt you, challenge your numbers, and decide in the first thirty seconds whether you're worth their time.
There was a clear gap: nobody had built a coach that could simulate the worst-case scenario. Not a polite feedback tool. A brutally honest one that stops you mid-sentence and makes you fix it in the moment. That's why we built Shatter Glass.
What It Does
Shatter Glass is a real-time AI communication coach with three modes — Startup Pitch, Job Interview, and Presentation. The moment a session begins, the coach is live: watching you through the camera, listening through the mic, and ready to interrupt the second something goes wrong.
It doesn't wait for you to finish. If you use filler words, make a vague claim, or break eye contact, the AI barges in mid-sentence, just as a real high-stakes audience would. It tells you exactly what went wrong and forces you to retry on the spot.
At the end of every session, you get a full performance report with an overall score, category breakdowns, and three specific action items drawn from what actually happened in your session.
Features
1. Real-time barge-in coaching
- Interrupts mid-sentence when it detects filler words ("um," "basically," "you know"), vague value props, or rambling
- Varied interruption style — "Hold on—", "Wait, wait, wait—", "No, no, no—" — never monotonous
- Forces an immediate retry: "Take a breath and give me that in one sentence. Go."
- Barge-in counter tracks exactly how many times the coach had to step in
2. Live video analysis
- Reads the camera feed at 0.5 FPS — enough to catch sustained patterns like slouching or avoiding eye contact
- Only interrupts for repeated negative body language, not a single glance away
3. Hybrid fact-checking
- Fast internal checks for obvious gaps (claiming no competitors in a well-known space)
- Live Google Search grounding for specific metrics — if your TAM (total addressable market) figure is wrong, it catches it in real time with a citation
4. Screen share mode (Presentation)
- Alternates between camera and screen to evaluate both your delivery and your slides
- Calls out dense slides, reading verbatim, and weak closes
5. Session report card
- Overall score (1–10) with a one-sentence verdict
- Category scores: Content, Delivery, Body Language, Structure, Confidence
- Three actionable next steps tied to what happened in your session
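The filler-word trigger from the real-time barge-in feature can be approximated with a simple pattern pass over the live transcript. In Shatter Glass the Live model decides when to interrupt on its own, so this is only an illustrative sketch; the function names and threshold are ours, not from the project:

```javascript
// Illustrative sketch of a filler-word barge-in trigger (names and threshold
// are ours). Scans a rolling transcript chunk for common fillers and only
// signals an interruption once they pile up.
const FILLERS = /\b(um+|uh+|basically|you know|kind of|sort of)\b/gi;

function countFillers(transcript) {
  // matchAll yields every regex match; spreading it lets us count them
  return [...transcript.matchAll(FILLERS)].length;
}

function shouldBargeIn(transcript, threshold = 2) {
  // A single "um" slides by; repeated fillers earn a "Hold on—"
  return countFillers(transcript) >= threshold;
}
```

In the real product the model weighs far more than word counts (vague value props, rambling structure), but a threshold keeps any rule-based fallback from feeling trigger-happy.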
How We Built It
The core of Shatter Glass is a bidirectional WebSocket pipeline built on the Gemini Live API. Audio is captured from the browser as raw 16kHz PCM Float32 via the Web Audio API and streamed continuously — no REST calls, no round-trip latency. This is what makes real-time barge-in possible.
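Browsers capture microphone audio as Float32 samples at the hardware rate (often 48 kHz), and the Live API generally expects 16 kHz 16-bit PCM, so the capture path typically downsamples and converts before streaming. A minimal sketch of those two steps, with helper names that are ours rather than the project's:

```javascript
// Sketch of the browser-side audio conversion (helper names are ours).
// Step 1: naive decimation from the hardware sample rate down to 16 kHz.
function downsampleTo16k(float32, inputRate) {
  const ratio = inputRate / 16000;
  const out = new Float32Array(Math.floor(float32.length / ratio));
  for (let i = 0; i < out.length; i++) {
    // Naive nearest-sample decimation; a production pipeline would low-pass first
    out[i] = float32[Math.floor(i * ratio)];
  }
  return out;
}

// Step 2: pack [-1, 1] floats into signed 16-bit PCM for the wire.
function floatTo16BitPCM(float32) {
  const out = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i])); // clamp to [-1, 1]
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;        // scale to int16 range
  }
  return out;
}
```

Each converted chunk can then be sent straight down the open WebSocket, which is what keeps end-to-end latency low enough for barge-in to land mid-sentence.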
Video is captured from the user's camera using the HTML Canvas API, throttled to 0.5 FPS (one frame every 2 seconds). That's enough spatial context for posture and eye contact analysis without flooding the connection.
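The 0.5 FPS throttle is essentially a timer around a canvas grab. A sketch under our own naming, assuming a `sendFrame` callback that writes to the socket:

```javascript
// Sketch of the 0.5 FPS frame grab (function names are ours). Every 2 s a
// frame is drawn from the <video> element onto a canvas, encoded as JPEG,
// and the base64 payload (minus the data-URL prefix) is shipped out.
function startFrameCapture(video, sendFrame, intervalMs = 2000) {
  const canvas = document.createElement('canvas');
  const ctx = canvas.getContext('2d');
  return setInterval(() => {
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    ctx.drawImage(video, 0, 0);
    // 0.7 JPEG quality trades fidelity for payload size
    sendFrame(stripDataUrlPrefix(canvas.toDataURL('image/jpeg', 0.7)));
  }, intervalMs);
}

// Pure helper: "data:image/jpeg;base64,AAAA" -> "AAAA"
function stripDataUrlPrefix(dataUrl) {
  return dataUrl.slice(dataUrl.indexOf(',') + 1);
}
```

Returning the interval ID lets the caller `clearInterval` it when the session ends.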
For fact-checking, we built a hybrid approach: the model uses its internal knowledge for fast, obvious checks, and silently triggers Google Search Grounding for specific numerical claims. If a number doesn't hold up, the coach interrupts with the real data and its source.
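With the Gemini Live API, Search grounding is enabled declaratively: you add the `googleSearch` tool to the session config and the model decides for itself when to reach for it. A sketch based on the `@google/genai` SDK — the exact model name, persona text, and surrounding wiring here are assumptions, not the project's actual code:

```javascript
// Sketch of a Live session config with Search grounding enabled
// (field values are illustrative; persona text and model name are ours).
const liveConfig = {
  responseModalities: ['AUDIO'],
  tools: [{ googleSearch: {} }], // lets the model verify numeric claims via Search
  systemInstruction:
    'You are a brutally honest pitch coach. Interrupt vague or false claims.',
};

// Assumed usage with the @google/genai SDK:
//   const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
//   const session = await ai.live.connect({
//     model: 'gemini-2.0-flash-live-001',
//     config: liveConfig,
//     callbacks: { onmessage: handleServerContent },
//   });
```

The appeal of this split is that cheap internal-knowledge checks stay instant, while the grounded path only fires when a concrete number is on the line.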
The frontend is built with Vite and Vanilla JS, featuring a glassmorphism UI and an audio-reactive fluid visualizer that changes color based on who holds the floor — teal when the user speaks, deep blue when the AI speaks, and red when a barge-in occurs.
The stack is containerized in a multi-stage Docker build and deployed on Google Cloud Run with session affinity, keeping each WebSocket pinned to the same node for the full session. CI/CD runs through GitLab with OIDC Workload Identity Federation — no long-lived service account keys anywhere in the pipeline.
| Layer | Technology |
|---|---|
| Frontend | Vite + Vanilla JS + CSS |
| Backend | Node.js 20 + Express 4 + WebSockets |
| AI (Live) | Gemini Live API — bidirectional streaming, Orus voice |
| Grounding | Google Search (real-time fact verification) |
| Audio | Web Audio API — 16kHz PCM Float32 |
| Video | HTML Canvas API — Base64 JPEG @ 0.5 FPS |
| Deployment | Multi-stage Docker + Google Cloud Run (session affinity) |
| CI/CD | GitLab + Workload Identity Federation |
Challenges We Ran Into
The Gemini Live API documentation is genuinely excellent, which made the core implementation smoother than expected. A few things still required careful tuning:
- Getting barge-in to feel natural — not just cutting audio, but making the interruption flow conversationally — required tuning the `serverContent` interruption flag and the voice persona prompts together.
- Vision rate-limiting took iteration. The first version flagged every micro-expression, making the coach feel paranoid. We added a sustained-pattern filter so it only intervenes after a repeated negative signal.
- WebSocket session affinity on Cloud Run wasn't obvious at first. Standard load balancing would drop long-running connections. The `--session-affinity` flag solved it cleanly.
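The sustained-pattern filter from the vision tuning above can be sketched as a consecutive-frame counter: a negative signal only fires once it has appeared in several analyzed frames in a row. Names and the streak length are ours, purely for illustration:

```javascript
// Sketch of a sustained-pattern filter (names are ours): a negative
// body-language signal must persist for N consecutive analyzed frames
// before it is worth interrupting over, so one glance away never triggers.
function makeSustainedFilter(requiredStreak = 3) {
  const streaks = new Map(); // signal -> consecutive-frame count
  return function observe(signal, isNegative) {
    const next = isNegative ? (streaks.get(signal) || 0) + 1 : 0;
    streaks.set(signal, next); // a clean frame resets the streak to zero
    return next >= requiredStreak; // true => sustained, eligible for barge-in
  };
}
```

At 0.5 FPS, a streak of 3 means roughly six seconds of sustained slouching or averted gaze before the coach steps in, which matches the "repeated negative signal" behavior described above.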
Accomplishments We're Proud Of
- Built a genuine real-time barge-in loop — the AI interrupts mid-sentence, gives actionable feedback, and forces an immediate retry. That correction loop is what separates this from any existing feedback tool.
- The hybrid fact-checking system works in real time. Watching the AI catch a false market size claim and respond with a live citation mid-pitch is the strongest demo moment we have.
- The audio-reactive visualizer gives users instant visual feedback on who controls the floor — a small detail that makes the whole experience feel intentional.
- Zero long-lived credentials anywhere in the deployment pipeline. The GitLab + WIF setup is production-grade security for a hackathon project.
- Three fully distinct modes — Pitch, Interview, Presentation — each with its own evaluation framework, persona options, and interruption logic. This is a complete product, not a demo.
What We Learned
The biggest insight from building this is how much power a live agent has when it can interact with humans in real time — watching, listening, and responding simultaneously. Most AI tools still operate in a request-response loop. A live agent that can interrupt and adapt changes what's actually possible.
We also learned that effective coaching comes down almost entirely to the feedback loop — not just identifying a problem, but forcing the person to fix it immediately. That's the difference between a critique and actual skill development.
On the technical side, we gained a deep appreciation for how much the WebSocket architecture matters. REST-based approaches introduce just enough latency to completely break the illusion of a live interaction. Building on a true bidirectional streaming layer is non-negotiable for this kind of product.
What's Next for Shatter Glass
The scope here is larger than a hackathon. Every professional — founders, candidates, executives, sales reps, consultants — pitches, presents, or interviews regularly. That's a real, recurring need with a clear willingness to pay.
Our plan is to ship to early users, iterate hard on coaching quality and report depth, and build toward a product people pay for because it measurably improves their outcomes. The AI-native architecture means it gets better over time as the underlying models improve — without rebuilding the core.
We'd love to bring this to investors who understand why communication skill compounds over a career. The #GeminiLiveAgentChallenge feels like exactly the right place to start that conversation.
Shatter Glass — because the only way to get better is to face your truth.
Built With
- gemini-flash
- gemini-live-api
- google-cloud-run
- vertex-ai