I've played tennis my whole life. If you've ever tried to improve your technique on your own, you know the
frustration — you think you're doing it right, but without someone watching, you're just guessing. Real
coaching is expensive, and video analysis tools are slow: record, upload, wait, get feedback hours later. By then the moment is gone.

When I saw that Gemini 2.5 Flash had a real-time Live API and Gemini 3 Flash could do deep image analysis, I thought: what if I combined them into something that actually watches you and talks to you while you practice?

What it does

Point your camera at yourself while practicing any sport. The AI coach sees your movement, listens to your questions, and gives spoken feedback in real time. A live skeleton overlay tracks your body so you can see what it's analyzing. Pick your sport, pick your skill level, and just start moving.

How we built it

Two Gemini models running together in a feedback loop:

  • Gemini 2.5 Flash Live API streams camera and audio in real time and talks back with voice coaching at sub-500ms latency

  • Gemini 3 Flash receives a frame from the same video stream every 5 seconds, along with a sport-specific prompt, and returns detailed biomechanical analysis: joint angles, weight distribution, injury flags

  • Gemini 3's text output is injected back into the Live session via sendClientContent(), so the voice coach references the deeper insights naturally without interrupting the conversation

Neither model could do this alone. One is fast but shallow, the other is detailed but slow. Together they create coaching that's both responsive and technically deep.
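
To make the loop concrete, here is a minimal sketch of one analysis tick and the timer around it. It is not our production code. The names LiveSessionLike, AnalyzeFrame, analysisTick, and startAnalysisLoop are hypothetical, the "[Biomechanical analysis]" label is made up for illustration, and the session interface only mirrors the shape of sendClientContent() so the sketch runs without the SDK. We also assume that turnComplete: false adds context without forcing an immediate spoken reply.

```typescript
// Hypothetical minimal shapes; the real SDK types are richer.
interface TextTurn {
  role: "user";
  parts: { text: string }[];
}

interface LiveSessionLike {
  // Mirrors the shape of the sendClientContent() call named in the write-up.
  sendClientContent(params: { turns: TextTurn[]; turnComplete: boolean }): void;
}

// The deep-analysis call is injected so the loop can run without network access.
type AnalyzeFrame = (jpegBase64: string, prompt: string) => Promise<string>;

// One tick: analyze the latest frame, then hand the text to the live session.
// Returns true if something was injected.
async function analysisTick(
  session: LiveSessionLike,
  analyzeFrame: AnalyzeFrame,
  latestFrame: string | null,
  prompt: string,
): Promise<boolean> {
  if (latestFrame === null) return false; // camera not ready yet
  const analysis = (await analyzeFrame(latestFrame, prompt)).trim();
  if (analysis === "") return false; // nothing worth injecting
  session.sendClientContent({
    turns: [
      { role: "user", parts: [{ text: `[Biomechanical analysis]\n${analysis}` }] },
    ],
    // Assumption: false adds context without demanding an immediate response.
    turnComplete: false,
  });
  return true;
}

// Runs a tick every intervalMs (5 seconds in the write-up) and returns a stop function.
function startAnalysisLoop(
  session: LiveSessionLike,
  analyzeFrame: AnalyzeFrame,
  getLatestFrame: () => string | null,
  prompt: string,
  intervalMs = 5000,
): () => void {
  let busy = false;
  const timer = setInterval(async () => {
    if (busy) return; // previous analysis still running, skip this tick
    busy = true;
    try {
      await analysisTick(session, analyzeFrame, getLatestFrame(), prompt);
    } catch (err) {
      console.error("analysis tick failed", err);
    } finally {
      busy = false;
    }
  }, intervalMs);
  return () => clearInterval(timer);
}
```

The busy flag is a design choice for the sketch: if a deep analysis takes longer than the interval, the next tick is skipped instead of stacking up requests.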

Frontend is React + Vite with MediaPipe Pose running locally for the skeleton overlay. Backend is Fastify with WebSocket. Eight sports, three skill levels, each with tailored prompts.

Challenges we ran into

Feedback injection timing was the hardest part: getting Gemini 3's analysis into the Live session without cutting off an in-progress voice response took experimentation. The bidirectional audio pipeline (16kHz in, 24kHz out) needed careful buffer management to keep latency feeling natural. Running MediaPipe Pose at 10 FPS on the same camera stream that feeds Gemini required decoupling the two pipelines so they don't fight over frames.
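
One way to handle the injection timing is to hold new insights while the coach is speaking and release them once the response finishes. The sketch below illustrates that idea and is not necessarily what we shipped. InsightInjector and its method names are hypothetical, and it assumes the app can tell when a voice response starts and when it completes. Keeping only the newest insight is a choice made for this sketch.

```typescript
// Hypothetical helper: defers insight injection while the voice coach is talking.
class InsightInjector {
  private pending: string | null = null;
  private coachSpeaking = false;
  private readonly inject: (text: string) => void;

  // `inject` is whatever pushes text into the live session.
  constructor(inject: (text: string) => void) {
    this.inject = inject;
  }

  // Call when the live session starts emitting audio for a response.
  onCoachSpeechStart(): void {
    this.coachSpeaking = true;
  }

  // Call when the live session reports that the response is finished.
  onCoachTurnComplete(): void {
    this.coachSpeaking = false;
    this.flush();
  }

  // Call with each new analysis. A newer insight replaces a waiting one,
  // so the coach never hears stale feedback.
  offer(insight: string): void {
    this.pending = insight;
    if (!this.coachSpeaking) this.flush();
  }

  private flush(): void {
    if (this.pending === null) return;
    const text = this.pending;
    this.pending = null;
    this.inject(text);
  }
}
```

In use, the inject callback would wrap the sendClientContent() call, and the two event methods would be wired to whatever signals the Live session provides for response start and end.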

Accomplishments that we're proud of

The dual-model feedback loop actually works — and it feels like talking to a real coach. When Gemini 3 detects that your elbow is dropping on a tennis serve and that insight flows into the voice coaching seconds later, it's genuinely useful. We're also proud of supporting 8 sports with skill-level-aware prompts, the live skeleton overlay, and that the whole thing runs in a browser with no installs.

What we learned

Sport-specific prompts make a huge difference. "Analyze this movement" gives generic results. "Analyze racket position, grip, footwork, and body rotation" gives coaching that actually feels useful. We also learned that the pattern of combining a streaming model with a periodic deep-analysis model and feeding insights back is powerful and applicable way beyond sports.
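
As a rough illustration of what a sport-specific, skill-aware prompt can look like in code, here is a small prompt builder. Only the tennis focus points come from the example above. The skill level names, the hint texts, and the helper names (FOCUS_POINTS, LEVEL_HINTS, joinList, buildAnalysisPrompt) are hypothetical placeholders, not our actual prompts.

```typescript
// Hypothetical level names; the write-up only says there are three levels.
type SkillLevel = "beginner" | "intermediate" | "advanced";

// Focus points per sport. Only the tennis entry comes from the write-up.
const FOCUS_POINTS: Record<string, string[]> = {
  tennis: ["racket position", "grip", "footwork", "body rotation"],
};

// Hypothetical tone hints per skill level.
const LEVEL_HINTS: Record<SkillLevel, string> = {
  beginner: "Use plain language and name the single most important fix.",
  intermediate: "Name up to two fixes and explain why they matter.",
  advanced: "Be precise about timing and joint angles.",
};

// Joins items as "a, b, and c".
function joinList(items: string[]): string {
  if (items.length <= 1) return items.join("");
  if (items.length === 2) return `${items[0]} and ${items[1]}`;
  return `${items.slice(0, -1).join(", ")}, and ${items[items.length - 1]}`;
}

// Builds the prompt sent with each frame; falls back to the generic
// instruction when a sport has no focus points configured.
function buildAnalysisPrompt(sport: string, level: SkillLevel): string {
  const focus = FOCUS_POINTS[sport.toLowerCase()];
  const task =
    focus !== undefined ? `Analyze ${joinList(focus)}.` : "Analyze this movement.";
  return `Sport: ${sport}. Skill level: ${level}. ${task} ${LEVEL_HINTS[level]}`;
}
```

Calling buildAnalysisPrompt("tennis", "beginner") produces a prompt containing "Analyze racket position, grip, footwork, and body rotation.", while an unconfigured sport falls back to the generic instruction.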

What's next for LIVE Sport Coach

This is a short demo of something that could be genuinely useful with more development. Next on the list are session recording with annotated highlights, progress tracking across sessions so you can see improvement over time, and a mobile-optimized version for on-field use. The foundation is there. It just needs polish to become a real tool for athletes who want to get better but don't always have a coach watching.

Built With

  • fastify
  • gemini-2.5-flash-live-api
  • gemini-3-flash
  • google-genai-sdk
  • mediapipe-pose
  • node.js
  • react
  • tailwind-css
  • vite
  • vitest
  • websocket