KinetiQ: Elite AI Sports Biomechanics & Live Coaching

Inspiration

Athletes across Reddit communities (r/tennis, r/golf, r/skiing, r/basketball) constantly post videos asking "Is my form correct?" The feedback they get from other people is often inconsistent, delayed, or subjective. Using Gemini's multimodal models, we built a tool that gives instant biomechanical feedback, so every athlete has access to elite-level analysis.

What it does

KinetiQ transforms a smartphone into a professional biomechanics lab with three core capabilities:

1. Video Analysis (Gemini 3 Pro)

  • Analyzes 8+ sports with action-specific feedback (Tennis Forehand, Basketball Shooting, Golf Drive, etc.)
  • Returns structured data: overall score, 6-part body scoring (Head/Shoulders/Arms/Hips/Legs/Footwork), 4-6 temporal markers
  • Technical feedback format: [PASS] for elite technique, [FIX] with corrective cues
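
As a rough illustration of the structured data listed above, the result could be typed and sanity-checked as follows. This is a minimal sketch: the field names (`overallScore`, `bodyScores`, `markers`, `timestampSec`, `verdict`, `cue`) and the `validateAnalysis` helper are hypothetical and are not KinetiQ's actual schema.

```typescript
// Hypothetical shape for the structured analysis result; field names are
// illustrative assumptions, not the project's real schema.
const BODY_PARTS = ["Head", "Shoulders", "Arms", "Hips", "Legs", "Footwork"] as const;
type BodyPart = (typeof BODY_PARTS)[number];

interface TemporalMarker {
  timestampSec: number;     // position in the video, in seconds
  verdict: "PASS" | "FIX";  // [PASS] for elite technique, [FIX] for a correction
  cue: string;              // short confirming or corrective cue
}

interface AnalysisResult {
  overallScore: number;
  bodyScores: Record<BodyPart, number>;
  markers: TemporalMarker[];
}

// Returns a list of problems; an empty list means the result passed the checks.
function validateAnalysis(result: AnalysisResult): string[] {
  const problems: string[] = [];
  if (!Number.isFinite(result.overallScore)) {
    problems.push("overallScore is not a number");
  }
  for (const part of BODY_PARTS) {
    if (!Number.isFinite(result.bodyScores[part])) {
      problems.push(`missing score for ${part}`);
    }
  }
  // The description above calls for 4-6 temporal markers.
  if (result.markers.length < 4 || result.markers.length > 6) {
    problems.push(`expected 4-6 markers, got ${result.markers.length}`);
  }
  for (const m of result.markers) {
    if (m.verdict !== "PASS" && m.verdict !== "FIX") {
      problems.push(`bad verdict at ${m.timestampSec}s`);
    }
  }
  return problems;
}
```

A check like this is cheap to run on every response, even with a schema enforced on the model side, before the scores reach the radar chart.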

2. AI Vision Correction

  • Auto-generates annotated infographics at key timestamps
  • Gemini 3 Pro extracts X/Y coordinates → Canvas API renders overlays
  • Color-coded: GREEN arrows for correct biomechanics, RED for errors
  • Smart positioning with 2-3 word diagnostic tags

3. Live Coach Mode (Gemini 2.5 Flash Native Audio)

  • Real-time WebSocket streaming: 2 FPS video + 16kHz PCM audio
  • AI "Silent Observer" mode—only speaks after detecting action completion
  • Sub-6-word instant corrections with natural voice

How we built it

Tech Stack

  • React 19 + TypeScript, Vite, Recharts (radar charts), Tailwind CSS
  • @google/genai SDK with Gemini 3 Pro (video analysis) and 2.5 Flash (live streaming)

Video Analysis Pipeline

  1. Convert video to Base64 → send to Gemini 3 Pro with structured JSON schema
  2. Extract frames at timestamps using HTML5 Video API
  3. Send frames to Gemini 3 Pro for coordinate extraction (X/Y%, label, side, status)
  4. Canvas renders annotations with arrows, text boxes, color coding
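
Step 2 depends on waiting for the video element to finish seeking before a frame is read. The sketch below shows one way to wrap the `seeked` event in a promise. The `SeekableVideo` interface, `seekTo`, and `extractFrames` are hypothetical names introduced for illustration; `SeekableVideo` is intended as a minimal structural stand-in for `HTMLVideoElement`, and the `grab` callback is where a browser implementation would draw the frame onto a canvas.

```typescript
// Minimal structural view of a video element, so the helper can also be
// exercised with a fake object outside the browser.
interface SeekableVideo {
  currentTime: number;
  addEventListener(
    type: "seeked",
    listener: () => void,
    options?: { once?: boolean },
  ): void;
}

// Resolves once the video has finished seeking to `timeSec`.
function seekTo(video: SeekableVideo, timeSec: number): Promise<void> {
  return new Promise<void>((resolve) => {
    // Register the listener before assigning currentTime so the event cannot be missed.
    video.addEventListener("seeked", () => resolve(), { once: true });
    video.currentTime = timeSec;
  });
}

// Visits each timestamp in order and calls `grab` once that frame is ready.
async function extractFrames<T>(
  video: SeekableVideo,
  timestampsSec: number[],
  grab: (timeSec: number) => T,
): Promise<T[]> {
  const frames: T[] = [];
  for (const t of timestampsSec) {
    await seekTo(video, t); // sequential: one seek at a time on a single element
    frames.push(grab(t));
  }
  return frames;
}
```

Seeking sequentially rather than in parallel matters here, because a single video element can only sit at one position at a time.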

Live Coach Pipeline

  1. Dual AudioContext: input (48kHz native) + output (24kHz playback)
  2. Connect via ai.live.connect WebSocket
  3. ScriptProcessorNode with 3x GainNode boost → resample to 16kHz PCM → stream
  4. Capture video at 2 FPS (480x360 JPEG) → stream to API
  5. Decode 24kHz PCM audio → queue with AudioBufferSourceNode for playback
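
The capture side of step 3 can be sketched as a single pure function. This is an illustration under assumptions, not our exact code: `toPcm16` is a hypothetical helper, it folds the 3x gain into the conversion instead of using a GainNode, and it downsamples by block averaging, which is a crude low-pass filter that works for integer-like ratios such as 48kHz to 16kHz.

```typescript
// Applies gain, downsamples by averaging, and converts Float32 samples
// in [-1, 1] to signed 16-bit PCM. Intended for downsampling only.
function toPcm16(
  input: Float32Array,
  inputRate: number,
  outputRate = 16000,
  gain = 3,
): Int16Array {
  const ratio = inputRate / outputRate;
  const outLength = Math.floor(input.length / ratio);
  const out = new Int16Array(outLength);
  for (let i = 0; i < outLength; i++) {
    // Average the block of input samples that maps onto output sample i.
    const start = Math.floor(i * ratio);
    const end = Math.min(input.length, Math.floor((i + 1) * ratio));
    let sum = 0;
    for (let j = start; j < end; j++) sum += input[j];
    const avg = end > start ? sum / (end - start) : 0;
    // Boost, then clamp so loud input cannot wrap around in 16-bit range.
    const s = Math.max(-1, Math.min(1, avg * gain));
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}
```

In a browser, the `Float32Array` would come from the audio callback's input buffer, and the resulting `Int16Array` bytes would be encoded and streamed over the live session.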

Key Technical Details

  • Strict JSON schema with Type.INTEGER/Type.BOOLEAN for consistent output
  • Boundary-aware label positioning (x<30: right, x>70: left)
  • Biomechanical feedback protocol: max 15 words, [PASS] vs [FIX] format
  • Promise-chaining for WebSocket race condition handling
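
The boundary rule for labels can be expressed in a few lines. In this sketch the `Annotation` field names, the `"correct" | "error"` status values, and the helper functions are assumptions for illustration. Only the thresholds (x<30: right, x>70: left) and the green/red convention come from the description above; falling back to the model-suggested side in the middle band is also an assumption.

```typescript
type LabelSide = "left" | "right";

// Hypothetical shape of one annotation returned by coordinate extraction.
interface Annotation {
  xPct: number;                 // horizontal position, 0-100
  yPct: number;                 // vertical position, 0-100
  label: string;                // 2-3 word diagnostic tag
  side: LabelSide;              // side suggested by the model
  status: "correct" | "error";  // assumed status values
}

// Points near the left edge get labels on the right, and vice versa,
// so text boxes are not pushed off-screen.
function resolveLabelSide(a: Annotation): LabelSide {
  if (a.xPct < 30) return "right";
  if (a.xPct > 70) return "left";
  return a.side; // middle band: keep the model's suggestion (assumption)
}

// Green for correct biomechanics, red for errors.
function annotationColor(a: Annotation): string {
  return a.status === "correct" ? "green" : "red";
}

// Converts percentage coordinates to canvas pixels.
function toPixels(a: Annotation, width: number, height: number): { x: number; y: number } {
  return { x: (a.xPct / 100) * width, y: (a.yPct / 100) * height };
}
```

Keeping coordinates as percentages until the last step means the same annotations can be drawn on a canvas of any size.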

Challenges we faced

  • Real-time Audio Sync: Manual PCM resampling on input (48kHz→16kHz) and 24kHz PCM decoding on output, plus AudioContext state management to avoid race conditions
  • Coordinate Precision: Structured schema enforcement + boundary rules to prevent off-screen/overlapping labels
  • Video Frame Timing: Promise-wrapped seeked event handling for exact timestamp frame extraction
  • Bandwidth vs Precision: Reduced to 2 FPS + 480x360 JPEG (0.5 quality) for manageable WebSocket load
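
For the audio sync challenge, the playback side comes down to decoding each PCM chunk and scheduling it so chunks play back-to-back without gaps or overlap. The following is a sketch under assumptions: `pcm16ToFloat32` and `PlaybackScheduler` are hypothetical helpers, and the audio clock is passed in as a plain number so the logic stays independent of the browser.

```typescript
// Converts signed 16-bit PCM to Float32 samples in [-1, 1).
function pcm16ToFloat32(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) out[i] = pcm[i] / 0x8000;
  return out;
}

// Tracks when the next chunk should start so chunks play back-to-back.
class PlaybackScheduler {
  private nextStart = 0;

  // `now` is the audio clock (AudioContext.currentTime in a browser).
  // Returns the time at which this chunk should start playing.
  schedule(now: number, sampleCount: number, sampleRate = 24000): number {
    const startAt = Math.max(this.nextStart, now); // never schedule in the past
    this.nextStart = startAt + sampleCount / sampleRate;
    return startAt;
  }

  // Call when playback is interrupted so stale timing is discarded.
  reset(): void {
    this.nextStart = 0;
  }
}
```

In a browser, the returned value would be passed as the `when` argument to `AudioBufferSourceNode.start()`, after copying the decoded samples into an `AudioBuffer` created at 24kHz.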

What we learned

  • Gemini 3 Pro reasons about movement, not just pose: Unlike pose-estimation libraries, it identified rotational energy transfer, timing issues, and weight distribution, with no custom training needed
  • Structured schemas eliminate parsing: Type-enforced JSON (Type.INTEGER, Type.BOOLEAN) > regex/text extraction
  • Visual > text: Color-coded infographics communicate errors instantly
  • Live API requires bidirectional thinking: Callback architecture forced promise-chaining instead of direct session references
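
The promise-chaining pattern mentioned above can be sketched generically. Here `LiveSessionLike` and `SessionSender` are hypothetical names, and `send` stands in for whatever method the real SDK session exposes. The point is only the shape: callers hold the connection promise and chain onto it, instead of holding a session reference that may not exist yet.

```typescript
// Hypothetical minimal session interface; the real SDK session has its own methods.
interface LiveSessionLike<T> {
  send(chunk: T): void;
}

class SessionSender<T> {
  private ready: Promise<LiveSessionLike<T>>;

  // `connect` is whatever opens the live connection and resolves with a session.
  constructor(connect: () => Promise<LiveSessionLike<T>>) {
    this.ready = connect();
  }

  // Safe to call before the connection has opened: the chunk is sent once
  // the promise resolves, and chunks are delivered in call order.
  send(chunk: T): Promise<void> {
    return this.ready.then((session) => session.send(chunk));
  }
}
```

Because `then` callbacks on the same promise run in registration order, audio and video chunks captured during connection setup are not dropped or reordered.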

What's next for KinetiQ

  • Pro-comparison ghosting: Overlay professional athlete skeletons for visual form comparison
  • Historical trend analysis: Multi-session progress tracking with long-term development plans
  • Community leaderboards: Competitive "Biometric Score" rankings for technical drills
  • Wearable integration: Combine video with IMU sensor data for multi-modal assessment

Updates


We are hard at work refining KinetiQ's core engine. Here is a breakdown of the latest technical upgrades we are working on to make your AI coach even better:

  • Frame-Level Analysis: In sports, a lot happens in a single second. We are shifting from second-level timestamps to frame-level analysis. This allows us to capture high-speed movements with much greater precision, ensuring we identify the exact moment of key action phases.

  • Refined API Pipeline: We have restructured our workflow to strictly separate technical analysis from instructional image generation. By decoupling these processes, we prevent the AI from making "fake assumptions" or hallucinations in the visuals, ensuring the skeletal overlays and advice remain professional and biomechanically accurate.

  • Improved Consistency: We are tuning the temperature settings of our API calls. This reduces the variance in responses, so the coaching feedback you receive is more stable and consistent each time you upload.
