KinetiQ: Elite AI Sports Biomechanics & Live Coaching
Inspiration
Athletes across Reddit communities (r/tennis, r/golf, r/skiing, r/basketball) constantly post videos asking "Is my form correct?" Human feedback is inconsistent, delayed, or subjective. With Gemini's multimodal models, I built a tool that provides instant, professional-level biomechanical feedback—giving every athlete access to elite-level analysis.
What it does
KinetiQ transforms a smartphone into a professional biomechanics lab with three core capabilities:
1. Video Analysis (Gemini 3 Pro)
- Analyzes 8+ sports with action-specific feedback (Tennis Forehand, Basketball Shooting, Golf Drive, etc.)
- Returns structured data: overall score, 6-part body scoring (Head/Shoulders/Arms/Hips/Legs/Footwork), 4-6 temporal markers
- Technical feedback format: [PASS] for elite technique, [FIX] with corrective cues
2. AI Vision Correction
- Auto-generates annotated infographics at key timestamps
- Gemini 3 Pro extracts X/Y coordinates → Canvas API renders overlays
- Color-coded: GREEN arrows for correct biomechanics, RED for errors
- Smart positioning with 2-3 word diagnostic tags
3. Live Coach Mode (Gemini 2.5 Flash Native Audio)
- Real-time WebSocket streaming: 2 FPS video + 16kHz PCM audio
- AI "Silent Observer" mode—only speaks after detecting action completion
- Sub-6-word instant corrections with natural voice
How we built it
Tech Stack
- React 19 + TypeScript, Vite, Recharts (radar charts), Tailwind CSS
- @google/genai SDK with Gemini 3 Pro (video analysis) and Gemini 2.5 Flash (live streaming)
Video Analysis Pipeline
- Convert video to Base64 → send to Gemini 3 Pro with structured JSON schema
- Extract frames at timestamps using HTML5 Video API
- Send frames to Gemini 3 Pro for coordinate extraction (X/Y%, label, side, status)
- Canvas renders annotations with arrows, text boxes, color coding
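The last two steps hinge on turning the model's percentage coordinates into canvas pixels. Below is a minimal sketch of that mapping, not the project's actual code. The field names follow the list above (X/Y%, label, side, status), while the concrete value sets ("left"/"right", "pass"/"fix"), the helper names, and the clamping step are assumptions made for illustration.

```typescript
// Shape of one extracted annotation. Field names follow the pipeline
// description; the allowed string values are assumptions.
interface Annotation {
  x: number; // horizontal position as a percentage of frame width (0-100)
  y: number; // vertical position as a percentage of frame height (0-100)
  label: string; // short diagnostic tag
  side: "left" | "right";
  status: "pass" | "fix";
}

interface CanvasPoint {
  px: number;
  py: number;
  color: string;
}

// Keep out-of-range model output inside the frame.
function clampPercent(value: number): number {
  return Math.min(100, Math.max(0, value));
}

// Convert percentage coordinates to pixel coordinates and pick the overlay
// color: green for correct biomechanics, red for errors.
function toCanvasPoint(a: Annotation, width: number, height: number): CanvasPoint {
  return {
    px: (clampPercent(a.x) / 100) * width,
    py: (clampPercent(a.y) / 100) * height,
    color: a.status === "pass" ? "green" : "red",
  };
}
```

Working in percentages keeps the model output independent of the frame resolution, so the same annotation can be drawn on a 480x360 preview or a full-size export.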
Live Coach Pipeline
- Dual AudioContext: input (48kHz native) + output (24kHz playback)
- Connect via ai.live.connect WebSocket
- ScriptProcessorNode with 3x GainNode boost → resample to 16kHz PCM → stream
- Capture video at 2 FPS (480x360 JPEG) → stream to API
- Decode 24kHz PCM audio → queue with AudioBufferSourceNode for playback
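The two sample-format conversions in this pipeline can be sketched as pure functions. The writeup does not say how the resampling was implemented, so the block-averaging approach below is an assumption chosen for simplicity (a proper low-pass filter would give cleaner audio), and both function names are hypothetical.

```typescript
// Downsample float samples (range -1..1) to 16-bit PCM by averaging each
// block of input samples that maps onto one output sample.
function downsampleToPcm16(
  input: Float32Array,
  inputRate: number,
  outputRate: number,
): Int16Array {
  if (outputRate > inputRate) {
    throw new Error("Upsampling is not supported by this sketch");
  }
  const ratio = inputRate / outputRate; // 3 for 48 kHz -> 16 kHz
  const outLength = Math.floor(input.length / ratio);
  const out = new Int16Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(input.length, Math.floor((i + 1) * ratio));
    let sum = 0;
    for (let j = start; j < end; j++) {
      sum += input[j] ?? 0;
    }
    const avg = end > start ? sum / (end - start) : 0;
    // Clamp before scaling so boosted input cannot overflow the 16-bit range.
    const clamped = Math.max(-1, Math.min(1, avg));
    out[i] = clamped < 0 ? Math.round(clamped * 32768) : Math.round(clamped * 32767);
  }
  return out;
}

// Convert 16-bit PCM (for example the 24 kHz model audio) back to floats
// suitable for copying into an AudioBuffer channel.
function pcm16ToFloat32(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = (pcm[i] ?? 0) / 32768;
  }
  return out;
}
```

The clamp matters here because the 3x gain boost can push loud input past the -1..1 range before conversion.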
Key Technical Details
- Strict JSON schema with Type.INTEGER/Type.BOOLEAN for consistent output
- Boundary-aware label positioning (x<30: right, x>70: left)
- Biomechanical feedback protocol: max 15 words, [PASS] vs [FIX] format
- Promise-chaining for WebSocket race condition handling
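Two of these rules are small enough to sketch as pure functions: the boundary-aware label side and the feedback format check. This is an illustrative sketch rather than the project's code. The function names are hypothetical, keeping the model's suggested side in the 30-70 range is an assumption, and so is the choice not to count the [PASS]/[FIX] tag toward the 15-word limit.

```typescript
type Side = "left" | "right";

// Boundary-aware positioning: points near the left edge get their label on
// the right, points near the right edge get it on the left. In between, the
// side suggested by the model is kept (assumed behaviour).
function resolveLabelSide(xPercent: number, suggested: Side): Side {
  if (xPercent < 30) return "right";
  if (xPercent > 70) return "left";
  return suggested;
}

// Feedback protocol check: the line must start with [PASS] or [FIX] and be
// followed by 1 to maxWords words of feedback.
function isValidFeedback(line: string, maxWords = 15): boolean {
  const match = /^\[(PASS|FIX)\]\s*(.*)$/.exec(line.trim());
  if (!match) return false;
  const body = (match[2] ?? "").trim();
  if (body.length === 0) return false;
  return body.split(/\s+/).length <= maxWords;
}
```

A check like this can run on every model response before rendering, so a malformed line is dropped instead of producing an off-screen or oversized label.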
Challenges we faced
- Real-time Audio Sync: Manual PCM resampling (48kHz→16kHz input, 24kHz output) + AudioContext state management to avoid race conditions
- Coordinate Precision: Structured schema enforcement + boundary rules to prevent off-screen/overlapping labels
- Video Frame Timing: Promise-wrapped seeked event handling for exact timestamp frame extraction
- Bandwidth vs Precision: Reduced to 2 FPS + 480x360 JPEG (0.5 quality) for manageable WebSocket load
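The frame-timing fix can be sketched as follows. This is a minimal illustration, not the project's code: SeekableVideo is a hypothetical interface covering only the part of the video element the sketch needs (it is intended to be structurally compatible with HTMLVideoElement), and seekTo and captureAt are assumed names.

```typescript
// Minimal view of a video element: a settable playback position and the
// "seeked" event that fires once the new position is ready to be drawn.
interface SeekableVideo {
  currentTime: number;
  addEventListener(
    type: "seeked",
    listener: () => void,
    options?: { once?: boolean },
  ): void;
}

// Resolve only after the element reports that the seek has completed.
function seekTo(video: SeekableVideo, timeSeconds: number): Promise<void> {
  return new Promise((resolve) => {
    // Register the listener before changing currentTime so the event
    // cannot fire before anyone is listening.
    video.addEventListener("seeked", () => resolve(), { once: true });
    video.currentTime = timeSeconds;
  });
}

// Visit the timestamps one at a time and grab a frame after each seek.
// `grab` is supplied by the caller, for example a function that draws the
// video onto a canvas and returns a JPEG data URL.
async function captureAt<T>(
  video: SeekableVideo,
  timestamps: number[],
  grab: () => T,
): Promise<T[]> {
  const frames: T[] = [];
  for (const t of timestamps) {
    await seekTo(video, t);
    frames.push(grab());
  }
  return frames;
}
```

Awaiting each seek in sequence is the important part: drawing to the canvas immediately after setting currentTime can capture the previous frame, because the seek completes asynchronously.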
What we learned
- Gemini 3 Pro reasons about movement mechanics: Unlike pose libraries, which output joint positions, it identifies rotational energy transfer, timing issues, and weight distribution with no custom training needed
- Structured schemas eliminate parsing: Type-enforced JSON (Type.INTEGER, Type.BOOLEAN) works better than regex or free-text extraction
- Visual beats text: Color-coded infographics communicate errors instantly
- Live API requires bidirectional thinking: Callback architecture forced promise-chaining instead of direct session references
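The promise-chaining idea from the last point can be sketched like this. It is an illustration under assumptions, not the SDK's API: LiveSessionLike and its send method are hypothetical stand-ins for whatever session object the connect call resolves to, and createSender is an assumed name.

```typescript
// Hypothetical stand-in for a live session; the real session object and its
// method names come from the SDK and are not reproduced here.
interface LiveSessionLike {
  send(chunk: string): void;
}

// Route every send through the connection promise instead of a stored
// session reference. Chunks produced before the socket is open wait for the
// promise, and callbacks attached to one promise run in the order they were
// registered, so ordering is preserved.
function createSender(
  connecting: Promise<LiveSessionLike>,
): (chunk: string) => Promise<void> {
  return (chunk) => connecting.then((session) => session.send(chunk));
}
```

With this pattern the capture loops for audio and video can start immediately; nothing needs to check whether a session variable has been assigned yet, which is where the race condition would otherwise appear.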
What's next for KinetiQ
- Pro-comparison ghosting: Overlay professional athlete skeletons for visual form comparison
- Historical trend analysis: Multi-session progress tracking with long-term development plans
- Community leaderboards: Competitive "Biometric Score" rankings for technical drills
- Wearable integration: Combine video with IMU sensor data for multi-modal assessment