KinetiQ: Elite AI Sports Biomechanics & Live Coaching

Inspiration

Athletes across Reddit communities (r/tennis, r/golf, r/skiing, r/basketball) constantly post videos asking "Is my form correct?" The human feedback they receive is often inconsistent, delayed, or subjective. Using Gemini's multimodal models, we built a tool that provides instant, professional-level biomechanical feedback, giving every athlete access to elite-level analysis.

What it does

KinetiQ transforms a smartphone into a professional biomechanics lab with three core capabilities:

1. Video Analysis (Gemini 3 Pro)

Analyzes 8+ sports with action-specific feedback (Tennis Forehand, Basketball Shooting, Golf Drive, etc.)
Returns structured data: overall score, 6-part body scoring (Head/Shoulders/Arms/Hips/Legs/Footwork), 4-6 temporal markers
Technical feedback format: [PASS] for elite technique, [FIX] with corrective cues
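
The structured result listed above could be modeled roughly as follows. This is a minimal sketch, not KinetiQ's actual schema: the field names (`overallScore`, `bodyScores`, `markers`, `timestampSec`, `passed`, `cue`) are assumptions chosen for illustration. Only the six body regions, the 4-6 temporal markers, and the [PASS]/[FIX] labels come from the description.

```typescript
// Hypothetical shape of one analysis result; all field names are illustrative.
const BODY_PARTS = ["head", "shoulders", "arms", "hips", "legs", "footwork"] as const;
type BodyPart = (typeof BODY_PARTS)[number];

interface TemporalMarker {
  timestampSec: number; // position in the video, in seconds
  passed: boolean;      // true -> [PASS], false -> [FIX]
  cue: string;          // short feedback text
}

interface AnalysisResult {
  overallScore: number;
  bodyScores: Record<BodyPart, number>;
  markers: TemporalMarker[];
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

// Runtime guard for JSON returned by the model (checks structure, not score ranges).
function isAnalysisResult(value: unknown): value is AnalysisResult {
  if (!isRecord(value)) return false;
  const { overallScore, bodyScores, markers } = value;
  if (typeof overallScore !== "number") return false;
  if (!isRecord(bodyScores) || !Array.isArray(markers)) return false;
  if (!BODY_PARTS.every((part) => typeof bodyScores[part] === "number")) return false;
  if (markers.length < 4 || markers.length > 6) return false;
  return markers.every(
    (m) =>
      isRecord(m) &&
      typeof m.timestampSec === "number" &&
      typeof m.passed === "boolean" &&
      typeof m.cue === "string",
  );
}

// Render one marker in the [PASS]/[FIX] feedback format.
function formatMarker(m: TemporalMarker): string {
  return `${m.passed ? "[PASS]" : "[FIX]"} ${m.cue}`;
}
```

A guard like this would typically run on the output of `JSON.parse` before the scores reach the radar chart, so a malformed response fails early instead of rendering partial data.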

2. AI Vision Correction

Auto-generates annotated infographics at key timestamps
Gemini 3 Pro extracts X/Y coordinates → Canvas API renders overlays
Color-coded: GREEN arrows for correct biomechanics, RED for errors
Smart positioning with 2-3 word diagnostic tags
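
As a rough illustration of the overlay step, percentage coordinates from the model can be mapped to canvas pixels and a color chosen from the status. The type and function names (`FrameAnnotation`, `toCanvasPoint`, `colorFor`) and the exact hex values are assumptions for this sketch; the description only specifies GREEN for correct biomechanics and RED for errors.

```typescript
// Hypothetical annotation record; field names are illustrative.
type AnnotationStatus = "correct" | "error";

interface FrameAnnotation {
  xPercent: number; // 0-100, relative to frame width
  yPercent: number; // 0-100, relative to frame height
  label: string;    // 2-3 word diagnostic tag
  status: AnnotationStatus;
}

// Illustrative hex values; only "green" and "red" are given in the description.
const STATUS_COLORS: Record<AnnotationStatus, string> = {
  correct: "#22c55e",
  error: "#ef4444",
};

const clamp = (v: number, lo: number, hi: number): number => Math.min(hi, Math.max(lo, v));

// Map percentage coordinates to pixel coordinates on a canvas of the given size.
// Out-of-range percentages are clamped so the point never leaves the frame.
function toCanvasPoint(
  a: FrameAnnotation,
  width: number,
  height: number,
): { x: number; y: number } {
  return {
    x: Math.round((clamp(a.xPercent, 0, 100) / 100) * width),
    y: Math.round((clamp(a.yPercent, 0, 100) / 100) * height),
  };
}

function colorFor(a: FrameAnnotation): string {
  return STATUS_COLORS[a.status];
}
```

In a browser, the returned point and color would feed the Canvas 2D context, for example by assigning `colorFor(a)` to `ctx.strokeStyle` before drawing the arrow.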

3. Live Coach Mode (Gemini 2.5 Flash Native Audio)

Real-time WebSocket streaming: 2 FPS video + 16kHz PCM audio
AI "Silent Observer" mode—only speaks after detecting action completion
Sub-6-word instant corrections with natural voice

How we built it

Tech Stack

React 19 + TypeScript, Vite, Recharts (radar charts), Tailwind CSS
@google/genai SDK with Gemini 3 Pro (video analysis) and 2.5 Flash (live streaming)

Video Analysis Pipeline

Convert video to Base64 → send to Gemini 3 Pro with structured JSON schema
Extract frames at timestamps using HTML5 Video API
Send frames to Gemini 3 Pro for coordinate extraction (X/Y%, label, side, status)
Canvas renders annotations with arrows, text boxes, color coding
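
The frame-extraction step in this pipeline depends on waiting for the video to finish seeking before a frame is read. A minimal sketch of that idea follows. `SeekableVideo`, `seekTo`, and `extractFrames` are hypothetical names, the interface is a cut-down stand-in for the parts of `HTMLVideoElement` used here, and error or timeout handling is left out.

```typescript
// Minimal structural stand-in for the parts of HTMLVideoElement used here,
// so the helper can also be exercised outside a browser.
interface SeekableVideo {
  currentTime: number;
  addEventListener(type: "seeked", listener: () => void, options?: { once?: boolean }): void;
}

// Resolve once the video reports that it has finished seeking to `timeSec`.
function seekTo(video: SeekableVideo, timeSec: number): Promise<void> {
  return new Promise<void>((resolve) => {
    // Register the listener before changing currentTime so the event cannot be missed.
    video.addEventListener("seeked", () => resolve(), { once: true });
    video.currentTime = timeSec;
  });
}

// Visit each timestamp in order, awaiting the seek before grabbing the frame.
async function extractFrames<T>(
  video: SeekableVideo,
  timestamps: number[],
  grab: () => T, // e.g. draws the current frame to a canvas and returns a data URL
): Promise<T[]> {
  const frames: T[] = [];
  for (const t of timestamps) {
    await seekTo(video, t);
    frames.push(grab());
  }
  return frames;
}
```

Processing timestamps sequentially, rather than with `Promise.all`, matters here: a single video element can only be at one position at a time, so overlapping seeks would capture the wrong frames.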

Live Coach Pipeline

Dual AudioContext: input (48kHz native) + output (24kHz playback)
Connect via ai.live.connect WebSocket
ScriptProcessorNode with 3x GainNode boost → resample to 16kHz PCM → stream
Capture video at 2 FPS (480x360 JPEG) → stream to API
Decode 24kHz PCM audio → queue with AudioBufferSourceNode for playback
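
The input side of this pipeline (boost, resample to 16kHz, convert to PCM) can be sketched as a single pure function. This is an illustration under stated assumptions, not the project's resampler: it uses simple box-filter averaging, handles only integer downsampling factors, assumes 16-bit signed output, and applies the 3x gain in code rather than in a GainNode. The name `toPcm16` is hypothetical.

```typescript
// Downsample by an integer factor (48 kHz -> 16 kHz is a factor of 3) by averaging
// each group of samples, apply a gain, and convert to 16-bit signed PCM.
function toPcm16(
  input: Float32Array,
  inputRate = 48000,
  outputRate = 16000,
  gain = 3,
): Int16Array {
  const factor = inputRate / outputRate;
  if (!Number.isInteger(factor) || factor < 1) {
    throw new Error("this sketch only handles integer downsampling factors");
  }
  // Trailing samples that do not fill a whole group are dropped.
  const out = new Int16Array(Math.floor(input.length / factor));
  for (let i = 0; i < out.length; i++) {
    let sum = 0;
    for (let j = 0; j < factor; j++) {
      sum += input[i * factor + j];
    }
    // Clamp after applying gain so boosted peaks saturate instead of wrapping around.
    const sample = Math.max(-1, Math.min(1, (sum / factor) * gain));
    out[i] = sample < 0 ? Math.round(sample * 0x8000) : Math.round(sample * 0x7fff);
  }
  return out;
}
```

Averaging acts as a crude low-pass filter before decimation. A production resampler would normally use a proper anti-aliasing filter, and would carry leftover samples over to the next audio callback instead of dropping them.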

Key Technical Details

Strict JSON schema with Type.INTEGER/Type.BOOLEAN for consistent output
Boundary-aware label positioning (x<30: right, x>70: left)
Biomechanical feedback protocol: max 15 words, [PASS] vs [FIX] format
Promise-chaining for WebSocket race condition handling
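
The boundary rule for labels (x<30: right, x>70: left) can be written as a small function. In this sketch the model-suggested side is kept for points in the middle band, which is an assumption, as are the names `resolveLabelSide` and `labelOrigin` and the default gap.

```typescript
type Side = "left" | "right";

// Choose which side of the anchor point the label box should sit on.
// Near the left edge (x < 30%) the label goes right; near the right edge
// (x > 70%) it goes left; otherwise the suggested side is kept.
function resolveLabelSide(xPercent: number, suggested: Side): Side {
  if (xPercent < 30) return "right";
  if (xPercent > 70) return "left";
  return suggested;
}

// Top-left corner of the label box in pixels, vertically centered on the anchor.
// `gap` is an illustrative spacing between the anchor point and the box.
function labelOrigin(
  anchorX: number,
  anchorY: number,
  side: Side,
  boxWidth: number,
  boxHeight: number,
  gap = 12,
): { x: number; y: number } {
  const x = side === "right" ? anchorX + gap : anchorX - gap - boxWidth;
  return { x, y: anchorY - boxHeight / 2 };
}
```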

Challenges we faced

Real-time Audio Sync: Manual PCM resampling (48kHz→16kHz input, 24kHz output) + AudioContext state management to avoid race conditions
Coordinate Precision: Structured schema enforcement + boundary rules to prevent off-screen/overlapping labels
Video Frame Timing: Promise-wrapped seeked event handling for exact timestamp frame extraction
Bandwidth vs Precision: Reduced to 2 FPS + 480x360 JPEG (0.5 quality) for manageable WebSocket load
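
To make the bandwidth trade-off concrete, the uplink load can be estimated with simple arithmetic. The sketch below assumes 16-bit mono PCM (2 bytes per sample) and an optional 4/3 overhead for base64 transport; the average JPEG size is a placeholder parameter, not a measured value from the project.

```typescript
interface StreamConfig {
  sampleRateHz: number;   // audio sample rate, e.g. 16000
  bytesPerSample: number; // 2 for 16-bit PCM (assumed)
  fps: number;            // video frames per second, e.g. 2
  avgJpegBytes: number;   // placeholder: measure this for real frames
  base64: boolean;        // whether payloads are base64-encoded in transit
}

// Estimate bytes per second sent upstream for audio, video, and both combined.
function estimateUplinkBytesPerSecond(c: StreamConfig): {
  audio: number;
  video: number;
  total: number;
} {
  // Base64 encodes every 3 bytes as 4 characters.
  const overhead = c.base64 ? 4 / 3 : 1;
  const audio = c.sampleRateHz * c.bytesPerSample * overhead;
  const video = c.fps * c.avgJpegBytes * overhead;
  return { audio, video, total: audio + video };
}
```

Under these assumptions, raw 16kHz audio alone is 32,000 bytes per second, so at 2 FPS the video share stays comparable to the audio share only if each JPEG is kept to a few tens of kilobytes.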

What we learned

Gemini 3 Pro reasons about movement physics: unlike pose-estimation libraries, which only return joint positions, it describes rotational energy transfer, timing issues, and weight distribution, with no custom training needed
Structured schemas remove fragile parsing: type-enforced JSON (Type.INTEGER, Type.BOOLEAN) is more reliable than regex or free-text extraction
Visual > text: Color-coded infographics communicate errors instantly
Live API requires bidirectional thinking: Callback architecture forced promise-chaining instead of direct session references
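
The promise-chaining lesson can be illustrated with a small sketch. `LiveSession`, `createSender`, and `send` are hypothetical stand-ins rather than the @google/genai API. The point is only that callbacks hold a promise for the session instead of the session itself, so nothing is sent before the connection exists.

```typescript
// Hypothetical minimal session interface; a real SDK session has a richer API.
interface LiveSession {
  send(chunk: string): void;
}

// Wrap an in-flight connection so callers can queue sends before it has resolved.
function createSender(connect: () => Promise<LiveSession>): {
  send(chunk: string): Promise<void>;
} {
  // Start connecting immediately and keep the promise, not the session.
  const sessionPromise = connect();
  return {
    // Handlers attached to the same promise run in registration order,
    // so chunks are delivered in the order send() was called.
    send: (chunk: string): Promise<void> =>
      sessionPromise.then((session) => session.send(chunk)),
  };
}
```

Audio and video callbacks that fire while the WebSocket is still opening can then call `send` right away; their chunks are delivered in order once the session resolves. If the connection fails, every pending `send` rejects with the same error.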

What's next for KinetiQ

Pro-comparison ghosting: Overlay professional athlete skeletons for visual form comparison
Historical trend analysis: Multi-session progress tracking with long-term development plans
Community leaderboards: Competitive "Biometric Score" rankings for technical drills
Wearable integration: Combine video with IMU sensor data for multi-modal assessment

Updates

Sam Huang started this project — Feb 06, 2026 11:50 PM EST



Yingze Hou posted an update — Jan 18, 2026 02:47 PM EST

We are hard at work refining KinetiQ's core engine. Here is a breakdown of the latest technical upgrades we are working on to make your AI coach even better:

Frame-Level Analysis: In sports, a lot happens in a single second. We are shifting from second-level timestamps to frame-level analysis. This allows us to capture high-speed movements with much greater precision, ensuring we identify the exact moment of key action phases.
Refined API Pipeline: We have restructured our workflow to strictly separate technical analysis from instructional image generation. By decoupling these processes, we prevent the AI from making "fake assumptions" or hallucinations in the visuals, ensuring the skeletal overlays and advice remain professional and biomechanically accurate.
Improved Consistency: We are tuning the temperature settings of our API calls. Lower temperatures reduce the variance between responses, so the coaching feedback you receive stays reliable and consistent each time you upload.
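
As a sketch of what frame-level addressing involves, the helpers below convert between frame indices and timestamps at a fixed frame rate. The function names are hypothetical, and the frame rate is assumed to be known in advance, since the HTML5 video element does not expose it directly.

```typescript
// Timestamp at the middle of a frame. Seeking to the midpoint instead of the
// frame boundary avoids landing on the previous frame due to rounding.
function frameMidpointSeconds(frame: number, fps: number): number {
  return (frame + 0.5) / fps;
}

// Index of the frame that is on screen at the given timestamp.
function secondsToFrame(seconds: number, fps: number): number {
  return Math.floor(seconds * fps);
}
```

At 30 FPS, for example, a marker at 1.5 seconds maps to frame 45, and seeking to that frame's midpoint targets a time halfway between frames 45 and 46.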
