Inspiration

Elite athletic coaching is a luxury restricted to professional athletes, while standard fitness apps often rely on "dumb" pose estimation—tracking (x, y) coordinates without understanding the nuance or intent of a movement. We were inspired to build Omni-Coach to bridge this gap, creating an AI that doesn't just see where your joints are, but understands the biomechanical risks and kinetic potential of your body in real-time. We wanted to move beyond basic tracking into the realm of "Kinesthetic Intelligence."

What it does

Omni-Coach is a high-performance biomechanical analysis tool that acts as both a real-time coach and a lead researcher:

  1. Live Mode (Gemini 3 Flash): Analyzes video frames with sub-second latency to provide "Micro-Cues" (e.g., "Drive through heels") and immediate Safety Alerts if it detects dangerous form such as spinal rounding or joint collapse.

  2. Deep Analysis (Gemini 3 Pro): Post-workout, it synthesizes the entire "Chrono-Log" of the session. Using high-level reasoning, it detects "Form Creep" (the exact moment fatigue caused biomechanical breakdown) and generates a PhD-level research report featuring internal and external coaching cues.

How we built it

The application is a high-frequency React 19 engine orchestrated by two distinct Gemini 3 models:

  1. Gemini 3 Flash: Used for the latency-first vision loop. We stream JPEG frames into the model to perform raw spatial reasoning without the overhead of traditional pose-estimation libraries.
  2. Gemini 3 Pro: Used for the final "Kinesthetic Analysis Report." We use the Thinking Config with a dedicated token budget so the model can deliberate over the session history, identifying patterns of fatigue and recruitment efficiency.
  3. The UI: Designed with a "Cyber-Researcher" aesthetic in Tailwind CSS, featuring scanning overlays, biometric data streams, and a dynamic feedback HUD.

Challenges we ran into

The primary challenge was maintaining context over a long workout set without overwhelming the model's context window. We solved this by implementing a "Chrono-Log" system: a lightweight, text-based history of every biomechanical event detected by Flash, which Pro then summarizes. We also had to navigate browser-level "NotAllowedError" camera-permission hurdles in secure contexts, which required a manual permission trigger and robust retry logic.

Accomplishments that we're proud of

We successfully implemented proprioceptive reasoning. Most AI models struggle to explain how a movement should feel, but by leveraging Gemini 3's multimodal depth we were able to generate "Internal Cues" (e.g., "Imagine ripping the floor apart with your feet") that are standard in professional sports science but previously impossible for AI to generate accurately from visual data alone.

What we learned

We learned that Gemini 3 is exceptionally capable of inferring three-dimensional depth from two-dimensional video frames. By prompting the model as a "Lead Biomechanical Researcher," we observed a marked increase in the precision of the feedback, moving from generic advice to research-grounded corrections such as "posterior chain engagement" and "lumbar spine stabilization."

What's next for Omni-Coach

The next phase for Omni-Coach is "Physio-Mode." We plan to adapt the core engine for elderly rehabilitation and physical therapy. By detecting early signs of gait instability or joint degradation, Omni-Coach can transition from a gym tool to a proactive healthcare solution that prevents falls and tracks recovery progress for patients at home.
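The "Chrono-Log" described above can be sketched as a small append-only event log: the fast vision loop records timestamped events, and the whole history is later flattened into a compact text block for the deep-analysis pass. This is a minimal illustration, not Omni-Coach's actual code; `ChronoLog`, `record`, and `toPromptContext` are hypothetical names.

```typescript
// Sketch of a "Chrono-Log": an append-only, text-based history of
// biomechanical events emitted by the fast vision loop, flattened into a
// compact prompt context for the deep-analysis model. Names are illustrative.

type EventKind = "CUE" | "SAFETY" | "FORM_CREEP";

interface ChronoLogEntry {
  t: number;        // milliseconds since session start
  kind: EventKind;  // category of biomechanical event
  note: string;     // short description produced by the vision loop
}

class ChronoLog {
  private entries: ChronoLogEntry[] = [];

  record(t: number, kind: EventKind, note: string): void {
    this.entries.push({ t, kind, note });
  }

  // One line per event, oldest first, so an entire workout set fits in a
  // small slice of the model's context window.
  toPromptContext(): string {
    return this.entries
      .map((e) => `[${(e.t / 1000).toFixed(1)}s] ${e.kind}: ${e.note}`)
      .join("\n");
  }
}

// Example: two events from a squat set.
const log = new ChronoLog();
log.record(4200, "CUE", "Drive through heels");
log.record(61500, "FORM_CREEP", "Knee valgus emerging under fatigue");
console.log(log.toPromptContext());
// prints:
// [4.2s] CUE: Drive through heels
// [61.5s] FORM_CREEP: Knee valgus emerging under fatigue
```

Keeping the log as plain text rather than raw frames is what lets a long session be summarized without overwhelming the context window.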

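The "NotAllowedError" retry logic mentioned above could look roughly like the helper below. The stream acquirer is injected so the sketch is testable outside a browser; in the real app it would be something like `() => navigator.mediaDevices.getUserMedia({ video: true })`. The function name and retry parameters are assumptions for illustration.

```typescript
// Hypothetical camera-permission retry wrapper. "NotAllowedError" is the
// DOMException name getUserMedia throws when the user or a secure-context
// policy denies camera access; other failures are treated as fatal.
async function requestCameraWithRetry<T>(
  acquire: () => Promise<T>,   // e.g. () => navigator.mediaDevices.getUserMedia({ video: true })
  maxAttempts = 3,
  delayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await acquire();
    } catch (err) {
      lastError = err;
      if (err instanceof Error && err.name === "NotAllowedError") {
        // Wait briefly so the UI can surface a manual permission trigger,
        // then retry instead of failing the whole session.
        await new Promise((resolve) => setTimeout(resolve, delayMs));
        continue;
      }
      throw err; // no device, hardware error, etc.: don't retry
    }
  }
  throw lastError;
}
```

Injecting `acquire` keeps the permission policy (how many retries, how long to wait) separate from the browser API itself.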
Built With

  - React 19
  - Tailwind CSS
  - Gemini 3 (Flash + Pro)
