Why CapyCoach?
The hard part of weightlifting isn't finishing the rep; it's knowing whether you're doing it right. Bad form is a leading cause of gym injuries, and most fitness apps can't catch it. They can count reps and sets, but they can't tell you whether your squat is too shallow, your back is leaning too far forward, or you're rushing through your sets—and they certainly don't warn you when you're about to get hurt.
While personal trainers can catch these mistakes in real time, they are expensive and not available on demand for a late-night workout at home. CapyCoach closes that gap: an accessible, on-demand, expert-level form-coaching experience that catches mistakes before they lead to injuries while keeping the workout engaging, something you'll want to come back to tomorrow.
What CapyCoach does
CapyCoach is an iPhone app that watches you work out through your camera and gives you the coaching that a personal trainer would, but in real time, with its own voice, and for free.
For each rep, our analysis pipeline detects three classes of form faults:
- Depth — did the user actually go low enough to achieve full range of motion?
- Posture — was the torso position correct, or was the user leaning too far forward?
- Tempo — did the user control the descent, or did they rush down and bounce out of the bottom?
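As a sketch, each fault class reduces to a threshold check on per-rep measurements. The threshold values below are illustrative placeholders, not our tuned numbers:

```python
def classify_rep(knee_angle_deg, torso_lean_deg, descent_s):
    """Classify a rep against three illustrative fault thresholds.

    knee_angle_deg: knee angle at the deepest frame (smaller = deeper)
    torso_lean_deg: forward lean of the torso at the deepest frame
    descent_s:      time spent lowering into the rep
    """
    faults = []
    if knee_angle_deg > 100:   # hypothetical depth cutoff
        faults.append("depth")
    if torso_lean_deg > 45:    # hypothetical posture cutoff
        faults.append("posture")
    if descent_s < 0.5:        # hypothetical tempo cutoff
        faults.append("tempo")
    return faults
```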
When you finish a rep, an AI coach powered by Claude gives you one sentence of feedback: "Your squat was too shallow; drop a few inches lower next time." The feedback is specific, actionable, and adapts based on what you actually did.
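A minimal sketch of how structured rep data could be turned into a prompt for the coach; the `rep` dict fields and the prompt wording are assumptions for illustration, not our exact implementation:

```python
def build_coach_prompt(rep):
    """Build a one-sentence-feedback prompt from structured rep data.

    `rep` is an assumed dict such as
    {"exercise": "squat", "faults": ["depth"], "knee_angle": 112};
    the field names are illustrative.
    """
    faults = ", ".join(rep["faults"]) or "none"
    return (
        f"You are a concise gym coach. The user just did a {rep['exercise']} rep. "
        f"Detected form faults: {faults}. "
        f"Deepest knee angle: {rep['knee_angle']} degrees. "
        "Reply with one short, actionable coaching sentence."
    )
```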
The capy
Working out is hard to stick with. So we wrapped the coaching in a gamification layer: every successful workout earns capy food. Feed your capybara enough, and it levels up, becoming progressively buffer until it ascends to its mysterious final form...
How we built it
The core pipeline runs on a Flask backend that receives video frames from the Flutter mobile app. We use MediaPipe Pose Landmarker to extract 33 body keypoints per frame. Then our analyzer:
- Runs a state machine (for example, standing vs. squatting, up vs. down) to detect rep boundaries
- Buffers the landmark trajectory for each rep
- Computes geometric features at the deepest frame: knee angle for depth, hip–shoulder–knee angle for torso lean, and descent/ascent timing for tempo
- Sends structured rep data to Claude, which generates short voice feedback
- Speaks the feedback through ElevenLabs TTS on the phone
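For illustration, the knee angle can be computed from three of MediaPipe's 33 pose landmarks (left hip = index 23, left knee = 25, left ankle = 27) as the angle at the knee vertex. The `landmarks` structure here is an assumed list of (x, y) pairs; MediaPipe actually returns normalized landmark objects:

```python
import math

# MediaPipe Pose landmark indices for the left leg
LEFT_HIP, LEFT_KNEE, LEFT_ANKLE = 23, 25, 27

def joint_angle(a, b, c):
    """Angle in degrees at vertex b, formed by points a-b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cos = dot / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def knee_angle(landmarks):
    """landmarks: list of (x, y) pairs indexed by MediaPipe landmark id."""
    return joint_angle(landmarks[LEFT_HIP],
                       landmarks[LEFT_KNEE],
                       landmarks[LEFT_ANKLE])
```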
The Flutter frontend handles auth (Firebase), workout selection, the capybara progression system, session summaries, and stats history.
What we learned
- Designing form-detection features that actually work is harder than it looks. We initially tried to detect knee valgus and hip asymmetry, but realized those faults are fundamentally invisible from a side-view camera. We cut those features and kept only the ones the camera angle accurately supports.
- The state machine threshold for "did a rep happen" should be separate from the form-correctness threshold. Otherwise shallow reps either don't count or wrongly count as correct.
- Running real-time vision models on edge devices is a hard problem. We explored 3D human mesh reconstruction (HMR2.0, SAM 3D Body) for a 3D replay feature, but the inference compute requirements make on-device deployment impractical right now. This is a future-work item.
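The second lesson above can be sketched as a tiny state machine with two separate knee-angle thresholds: a loose one deciding whether a rep happened at all, and a strict one deciding whether it was deep enough. Both values below are illustrative:

```python
REP_THRESHOLD = 120    # hypothetical: knee bent this far = "a rep happened"
DEPTH_THRESHOLD = 100  # hypothetical: knee must reach this to count as deep

def analyze(knee_angles):
    """Walk a knee-angle trajectory; return (rep_count, shallow_count).

    A rep is detected with the loose REP_THRESHOLD, then judged
    separately against the strict DEPTH_THRESHOLD, so shallow reps
    are still counted but flagged instead of silently passing.
    """
    reps, shallow = 0, 0
    in_rep = False
    min_angle = 180.0
    for angle in knee_angles:
        if not in_rep and angle < REP_THRESHOLD:
            in_rep, min_angle = True, angle
        elif in_rep:
            min_angle = min(min_angle, angle)
            if angle >= REP_THRESHOLD:      # back above = rep finished
                reps += 1
                if min_angle > DEPTH_THRESHOLD:
                    shallow += 1            # counted, but flagged shallow
                in_rep = False
    return reps, shallow
```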
Challenges
- MediaPipe landmark noise. A single noisy frame can show the user’s knee at 65° when they are actually at 85°. We solved this by smoothing over a 5-frame window around the deepest point.
- Tempo measurement. Our first version captured only the bottom slice of each rep, so descent times were always 0.0s. We decoupled the buffer window (which captures the full motion) from the state machine (which detects rep boundaries).
- AI coach latency. Claude API calls take 1-2 seconds, which would freeze the camera display. We threaded the AI calls so the UI stays smooth.
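The landmark-noise fix above can be sketched as a moving average over a 5-frame window, taking the minimum of the smoothed values instead of the raw minimum. The window size matches our fix; the helper itself is an illustration:

```python
def smoothed_min_angle(angles, window=5):
    """Return the minimum of windowed averages instead of the raw minimum,
    so a single noisy frame cannot drag the 'deepest' knee angle down."""
    half = window // 2
    smoothed = []
    for i in range(len(angles)):
        lo, hi = max(0, i - half), min(len(angles), i + half + 1)
        smoothed.append(sum(angles[lo:hi]) / (hi - lo))
    return min(smoothed)
```

With a trajectory like `[100, 100, 65, 100, 100]`, the raw minimum reports a spurious 65° frame, while the smoothed minimum stays close to the true depth.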
Built with
Python, Flask, MediaPipe, Claude API, ElevenLabs API, Flutter, Dart, Firebase, OpenCV, NumPy, python-dotenv
Sponsor / AI tools used
Claude; we used the Claude API for coaching feedback generation, and Claude Code as a development assistant throughout the build.
Did you implement a generative AI model or API
Yes. We use the Claude API (claude-sonnet-4-6) to generate per-rep coaching feedback, and ElevenLabs for text-to-speech voice playback. MediaPipe Pose Landmarker handles real-time pose estimation but is not generative.