Inspiration
I still remember my first weeks in the gym. I'd load the bar, drop into a squat, and have no idea if I was doing it right. My knees caved in, my chest collapsed, my heels lifted and nobody was there to tell me. When I started talking about it with my friends, the same story kept coming back. And the data backs it up: bad form builds injuries, not muscle. The honest fix is a coach. But a good personal trainer costs more than most of us can afford, and they're not around for the 6 a.m. That gap between "I want to train hard" and "I have no idea if I'm training safely" should be addressed. ExerciseRight was born from that gap. We wanted to put a patient, always-on form coach in everyone's gym bag.
What it does
Every gym lift is dominant in one of two planes frontal or sagittal. We picked the plane most lifters live in, and the single most-butchered lift inside it: the Classic Back Squat. ExerciseRight watches you from the side and coaches you, rep by rep, in real time: 1.) See the problem. A live side-view stick-figure mannequin in the Quest HUD mirrors your body. The joints turn into red dots on the part you're breaking like shoulder for a forward lean, knee for knees past toes, heel for a heel lift and a yellow directional arrow shows the exact fix: up from the shoulders, back toward the hips, down into the floor. 2.) Hear the fix. A TTS cue fires the same instant "Chest up", "Hips back", "Heels down", "Go deeper". 3.) Track the work. A translucent HUD counts reps and sets and logs every rep's depth and issues for post-session review. No wearables. Just a camera and a coach in your headset.
The Architecture
Good squat coaching needs two views: a side profile for depth, lean, and knee tracking (sagittal cues), and a front profile for stance symmetry and knee valgus (frontal cues). We designed the pipeline so a camera can feed it from anywhere. Today a Rubik Pi 3 + Brio webcam captures the side from across the room. Tomorrow when you stand in front of a gym mirror the Quest 3's own passthrough cameras can pick up the front directly and the side through the reflection, migrating the entire stack onboard. The system is a pipeline from pixels to coaching:
- Capture: OpenCV pulls 640×480 @ 30 fps off the Brio on the Pi.
- See: Google's MediaPipe Pose Landmarker (Tasks API, CPU delegate) extracts 33 body landmarks per frame.
- Stream: an
asyncioWebSocket server broadcasts them over Wi-Fi as versioned JSON, with a per-client "latest-wins" queue so a slow consumer can never stall the camera. - Think: on the Quest, Unity 2022.3 (C#) runs a state-machine rep counter with hysteresis gates and a biomechanics layer that turns landmarks into sport-science metrics: knee and hip angles, torso lean, knee-over-toe ratio, heel lift, depth. Every cue is explainable, no black box.
- Show & Speak: corrections land on three surfaces in the same frame: a translucent HUD (TextMeshPro), a custom side-view mannequin drawing bones, red joints on the problem area, and yellow correction arrows to the fix, plus an Android TTS voice coach on the Quest's speakers with per-issue cooldowns. Glue: Oculus XR Plugin for passthrough, a one-click Unity Editor script that builds the scene and auto-wires every reference, and in-app IP entry.
Challenges we ran into
Defining what "good" looks like. The hardest problem wasn't the computer vision it was the biomechanics. A squat is a continuous motion, not a pose, and "correct" shifts with mobility, limb proportions, and bar position. We had to translate coaching language into math: a forward lean became the angle between shoulder-hip and vertical; knees forward became the horizontal ratio of knee to toe normalized by foot length. Every threshold was tuned by reading squat literature, and filming ourselves. Streaming data between two devices. Early builds tried to ship raw video from the Pi to the Quest, and a waste of the Quest's compute. We split the pipeline: the Pi does the seeing, the Quest does the thinking, and we only send the ~5 KB of landmarks per frame over a versioned JSON WebSocket. Two orders of magnitude less bandwidth, and each device does what it's best at.
Accomplishments that we're proud of
We built a real-time XR squat coach that closes the full loop camera pixels to voice cue and red joint on the mannequin. Every correction is grounded in explainable biomechanics (torso lean, knee-over-toe ratio, heel lift, depth), not a black-box model, so the coach can always tell you exactly why it flagged a rep. We turned a naive "stream video" design into a lean "stream landmarks" one.
What's next for ExerciseRight
Expand to other exercises and cover both planes. The squat is the hardest-to-coach, highest-impact sagittal lift but the same framework extends naturally to deadlifts, lunges, and RDLs, and the front view unlocks frontal-plane lifts like the overhead press, bench, and rows. One analyzer module per lift, same pipeline.
Go fully onboard. The architecture is already designed for this: step in front of a gym mirror and the Quest's passthrough cameras can capture both views, front directly, side through the reflection. The Pi disappears, the stack collapses into the headset, and setup drops to "put on the glasses and start lifting."
Session tracking and fatigue awareness. Every rep is already scored; logging them gives a timeline of depth consistency, form drift as a set progresses, and long-term trends. Paired with a fatigue detector that watches form degrade in real time, the coach can call a set before the rep that gets you hurt.
Beyond the gym. The same explainable-biomechanics engine is exactly what post-injury rehab and physiotherapy need quantified, repeatable, patient-in-the-loop feedback without a clinician in the room. A coach for lifters today, a recovery tool tomorrow.
Log in or sign up for Devpost to join the conversation.