Inspiration
Most AI learning tools live on a screen and are well suited to sit-down subjects like math, science, and programming. But physical skills like boxing, violin, piano, and sports happen in physical space, and there is no accessible AI that can watch you move and coach you in real time. I wanted to build an AI mentor that sees what you are doing and gives immediate, movement-based feedback, like a real coach.
What it does
CoachCam turns your camera into a live AI performance coach. It tracks your body or hands using computer vision, calculates posture and movement metrics, and gives real-time feedback through on-screen tips and voice guidance. Users can switch between skills like boxing, violin, and piano, and each skill has its own tracking logic, scoring system, and feedback rules. Every correction is based on detected movement, not generic advice.
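The spoken cues can be produced entirely in the browser with the built-in Web Speech API, no server round trip needed. A minimal sketch, with illustrative guard logic and rate value (not CoachCam's actual code); the `as any` casts keep it compilable outside a browser:

```typescript
// Speak one coaching tip via the browser's built-in speech synthesis.
// Safely no-ops outside a browser or when TTS is unavailable.
function speakTip(tip: string): void {
  const synth = (globalThis as any).speechSynthesis;
  if (!synth) return;         // no TTS support (or not running in a browser)
  if (synth.speaking) return; // don't talk over the previous cue
  const utterance = new (globalThis as any).SpeechSynthesisUtterance(tip);
  utterance.rate = 1.05;      // slightly brisk delivery; illustrative value
  synth.speak(utterance);
}
```

Skipping a new cue while one is still playing keeps the coach from stacking overlapping instructions during fast movement.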
How I built it
I built CoachCam using Next.js, TypeScript, and Tailwind for the interface. For tracking, I used MediaPipe Pose for full-body skills and MediaPipe Hands for instrument skills. Each skill is implemented as a modular skill pack that defines which landmarks to track, which metrics to compute, how to score performance, and when to trigger feedback. All analysis runs client-side for low latency, and voice feedback is generated in real time using the browser's Web Speech API.
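A skill pack like the one described above could be shaped roughly as follows. The interface, names, and thresholds are hypothetical; only the landmark indices (11/12 for shoulders, 15/16 for wrists) follow MediaPipe Pose's published numbering:

```typescript
// Illustrative shape of a modular skill pack; not CoachCam's actual API.
type Landmark = { x: number; y: number; z: number; visibility?: number };

interface SkillPack {
  id: string;                          // e.g. "boxing", "violin", "piano"
  tracker: "pose" | "hands";           // which MediaPipe model to run
  landmarks: number[];                 // landmark indices this skill cares about
  computeMetrics(lm: Landmark[]): Record<string, number>;
  score(metrics: Record<string, number>): number;            // 0..100
  feedback(metrics: Record<string, number>): string | null;  // null = stay quiet
}

// Minimal example pack: boxing guard height, judged from wrist vs shoulder height.
const boxingPack: SkillPack = {
  id: "boxing",
  tracker: "pose",
  landmarks: [11, 12, 15, 16], // MediaPipe Pose: shoulders and wrists
  computeMetrics(lm) {
    // Normalized image coords: y grows downward, so a positive guardDrop
    // means the wrists sit below the shoulders (guard is down).
    const shoulderY = (lm[11].y + lm[12].y) / 2;
    const wristY = (lm[15].y + lm[16].y) / 2;
    return { guardDrop: wristY - shoulderY };
  },
  score(m) {
    return Math.max(0, Math.round(100 - m.guardDrop * 200));
  },
  feedback(m) {
    return m.guardDrop > 0.15 ? "Keep your guard up" : null; // illustrative threshold
  },
};
```

Returning `null` from `feedback` is the pack's way of saying "no evidence, say nothing", which is what keeps switching skills a change of logic rather than just a label.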
Challenges I ran into
The biggest challenge was preventing fake or generic feedback. Early versions would trigger advice that did not truly match the movement. I redesigned the system so that every feedback message requires metric evidence and a minimum duration before it triggers. Another challenge was handling tracking instability. I added smoothing, confidence thresholds, and calibration steps to ensure the system only speaks when it has reliable data.
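The evidence-plus-duration rule above can be sketched as a small per-metric gate: each frame's raw value is smoothed with an exponential moving average, and feedback fires only once the smoothed value has stayed past a threshold for a minimum hold time. The class name, constants, and defaults are illustrative, not CoachCam's actual implementation:

```typescript
// Gate a single fault metric: smooth it, then require sustained evidence.
class FeedbackGate {
  private ema: number | null = null;       // exponential moving average of the metric
  private overSince: number | null = null; // timestamp when the fault first appeared

  constructor(
    private threshold: number, // metric level that counts as a fault
    private holdMs: number,    // fault must persist this long before we speak
    private alpha = 0.3,       // EMA weight for the newest sample
  ) {}

  /** Feed one frame's metric; returns true only when the fault has persisted. */
  update(raw: number, nowMs: number): boolean {
    this.ema = this.ema === null ? raw : this.alpha * raw + (1 - this.alpha) * this.ema;
    if (this.ema > this.threshold) {
      this.overSince ??= nowMs;
      return nowMs - this.overSince >= this.holdMs;
    }
    this.overSince = null; // evidence broke; reset the timer
    return false;
  }
}
```

With this shape, a one-frame tracking glitch never triggers a cue: the spike is damped by the EMA, and even a genuine fault only speaks after it has lasted the full hold window.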
Accomplishments that I'm proud of
I am proud that every score and coaching cue is directly tied to real movement detection. Switching between boxing, violin, and piano actually changes the tracking logic, not just the label. The system feels responsive and credible because it only speaks when it detects measurable technique issues.
What I learned
I learned that real-time AI systems need deterministic rules, not just language models. Trust comes from transparency and evidence. I also learned how important performance optimization and smoothing are when building computer vision apps in the browser.
What's next for CoachCam
Next, I want to expand to more sports and performance skills, improve the biomechanical modeling, add session analytics and progress tracking, and explore more advanced feedback models trained on expert performance data. The long-term goal is to make AI coaching accessible for any physical skill.
Built With
- framer-motion
- lovable
- mediapipe-tasks-vision
- radix-ui
- react
- react-18
- recharts
- shadcn/ui
- tailwind-css
- typescript
- vite
- web-speech-api