The Inspiration
When you walk into the gym, your goal is to finish your workout. What many lifters don't realize is that, instead of building strength and stability, they may actually be setting themselves up for a future visit to a physiotherapist. That was exactly the case for some of us on this team, and it is why we created GymPigeon.
The Project
GymPigeon provides real-time feedback by analyzing a lifter's form as they move. The app tracks joint alignment and posture throughout every rep and delivers audio cues immediately after each repetition. By alerting users to specific errors, such as improper posture or cut-short reps, GymPigeon ensures that every rep is performed safely and efficiently. Most gym injuries don't happen because people lift heavy; they happen because of bad form. By teaching optimal movement patterns and correcting form before the weight goes up, GymPigeon reduces the risk of injury and strain, making it the bridge between technology, safety, and fitness.
The Process
We prioritized simplicity and efficiency to minimize latency. For computer vision, we used Python, OpenCV, and MediaPipe to track specific body landmarks in real time. We implemented custom geometry logic to compute joint angles (hip-knee-ankle, shoulder-elbow-wrist) and detect exercise phases (eccentric/concentric). For the backend, we built a FastAPI server that processes the video feed frame by frame, drawing the skeleton and feedback overlays directly onto each frame with OpenCV. For real-time streaming, the processed frames are encoded into Base64 strings and sent to the React frontend via WebSockets. Finally, we integrated the ElevenLabs text-to-speech API: when a form error is detected, the backend generates a natural voice command to guide the user.
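The joint-angle geometry can be sketched with plain trigonometry. The function below is illustrative rather than our exact production code: it takes three 2D landmark positions (as MediaPipe returns, normalized x/y) and computes the angle at the middle joint.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by points a-b-c.
    Each point is an (x, y) pair, e.g. hip-knee-ankle landmarks."""
    ang1 = math.atan2(a[1] - b[1], a[0] - b[0])
    ang2 = math.atan2(c[1] - b[1], c[0] - b[0])
    deg = abs(math.degrees(ang1 - ang2))
    return 360 - deg if deg > 180 else deg

# A straight leg (hip, knee, ankle collinear) is close to 180 degrees:
print(joint_angle((0.5, 0.4), (0.5, 0.6), (0.5, 0.8)))  # → 180.0
```

Comparing this angle against thresholds frame-to-frame is what lets the app distinguish the eccentric (lowering) and concentric (lifting) phases of a rep.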
The Challenges
One of the trickiest visual challenges was figuring out the image processing pipeline. Our goal was to display the video feed from Python OpenCV in our web app. We considered running the video instance on the frontend, and we also considered WebRTC for efficient real-time video transfer, but its setup would have been time-consuming. We ultimately settled on WebSockets: a persistent communication channel that transfers the encoded video data to the frontend with minimal setup.
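The encoding half of that channel is simple: each JPEG-compressed frame becomes a Base64 text payload that the WebSocket carries, and the frontend reverses the step before rendering. A minimal sketch (the function names are illustrative, not from our codebase):

```python
import base64

def frame_to_payload(jpeg_bytes: bytes) -> str:
    """Encode one JPEG-compressed frame as a Base64 string
    suitable for a WebSocket text message."""
    return base64.b64encode(jpeg_bytes).decode("ascii")

def payload_to_frame(payload: str) -> bytes:
    """The inverse step, as performed on the client before rendering."""
    return base64.b64decode(payload)

# Stand-in bytes for a real cv2.imencode(".jpg", frame) result:
fake_frame = b"\xff\xd8\xff\xe0" + b"\x00" * 16
assert payload_to_frame(frame_to_payload(fake_frame)) == fake_frame
```

Base64 inflates the payload by roughly a third compared with raw binary, which is the trade-off we accepted for the simpler text-based WebSocket setup.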
We also hit an audio blocking issue. Initially, when our AI spoke to give feedback, the entire video feed would freeze for about two seconds while the audio was being generated. We realized the TTS API call was blocking the main thread. We solved this by using Python threading to handle audio generation in the background, keeping the video smooth at 30 FPS.
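The fix amounts to firing the slow TTS call on a background thread so the frame loop returns immediately. A sketch of the pattern, with a stand-in `speak` function in place of the real ElevenLabs call:

```python
import threading
import time

def speak(text: str) -> None:
    """Stand-in for the TTS API call, which can take a couple of seconds."""
    time.sleep(0.1)  # simulate slow audio generation

def speak_async(text: str) -> threading.Thread:
    """Run TTS in the background so the video loop never blocks on audio."""
    t = threading.Thread(target=speak, args=(text,), daemon=True)
    t.start()
    return t

start = time.perf_counter()
worker = speak_async("Keep your back straight!")
elapsed = time.perf_counter() - start  # near-zero: the call returns immediately
```

The `daemon=True` flag keeps a pending audio clip from holding the process open on shutdown; a production version would also want to queue or drop overlapping cues so two voice lines never play at once.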
Tracking the "state" of a rep (descending vs. ascending vs. bottom hold) was also challenging. A simple angle check wasn't enough; we had to implement a state machine that remembers previous frames to accurately count a rep and detect whether a user "cheated" the range of motion.
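A toy version of such a state machine might look like this. The thresholds and exercise (a squat, judged by knee angle) are illustrative assumptions, not our tuned values:

```python
class RepCounter:
    """Tiny state machine over a stream of knee angles (degrees):
    counts full reps and flags shallow ("cheated") ones."""

    UP_AT = 160    # above this angle the lifter is considered standing
    DEEP_AT = 100  # below this angle the bottom position counts as deep enough

    def __init__(self):
        self.state = "up"
        self.reps = 0
        self.min_angle = 180.0

    def update(self, knee_angle: float):
        """Feed one frame's angle; returns "rep", "shallow", or None."""
        if self.state == "up":
            if knee_angle < self.UP_AT:       # lifter started descending
                self.state = "moving"
                self.min_angle = knee_angle
            return None
        self.min_angle = min(self.min_angle, knee_angle)
        if knee_angle > self.UP_AT:           # lifter is back upright
            self.state = "up"
            if self.min_angle < self.DEEP_AT:
                self.reps += 1
                return "rep"
            return "shallow"                  # cheated range of motion
        return None
```

Remembering `min_angle` across frames is what a single-frame angle check cannot do: a shallow rep passes every instantaneous test except the one at the bottom it never reached.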
The Accomplishments
We successfully learned and implemented multiple APIs we had never used before, such as ElevenLabs; built a working full-stack app with FastAPI, WebSockets, and asynchronous code; and explored the field of computer vision with OpenCV and MediaPipe.
Lift Off
GymPigeon will not stop here. While our main goal is live correction for beginners, the app will grow into a mobile-friendly personal trainer and gym buddy in one. We will collect data on the repeated mistakes our program detects to provide personalized training, as well as warnings before users start an exercise. Add-on features, such as a variety of trainer personalities and live spotting alarms, are next on our list.
Built With
- elevenlabs
- fastapi
- mediapipe
- opencv
- python
- react
- threading
- typescript
- websockets