Computer Vision Underwater Kick Coach

Inspiration

When I first began varsity swimming in ninth grade, I rapidly improved and even swam at my county's championship. But, after swimming a personal best there and briefly passing out in the process, I plateaued. I strength trained during the offseason in order to improve, but I soon realized that technique, not strength, was my true bottleneck.

While millions of swimmers rely on expensive private coaching, millions of others, especially high school swimmers, simply can't afford that kind of objective feedback. This lack of feedback not only limits performance but can also lead to injury in some cases. As of now, technique coaching relies on either human coaches or expensive motion-capture systems. But after seeing a YouTube video of someone analyzing dance technique with human pose estimation (https://www.youtube.com/watch?v=bPDoMdn71h0), I decided to create a third, affordable alternative.

What it does

Pipeline Overview:
    1. Input: Videos of learner and expert swimmer
    2. Pose Estimation: Find coordinates of swimmers' key joints
    3. Feature Extraction: Calculate angles between arms, torso, thigh, and calf
    4. Kick Segmentation: Partition video into individual kick cycles
    5. Cycle Synchronization: Align learner and expert kick-lists for frame-by-frame comparison
    6. Output: Generate output videos, metrics, and textual feedback

The algorithm takes two underwater kick videos as input: one from a learning swimmer and one from an expert swimmer. In every frame of each video, it uses a pose estimation model to derive the coordinates of the swimmer's key joints. Then, it calculates three key angles that define underwater kick technique: the angle between the arms and torso, the angle between the torso and thigh, and the angle between the thigh and calf. All of these points and angles are stored in Python lists, and a segmentation step creates "kick-lists" that contain the swimmer's points and angles during the frames of each individual kick. Then, the learner's kick-lists are adjusted so that they match the exact lengths of the expert's kick-lists. This allows the algorithm to take the difference between their angles frame by frame throughout each kick, which quantifies how the learner's technique differs from the expert's.

Finally, it generates three videos: the original learner video annotated with the angle differences, a side-by-side comparison of the learner and expert, and a video showing the differences during key moments of the learner's kick. Additionally, the algorithm creates a CSV file with the angle differences at the key points of the kicks and a report with textual feedback based on biomechanics. Together, these outputs turn raw pose estimation data into objective biomechanical feedback that a swimmer can act on in training.

How I built it

This project was built in Python and takes advantage of two major libraries: ViTPose and OpenCV. ViTPose provides the pose estimation model I used to derive the key points of the swimmers, and OpenCV allowed me to iterate through each frame of the video, annotate the frames, and create separate output videos. Aside from those two libraries, I mostly used plain Python logic to construct this algorithm. The system was also designed to be modular, meaning that individual components (e.g. the pose estimation model and the segmentation rules) can be altered and improved independently.

Role of AI in this System:
- AI: ViTPose derives the 2D coordinates of the swimmers' key joints in each frame
- Programming Logic: Finds angles between coordinates, matches up kick cycles, compares angles, generates feedback

Challenges I ran into

Pose Estimation Model Dataset Bias
Most pose estimation models are intended for use above water, so they are trained heavily on that kind of footage. As a result, most of them fail on underwater videos, which have a blue tint that the models weren't trained on. I tried using VFX to remove the blue tint, but this method isn't a panacea for every underwater video and didn't prove effective in any case. So, I instead used a very heavy vision transformer model (ViTPose-Plus-Huge). It was able to reliably detect the key joints of swimmers, but this triumph over dataset bias came at the cost of efficiency. To get a reasonable run time, I had to use my brother's computer, which has an NVIDIA graphics card.
Quantifying Body Motion
Initially, the dance video mentioned in the Inspiration section inspired me to calculate the slopes of the lines between key points. But I soon realized that a change in camera roll could heavily bias these measurements. So, I instead decided to calculate the angles at the swimmers' joints, because those angles stay the same regardless of camera roll.
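
To illustrate the reasoning above, here is a minimal sketch (not the project's actual code) that rotates three hypothetical joint positions by 15 degrees to simulate camera roll. The function names and pixel coordinates are made up for the example. The slope of the thigh segment changes, while the angle at the knee does not.

```python
import math

def rotate(point, degrees):
    """Rotate a 2D point about the origin, simulating camera roll."""
    t = math.radians(degrees)
    x, y = point
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))

def slope(a, b):
    """Slope of the segment from a to b (undefined for vertical segments)."""
    return (b[1] - a[1]) / (b[0] - a[0])

def interior_angle(a, b, c):
    """Unsigned angle at b, in degrees, between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return math.degrees(math.atan2(abs(cross), dot))

# Hypothetical hip, knee, and ankle positions in pixels.
hip, knee, ankle = (0.0, 0.0), (40.0, 10.0), (80.0, 0.0)
rolled_hip, rolled_knee, rolled_ankle = (rotate(p, 15) for p in (hip, knee, ankle))

slope_before = slope(hip, knee)
slope_after = slope(rolled_hip, rolled_knee)
angle_before = interior_angle(hip, knee, ankle)
angle_after = interior_angle(rolled_hip, rolled_knee, rolled_ankle)

print(f"slope: {slope_before:.3f} -> {slope_after:.3f}")
print(f"angle: {angle_before:.3f} -> {angle_after:.3f}")
```
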
Calculating Reflex Angles
In linear algebra, people conventionally calculate the angle between two vectors using the definition of the dot product. But this method only captures the smaller angle between the two vectors, which is never more than 180 degrees. I needed to be able to calculate reflex angles for when certain joints flexed beyond 180 degrees. So, I instead used a custom geometric method based on vector orientation to reliably calculate any joint angle.
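
One common way to get an orientation-aware angle is to combine the dot product with the 2D cross product and pass both to `atan2`. The sketch below assumes that approach; the project's custom method may differ, and the function names and coordinates are hypothetical. Note that in image coordinates the y-axis points down, so which direction counts as "positive" depends on the convention chosen.

```python
import math

def dot_product_angle(a, b, c):
    """Angle at b from the dot product alone; always between 0 and 180."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cos_theta = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

def oriented_angle(a, b, c):
    """Angle at b measured from ray b->a to ray b->c, between 0 and 360."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]  # sign encodes the turn direction
    return math.degrees(math.atan2(cross, dot)) % 360.0

# Hypothetical joint positions: the same three points give 90 degrees with
# the dot product but 270 degrees once orientation is taken into account.
shoulder, hip, knee = (1, 0), (0, 0), (0, -1)
print(dot_product_angle(shoulder, hip, knee))
print(oriented_angle(shoulder, hip, knee))
```
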
Synchronizing Kick Cycles

A. Creating Lists for Every Kick
In every video, a swimmer starts kicking at a unique time with a unique speed. So, to control for these factors, I treated the angle between a person's thigh and calf in every frame as a roughly sinusoidal function, one that has maxima when the leg is extended and minima when the leg is fully pulled inward. The minima are where I partitioned the video into distinct lists, one for each individual kick.
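
The segmentation idea can be sketched as follows, using a synthetic cosine wave in place of real knee angles. The function names and the `min_gap` parameter are hypothetical, and dropping the partial kicks before the first minimum and after the last one is an assumption of this sketch. Real pose data is noisier and would likely need smoothing first.

```python
import math

def find_local_minima(angles, min_gap=5):
    """Indices of local minima that are at least min_gap frames apart."""
    minima = []
    for i in range(1, len(angles) - 1):
        if angles[i] <= angles[i - 1] and angles[i] < angles[i + 1]:
            if not minima or i - minima[-1] >= min_gap:
                minima.append(i)
    return minima

def split_into_kicks(angles, min_gap=5):
    """Split a per-frame angle series into one list per full kick cycle."""
    cuts = find_local_minima(angles, min_gap)
    # Frames outside the first and last minimum are partial kicks and dropped.
    return [angles[start:end] for start, end in zip(cuts, cuts[1:])]

# Synthetic thigh-calf angle: one kick every 20 frames, 70 frames total.
knee_angles = [150 + 30 * math.cos(2 * math.pi * i / 20) for i in range(70)]

cut_points = find_local_minima(knee_angles)
kicks = split_into_kicks(knee_angles)
print(cut_points)
print([len(kick) for kick in kicks])
```

In this synthetic case the minima fall at frames 10, 30, and 50, which yields two complete kick-lists of 20 frames each.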

B. Matching Kick-Lists in Length
In order to do a frame-by-frame comparison, I needed to ensure the learner's and expert's kick-lists were equal in length. Initially, I thought of using MoviePy to match the lengths of two generated MP4 clips. But that approach is intended for much longer videos, and it failed to reliably make the frame-level adjustments I needed. So, I instead made an algorithm that operates on the kick-lists themselves. It first takes the difference between the lengths of the learner's list and the expert's list, d. Then, it evenly divides the learner's kick-list into d partitions and manipulates the frame at each partition point. If the learner's kick-list is shorter than the expert's kick-list, it duplicates the frame at each partition point. But if the learner's kick-list is longer than the expert's kick-list, it removes the frame at each partition point. This ultimately creates kick-lists for the learner that are equal in length to the expert's kick-lists, allowing for frame-by-frame comparison of each kick. These challenges illustrate how, in real-world AI systems, handling data variability is a key factor behind success.
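
The length-matching step described above could look roughly like the sketch below. The function name is hypothetical, and using the center of each partition as the "partition point" is an assumption, since the exact choice is not specified here. A `Counter` is used so that a frame can be duplicated more than once if the expert's kick is more than twice as long.

```python
from collections import Counter

def match_length(learner_kick, target_len):
    """Return a copy of learner_kick stretched or shrunk to target_len frames."""
    n = len(learner_kick)
    if n == 0:
        raise ValueError("learner kick-list is empty")
    d = abs(target_len - n)
    # Center index of each of the d equal partitions of the learner's list.
    points = Counter(((2 * k + 1) * n) // (2 * d) for k in range(d))
    matched = []
    for i, frame in enumerate(learner_kick):
        if target_len > n:
            # Learner is shorter: duplicate the frame at each partition point.
            matched.extend([frame] * (1 + points[i]))
        elif points[i] == 0:
            # Learner is longer: keep only frames that are not partition points.
            matched.append(frame)
    return matched

learner = list(range(10))           # stand-in for 10 frames of angle data
stretched = match_length(learner, 12)
shrunk = match_length(learner, 8)
print(stretched)
print(shrunk)
```

Once the lists are equal in length, the per-frame angle differences can be taken by pairing the two lists, for example with `zip`.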

Edge Cases Handled:
    - Frames with no pose estimation model output
    - Different video resolutions and frame rates
    - Videos where swimmers come from the opposite direction
    - Different swimmer kick rates, start times, and total kicks

Accomplishments that I'm proud of

One thing that I am especially proud of is how flexible I was able to make this algorithm. It adapts to different frame rates, different video resolutions, different directions the swimmer is coming from, and variations in a swimmer's kick such as kick rate, start time, and total number of kicks. Because of this, the algorithm only requires a side view of a swimmer to give them real, objective feedback that can identify subtle technique flaws that are difficult to see even with expert coaching. This is the kind of feedback I always wanted, and with this algorithm, I was finally able to see a major flaw in my underwater kick: my shoulders weren't flexible enough.

What makes this unique:
    - Successfully uses underwater pose estimation (most models fail here due to dataset bias)
    - Aligns kick cycles between any two swimmers to allow for frame-by-frame comparison
    - Transforms pose estimation data into both visual feedback and textual feedback based on biomechanics
    - Provides an automated alternative to expensive human coaching

User Experience:
    - User uploads two videos (learner and expert)
    - Algorithm generates
        1. Learner video annotated with angle differences
        2. Annotated learner video analyzing highest and lowest parts of their kicks
        3. Side-by-side comparison video of learner and expert
        4. Textual biomechanical feedback based on highest and lowest parts of their kicks
    - No technical knowledge from user is required

What I learned

Through this project, I learned that human pose estimation is an underrated yet very impactful field within artificial intelligence. It has the capability to help people in various ways, and I have only delved into one of them. I also learned that when creating a system that has a purpose in the real world, there are many points of failure that people wouldn't think of initially. While I had a plan for what I intended to do, I learned that the design process is iterative, and as seen in the challenges section, I had to change my perspective on this problem multiple times throughout the design process. Most importantly, however, I learned the true importance of efficiency in real-world systems. Prior to this project, I would always attack coding problems in inefficient ways and never suffered the consequences of this practice. But this time around, that approach didn't just cause me to wait a few extra milliseconds for an output; it caused me to wait several more minutes. So, in order to finish this project and make it applicable for a future mobile app, I had to minimize this algorithm's use of OpenCV and especially ViTPose. Now, when designing future systems, I will factor efficiency into my process from the start.

What's next for Underwater Kick Coach

This project is only an introduction to the field of teaching biomechanics with pose estimation, and I have many opportunities to expand it further. It could be expanded laterally into other sports such as volleyball, track, and tennis. I could also potentially use this technology to provide an affordable option for those who need physical therapy but can't afford it. In addition, I could expand this project vertically into a full-scale app for anyone who can't afford a private technique coach. For the most part, I optimized this algorithm with mobile development in mind, but one bottleneck that still remains is the pose estimation model I am using. As stated in the challenges section, I had to sacrifice model efficiency for accuracy when dealing with underwater video. But I hope to create a pose estimation model that can get the best of both worlds in the future. The opportunities this project provides are limitless, and in the future, this project could evolve into a scalable, AI-driven athletic training platform.

Built With

Updates

Ivan Reznikov started this project — Apr 05, 2026 11:43 PM EDT
