Inspiration
If you've ever thought, "I wish I could do this cool {dance, trick, move}," so did we. If you've ever thought, "I could find a video tutorial for this," so did we. But at the end of the day, if you're thinking, "Man, am I bad at this," then congratulations, so did we! Learning movement through video can be so confusing: the reference video flies by at lightning speed, you can't see what you're doing, and you're too busy figuring out your arms to pay attention to your feet. Our goal for Pose2Dance is to improve learning through video by providing real-time analysis of your movements against any reference video, so you can focus on grooving, and we'll do the rest!
What it does
Pose2Dance is a Python app that tracks your movement in real time through a webcam and uses machine learning to estimate your pose relative to a reference video. The app displays both videos simultaneously for easy comparison and computes a move score from 1 to 100 as you go. Pick any YouTube video, input its link, and you're ready to go!
How we built it
For this project, we leveraged TensorFlow's MoveNet Lightning, a lightweight, quantized pose-estimation model capable of real-time inference on mobile devices. MoveNet returns the positions of 17 body keypoints, such as shoulders, wrists, and knees. We use this positional data, along with its velocity vectors, to compute a scoring cost, with weights tuned against our own reference dance moves. Our app is built with the tkinter framework, with video streaming through OpenCV.
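The scoring idea above can be sketched as a weighted cost over keypoint positions and velocities, mapped to a 1-100 score. This is a minimal illustration, not the team's tuned implementation: the weights, the exponential mapping, and the function name `move_score` are all assumptions made for the example.

```python
import numpy as np

NUM_KEYPOINTS = 17  # MoveNet tracks 17 body keypoints (nose, shoulders, wrists, ...)

def move_score(user_kp, ref_kp, user_prev, ref_prev, w_pos=0.7, w_vel=0.3):
    """Score how closely the user's pose matches the reference, on a 1-100 scale.

    Each *_kp array has shape (17, 2): normalized (y, x) keypoint positions
    for the current frame. *_prev hold the previous frame's keypoints and are
    used to form velocity vectors. The weights here are illustrative, not the
    values actually tuned by the Pose2Dance team.
    """
    # Mean Euclidean distance between corresponding keypoints.
    pos_cost = np.mean(np.linalg.norm(user_kp - ref_kp, axis=1))
    # Per-keypoint velocity vectors (displacement since the previous frame).
    user_vel = user_kp - user_prev
    ref_vel = ref_kp - ref_prev
    vel_cost = np.mean(np.linalg.norm(user_vel - ref_vel, axis=1))
    cost = w_pos * pos_cost + w_vel * vel_cost
    # Map cost in [0, inf) to a score in [1, 100]; identical motion scores 100.
    return max(1.0, 100.0 * np.exp(-3.0 * cost))
```

In practice the per-keypoint weights would differ (wrists and ankles matter more for dance than, say, the nose), which is where the trial-and-error tuning comes in.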
Challenges we ran into
One of the main challenges we encountered was getting the video stream from YouTube and syncing it with the webcam data to compute our move scores. We needed both to play the requested videos and to access the image data of the current frame, neither of which is easily accomplished through the YouTube API. We also ran into dependency issues, ranging from difficulty installing TensorFlow on Apple Silicon to networking problems in OpenCV. Another challenge was designing a scoring function for the quality of users' dances: finding a suitable algorithm, along with proper weights and scalings, turned out to be a nontrivial task requiring a large amount of trial and error.
Accomplishments that we're proud of
One technical accomplishment was getting the pose estimation model to process two video streams in real time without impacting the user's experience of the videos or the responsiveness of the app. We used multiprocessing to offload the computationally heavy workloads onto a separate process, allowing our system to analyze and score at over 10 frames per second per stream.
What we learned
This project required us to learn a variety of unfamiliar technologies and frameworks in a limited time. tkinter in particular was a challenge to pick up quickly, given its unconventional behavior compared to typical Python code and the limited documentation online, but in the end we were able to build a fun, functional project while learning several new skills along the way.
What's next for Pose2Dance
There are still ways we want to improve the user experience of Pose2Dance. We would like a more refined UI with additional features, such as a way to seek to timestamps in the reference video where the user struggled or excelled. We could also keep tuning our move-score algorithm to make it a better metric, for example by accounting for reaction time and averaging over a window to reduce volatility. We could even offer suggestions on specific things to work on, like arm placement. Overall, we hope Pose2Dance can continue to be a fresh and engaging way for people to learn and dance!