💡 Inspiration
Our project was inspired by the art of dance and the challenge of being resourceful without access to a studio or instructor. Professional dance teams typically have large studios, instructor feedback, and floor-to-ceiling mirrors. Without that equipment and feedback, soloists can find it harder to grow as quickly, which is what we aim to change with our application's precise feedback and interactive interface. We also saw wider applications for a pose-to-pose comparison app beyond dance, such as gym form, sports, and general posture.
💎 What it does
Dancely takes input in the form of two mp4 videos, one designated as the dance teacher and the other as the student. The videos are processed through Google's Pose Landmark Detection library (MediaPipe Pose is the legacy version) to estimate both the teacher's and student's key landmark points (e.g. left_shoulder, right_hip). The videos are then synced using a dynamic time warping algorithm so poses can be compared accurately and divergent positions flagged as areas for improvement. The user is given both videos with landmark annotations side by side, plus a timeline below that hyperlinks to the relevant timestamps. Each timestamp has a card of feedback for improvement, and both videos can be replayed for easy review.
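The syncing step can be sketched with the classic dynamic time warping recurrence. This is a minimal, unoptimized illustration in plain Python, not the project's actual code; the frame-distance function `dist` is a placeholder for whatever pose comparison is plugged in:

```python
def dtw_align(teacher, student, dist):
    """Dynamic time warping: returns total alignment cost and a path of
    (teacher_frame, student_frame) index pairs."""
    n, m = len(teacher), len(student)
    INF = float("inf")
    # cost[i][j] = best cost of aligning the first i teacher frames
    # with the first j student frames
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(teacher[i - 1], student[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a teacher frame
                                 cost[i][j - 1],      # skip a student frame
                                 cost[i - 1][j - 1])  # match both
    # Backtrack from the end to recover the alignment path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, (i, j) = min((cost[i - 1][j - 1], (i - 1, j - 1)),
                        (cost[i - 1][j], (i - 1, j)),
                        (cost[i][j - 1], (i, j - 1)))
    path.reverse()
    return cost[n][m], path
```

The returned path pairs each teacher frame with its best-matching student frame, so the comparison stays meaningful even if one dancer starts late or moves at a slightly different tempo.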
🛠 How we built it
We split our team into frontend (React w/ Tailwind) and backend (Flask, algorithm & AI implementation). The frontend focused on streaming the mp4 video the backend provided, and on creating an easy file upload and an interactive timeline bar for a seamless user experience. The backend focused first on identifying libraries or models that would give accurate pose estimations while remaining lightweight enough for the video processing we would be doing. That's how we landed on MediaPipe Pose, which we traced to being the legacy version of Pose Landmark Estimation (PLE) on the Google Developers page (along with OpenCV). We used PLE/OpenCV to generate our side-by-side output (drawing 12 of the 33 landmark points for clarity), with custom functions to find the starting frame and the differing frames from the Dynamic Time Warping (DTW) output at processing time. We also considered other forms of comparison, such as Euclidean distance calculation and simple vector calculations, to find the best fit for our purposes.
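As an illustration of the per-frame comparison, here is a sketch of a mean Euclidean distance over a 12-landmark subset. The index list follows MediaPipe's documented 33-point pose landmark numbering (shoulders, elbows, wrists, hips, knees, ankles), but the exact subset and metric the project uses are assumptions:

```python
import math

# Hypothetical choice of 12 of MediaPipe's 33 pose landmarks:
# 11/12 shoulders, 13/14 elbows, 15/16 wrists,
# 23/24 hips, 25/26 knees, 27/28 ankles.
KEY_POINTS = [11, 12, 13, 14, 15, 16, 23, 24, 25, 26, 27, 28]

def pose_distance(teacher_pose, student_pose, indices=KEY_POINTS):
    """Mean Euclidean distance over the selected landmarks.
    Each pose is a sequence of 33 (x, y) tuples in normalized coordinates."""
    total = 0.0
    for i in indices:
        (tx, ty), (sx, sy) = teacher_pose[i], student_pose[i]
        total += math.hypot(tx - sx, ty - sy)
    return total / len(indices)
```

A function like this can serve as the `dist` argument to a DTW alignment and as the per-frame score that decides which moments deserve feedback.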
😰 Challenges we ran into
Our team comprises all beginners in AI, which posed a challenge when trying to understand documentation and weigh many ideas without knowing the tools or their typical behaviors. Here are some of the challenges we overcame!
- Understanding OpenCV and how it integrates with Pose Landmark Detection (previously MediaPipe Pose) for reading images, drawing relevant landmarks, and labeling
- Determining how to sync two videos, pick a start frame to compare, set a threshold of difference, and reflect these in our feedback and side-by-side output video
- Understanding how to implement Dynamic Time Warping for our use case to find significant differences between teacher and student frames even when the videos are not manually synced
- Using Gemini Flash to give feedback on the specific frames flagged by our algorithm, avoiding generalized feedback and predictable/nonspecific intervals
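Turning aligned frames into a handful of feedback timestamps might look like the following sketch, where `path` is a DTW frame alignment and `distances` holds the per-pair pose distances; the threshold and spacing values are hypothetical, not the project's tuned numbers:

```python
def flag_differences(path, distances, threshold=0.15, min_gap=10):
    """Given DTW-aligned (teacher, student) frame pairs and their pose
    distances, return teacher frame indices where the student deviates
    beyond `threshold`, spaced at least `min_gap` frames apart so the
    feedback cards don't cluster on one move."""
    flagged = []
    for (t_idx, _), d in zip(path, distances):
        if d > threshold and (not flagged or t_idx - flagged[-1] >= min_gap):
            flagged.append(t_idx)
    return flagged
```

Each flagged index can then be mapped to a timestamp on the timeline and handed to the feedback step.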
⭐️ Accomplishments that we're proud of
Since our team was full of beginners, it's hard to choose between all of the things we managed to get done during this challenge, but here are some of our best ones!
- Using Dynamic Time Warping to find significant differences between the teacher video and student video and return representative frame indices, avoiding the need for manual syncing by the user
🧠 What we learned
For the frontend team, we gained deep technical knowledge of how to use React components, hooks, and state to dynamically update the content on the website. We also learned how to use Tailwind CSS for easier styling and responsive design, and Flask to connect the frontend and backend and handle data storage and requests. As for the backend team, we learned basic generative AI prompt engineering, as well as Pose Landmark Estimation (PLE), Dynamic Time Warping (DTW), and Euclidean distance calculation. Another thing we learned, though we knew a little about it beforehand, is how useful the Gemini API is. We were lucky enough to attend Paige Bailey's Google Cloud workshop, where she explored how the API works and its amazing price per run. We managed to take advantage of the low price and Gemini's long context to get a great response to return to the user with low token usage.
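Keeping token usage low mostly comes down to sending a compact prompt. A hypothetical prompt builder (the wording and landmark names are illustrative, not the prompt the project actually sends to Gemini):

```python
def build_feedback_prompt(diffs):
    """Assemble a compact LLM prompt: one line per flagged timestamp,
    listing the landmarks that deviated most. `diffs` maps a timestamp
    (seconds) to a list of landmark names."""
    lines = ["You are a dance instructor. Give one short correction per timestamp."]
    for ts in sorted(diffs):
        lines.append(f"{ts:.1f}s: off at {', '.join(diffs[ts])}")
    return "\n".join(lines)
```

Summarizing each flagged frame to a few landmark names keeps the request small while still grounding the model's feedback in specific moments.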
🔮 What's next for Dancely
Testing broader use cases of our pose matching algorithm with feedback. Using PLE/OpenCV's webcam integration for live feedback on certain moves, such as live dance sessions or gym set form (e.g. back squat, bench press, pullups).