Inspiration
While we were brainstorming ideas for our hackathon project, one member offhandedly mentioned the homework they had for their dance team: creating a formation chart for a piece of choreography. We realized we could automate the long process of watching the video at a slower speed and noting down each dancer's position. Choreolyzer is the first program of its kind, letting dancers dance more by giving back the time they would otherwise have spent on tedious tasks.
What it does
Choreolyzer automates the creation of dance formation charts by tracking and analyzing the position of every dancer. Users can view the formation at every frame of the video from a bird's-eye view, side by side with the original video for comparison.
How we built it
The foundational components of Choreolyzer were the models used to analyze the dance video. YOLOX (a model in the YOLO, or You Only Look Once, family) was used to detect the dancers and their positions in each frame of the video. MOTRv2 was used for multi-person tracking throughout the video. Together, YOLOX and MOTRv2 provided the basis for determining the position of each dancer at any given frame.
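As a rough illustration of how tracker output can become per-frame dancer positions, here is a minimal sketch. The `(frame, track_id, box)` tuple format is a hypothetical stand-in, not MOTRv2's actual output format, and using the bottom-center of each bounding box as the dancer's floor position is our assumption for the example.

```python
from collections import defaultdict


def foot_point(box):
    """Return the bottom-center of an (x1, y1, x2, y2) box.

    Assumption: the bottom edge of the box roughly marks where the
    dancer touches the floor.
    """
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, y2)


def positions_by_frame(tracks):
    """Group tracked boxes into {frame: {track_id: (x, y)}}.

    `tracks` is a hypothetical list of (frame, track_id, box) tuples.
    """
    frames = defaultdict(dict)
    for frame, track_id, box in tracks:
        frames[frame][track_id] = foot_point(box)
    return dict(frames)


# Made-up boxes for two dancers over two frames.
tracks = [
    (0, 1, (100, 50, 140, 250)),
    (0, 2, (300, 60, 340, 260)),
    (1, 1, (110, 50, 150, 250)),
]
positions = positions_by_frame(tracks)
```

Keying positions by track ID is what lets the same dancer be followed from frame to frame, which is the part the tracker contributes on top of per-frame detection.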
To convert the positional information from the frontal view to a bird's-eye view, we used a combination of OpenCV libraries and linear algebra fundamentals.
The frontend of Choreolyzer was built using JavaScript, React, and Tailwind CSS to create a clean and minimalistic user interface where users can input a choreography video.
Challenges we ran into
The main challenges we faced came from the incompatibility of the object-tracking libraries and the implementation of the bird's-eye view transformation.
Model Compatibility
Our biggest problem stemmed from the lack of documentation and the closed nature of the MOTRv2 model. The model was preconfigured to run only on its authors' dataset rather than on custom videos, so we had to manually run YOLOX on the backend for every input video to compute the positions of the dancers, which led to a large increase in runtime. Additionally, the input and output paths were hardcoded into the original MOTRv2 model, so we had to change the model's source code to support custom paths.
Bird's-Eye View Transformation
We struggled to find a way to transform the frontal view to a bird's-eye view because we had no information about the camera parameters. However, after thorough research, we developed a solution that used matrix transformations and OpenCV functionality.
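One common way to get a top-down view without camera parameters is a planar homography fitted from four floor points. The sketch below illustrates that general technique and is not necessarily our exact implementation. It solves for the 3x3 matrix with NumPy so the linear algebra is visible; OpenCV's `cv2.getPerspectiveTransform` and `cv2.perspectiveTransform` cover the same two steps. The corner coordinates are made up, and we assume the four floor corners are known, for example from bounds picked by the user.

```python
import numpy as np


def homography_from_points(src, dst):
    """Solve for the 3x3 matrix H that maps four src points to four dst points.

    Each correspondence (x, y) -> (u, v) gives two linear equations in the
    eight unknown entries of H (the ninth entry is fixed to 1).
    """
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def to_top_down(H, point):
    """Map one image point to floor coordinates using homogeneous division."""
    x, y = point
    u, v, w = H @ np.array([x, y, 1.0])
    return (u / w, v / w)


# Hypothetical floor corners as seen in the video (a trapezoid), listed as
# back-left, back-right, front-right, front-left.
image_corners = [(200, 300), (440, 300), (600, 460), (40, 460)]
# The same corners on a 400 x 400 top-down canvas.
floor_corners = [(0, 0), (400, 0), (400, 400), (0, 400)]

H = homography_from_points(image_corners, floor_corners)
dancer_on_floor = to_top_down(H, (320, 380))
```

Because the mapping is fitted only to points on the floor plane, it is meaningful for where a dancer's feet are, not for their head or hands.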
Accomplishments we're proud of
First of its Kind
Choreolyzer automatically tracks dancers in a way never done before, combining multiple state-of-the-art models. According to feedback from various dancers, such software is incredibly useful for their craft.
Combining Multiple Frameworks
Despite having never worked with MOTRv2, YOLOX, or OpenCV, we were able to chain their functionality together to go from a raw dance video to a complete analysis of its formations.
What we learned
This project proved to be a valuable learning experience for us. We gained proficiency in both working with machine learning models and implementing a clean UI.
Machine Learning Models
We were introduced to the fields of multi-object tracking and object recognition. We learned about the benefits and shortcomings of various models in the field by conducting a literature review. We also learned about the architectures of YOLOX and MOTRv2.
UI Implementation
We learned more about creating a responsive and interactive UI, especially with the website's functionality to let users pick the bounds of the dance video.
What's next for Choreolyzer
Acquire Users
We're already gearing up to launch with Seoulstice (the Georgia Tech K-Pop cover team), who will integrate Choreolyzer into their practice routine and provide early testing feedback. We plan to expand to other Georgia Tech dance teams and eventually to external and commercial organizations.
Improve Model
While MOTRv2 and YOLOX are state-of-the-art models, they have the potential to be more accurate. We plan to implement active learning (human feedback during model training) and a way to fine-tune the model for a specific style of dance. Additionally, we plan to implement a key-frame metric using positional averages that will output areas of major formation changes.
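The key-frame metric is not built yet, so the following is only one possible sketch of the idea. It averages how far each dancer moved between consecutive frames and flags frames where that average exceeds a threshold. The function names, the data layout, and the threshold value are all assumptions made for illustration.

```python
import math


def mean_displacement(prev, curr):
    """Average distance moved by dancers that appear in both frames."""
    shared = prev.keys() & curr.keys()
    if not shared:
        return 0.0
    return sum(math.dist(prev[i], curr[i]) for i in shared) / len(shared)


def key_frames(frames, threshold):
    """Return indices of frames whose mean displacement from the previous
    frame exceeds `threshold` (in the same units as the positions)."""
    return [
        i
        for i in range(1, len(frames))
        if mean_displacement(frames[i - 1], frames[i]) > threshold
    ]


# Made-up top-down positions, one {track_id: (x, y)} dict per frame.
frames = [
    {1: (0.0, 0.0), 2: (4.0, 0.0)},
    {1: (0.0, 0.1), 2: (4.0, 0.0)},  # small drift, same formation
    {1: (3.0, 4.0), 2: (4.0, 3.0)},  # both dancers move a lot
]
changes = key_frames(frames, threshold=1.0)
```

In practice the comparison would likely run over windows of frames rather than adjacent ones, since a formation change in a real video is spread across many frames.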
Add user customization
We plan to implement front-end features where users can drag the positions of dancers to their desired location or correct any potential errors in the model.