Check out our interview with MLH!
There’s an old saying that recommends people “dance like no one’s watching.” The idea is to have fun moving, and not care if other people think you look silly. We took that idea and blew it up. The whole point of Dance 'Til You Drop is to have fun dancing by making ALL the moves look silly for EVERYONE. We know most people aren’t freestyle dance geniuses, so we’ve created a web app that uses TensorFlow to create random dance moves and string them together. It’s like a mashup of Monty Python’s Ministry of Silly Walks and Just Dance.
What it does
Dance 'Til You Drop uses TensorFlow to display a wireframe human image, and randomly positions the arms, legs, and torso (within pre-set humanly-possible ranges) into unique and creative poses. Players stand in front of their device’s camera and try to imitate each pose. The web app takes the player’s image and checks it against the randomly generated pose. If they match, it advances to the next pose automatically, and the speed of the poses can be raised to increase difficulty and player fun!
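To make the "random pose within humanly-possible ranges" idea concrete, here is a minimal sketch of how one could draw a joint angle for each limb from a pre-set range. The joint names and the degree ranges are made-up placeholders for illustration, not the values the app actually uses.

```typescript
// Hypothetical joint names and angle ranges in degrees (illustration only).
type AngleRange = { min: number; max: number };

const JOINT_RANGES: Record<string, AngleRange> = {
  leftShoulder: { min: -90, max: 180 },
  rightShoulder: { min: -90, max: 180 },
  leftElbow: { min: 0, max: 150 },
  rightElbow: { min: 0, max: 150 },
  leftHip: { min: -30, max: 90 },
  rightHip: { min: -30, max: 90 },
  leftKnee: { min: 0, max: 140 },
  rightKnee: { min: 0, max: 140 },
};

// Pick one angle per joint, uniformly inside that joint's allowed range.
// The random source is injectable so the function can be tested deterministically.
function randomPose(
  ranges: Record<string, AngleRange>,
  rng: () => number = Math.random
): Record<string, number> {
  const pose: Record<string, number> = {};
  for (const joint of Object.keys(ranges)) {
    const { min, max } = ranges[joint];
    pose[joint] = min + rng() * (max - min);
  }
  return pose;
}
```

Calling `randomPose(JOINT_RANGES)` returns a new set of angles each time, and tightening a range is one way to keep a joint from landing in an awkward position.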
How we built it
There are a number of different components of the app itself. First, let’s start out with the front end website. Everything is built in React and TypeScript, and styled with Material-UI. This lets us easily rescale and rearrange the layouts of various pages to accommodate any device a user accesses the website from, whether that is a computer or a mobile device.
Then, we have the authentication system to allow users to log in, log out, reset their password, etc. This is all powered by Firebase Authentication. On the profile page, everything is stored in Firestore, and Google Cloud Storage is used to store the uploaded profile pictures. This whole system is fully featured, and you can test it out at the demo links below.
To access the video stream from a mobile device and pair it to the computer, we have large parts of the flow working, but there are still some gaps based on what we could implement in the limited time frame. The full process, however, works like this: the QR code generator npm module creates a unique QR code for each user on demand, and the user scans that QR code with the Flutter app. Once they allow access to the camera on their mobile device, the video gets streamed via WebSockets through a Google Kubernetes Cluster we’re using as our Real Time Communication server, back to the front end website. Once fully completed, this will let the user use the generally better camera on their phone as the controller, while still using the larger computer screen to view the poses they need to make, and the generally more powerful computer GPU for the TensorFlow PoseNet analysis.
When it came to calibration, all of the data was collected via PoseNet, and the handlers are pretty standard React. We did do a lot of work to verify that there was only one active instance of anything running PoseNet at the same time, but we’ll talk more about that in the challenges section. Likewise, when it came to randomly generating and displaying the poses, we relied only on standard TypeScript libraries and wrote our own algorithms to handle all of the generation and display (as discussed in the challenges section).
Finally, in terms of scoring and recording high scores, those features are still only partially completed and not yet deployed, because we found that some of the randomly generated poses were too difficult to make (so the game was kicking you out soon into the dancing, which didn’t make it much fun). Matching all the different poses against each other is math intensive, and storing and retrieving high scores requires a wide scope of user data, so all of this processing will be handled via a backend built using serverless functions. Firebase Functions not only let us offload processing of the scoring to a more powerful server so that it does not freeze or crash the user’s machine, but also restrict the data that is accessible to the user so they only see what should be publicly accessible.
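As a rough illustration of what the pose-matching math could look like, the sketch below compares two poses by the average distance between corresponding keypoints. The function names, the pixel tolerance, and the choice of a mean Euclidean distance are all assumptions for this example; the deployed scoring may differ.

```typescript
type Point = { x: number; y: number };

// Average Euclidean distance between matching keypoints of two poses.
// Both arrays are assumed to list the same joints in the same order.
function meanKeypointDistance(target: Point[], observed: Point[]): number {
  if (target.length === 0 || target.length !== observed.length) {
    throw new Error("poses must have the same, non-zero number of keypoints");
  }
  let total = 0;
  for (let i = 0; i < target.length; i++) {
    total += Math.hypot(target[i].x - observed[i].x, target[i].y - observed[i].y);
  }
  return total / target.length;
}

// Hypothetical match rule: the poses match when the average error is within a tolerance.
function posesMatch(target: Point[], observed: Point[], tolerancePx = 30): boolean {
  return meanKeypointDistance(target, observed) <= tolerancePx;
}
```

Loosening `tolerancePx` would be one simple way to make hard poses more forgiving, at the cost of accepting sloppier imitations.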
Challenges we ran into
The next major challenge that we ran into was how to ensure there were no competing copies of PoseNet running. Because each of the PoseNet analyzers needed to return the pose back to React so that we could render it (and the video, since we were not using react-webcam anymore) onto a canvas for the user to see, we needed to create a new instance each time the React state rerendered. However, because these were recursive listeners that fired as soon as the previous frame was done, this quickly became CPU intensive, and computer crash-y. To solve this problem, we built our own “factory” functions so that upon a React render, a new serial number was assigned. Then, each time any function ran, it checked its internal serial number against the master serial number that should be running (i.e. the last one created). If the serial numbers did not match, then we knew that the function should not be running, and we could terminate the recursion and prevent duplicate returns. This also allowed us to solve the issue of PoseNet running even after we turned the webcam off, until it crashed the app. By setting the serial number of the “active” function to 0, every function terminated the next time it tried to recurse, and we solved our infinite stack issues.
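The serial-number guard described above can be sketched roughly as follows. This is a simplified illustration, not our actual code: `startLoop`, `stopAllLoops`, and the injectable `schedule` parameter are hypothetical names, and `onFrame` stands in for the per-frame PoseNet call and canvas render.

```typescript
type Schedule = (callback: () => void) => void;

let activeSerial = 0; // 0 means "no loop should be running"
let nextSerial = 1;

// Start a self-rescheduling loop. Starting a new loop makes every older loop stale.
function startLoop(onFrame: () => void, schedule: Schedule): number {
  const mySerial = nextSerial++;
  activeSerial = mySerial;

  const tick = (): void => {
    // Stale loop: a newer one was started, or everything was stopped.
    if (mySerial !== activeSerial) return;
    onFrame();
    schedule(tick); // queue the next frame
  };

  schedule(tick);
  return mySerial;
}

// Setting the active serial to 0 makes every loop exit on its next tick.
function stopAllLoops(): void {
  activeSerial = 0;
}
```

In a browser, `schedule` could be something like `(cb) => requestAnimationFrame(cb)`, and `stopAllLoops` would be called when the webcam is switched off or the component unmounts.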
Finally, we spent a lot of time generating the poses for the person to dance to. To turn a generic dance move into something that fits each person’s aspect ratio, we used the angles of the limbs, and the lengths that the computer saw, as the generator. This allowed us to represent complex 3D poses as two dimensional figures that we could rescale to the limb lengths we measured in calibration, so that the point matching of a dance move works for any person. Obviously, while we did our best to determine how all 3D ranges of motion can appear in 2D by angling and scaling an image, we still have some work to do, but we’re very proud of our technique (even if we just need to tweak the parameters). The rest from there was just a lot of trig.
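For a flavor of the trig involved, here is a minimal sketch that turns limb angles plus calibrated limb lengths into 2D points for one arm. The type names, the angle convention (radians, 0 pointing right, with y growing downward as on a canvas), and the two-segment arm are assumptions made for this example.

```typescript
type Point = { x: number; y: number };
type ArmAngles = { upper: number; lower: number };   // radians, measured on screen
type ArmLengths = { upper: number; lower: number };  // pixels, e.g. from calibration

// End point of a limb segment that starts at `start` with the given length and angle.
function limbEnd(start: Point, length: number, angle: number): Point {
  return {
    x: start.x + length * Math.cos(angle),
    y: start.y + length * Math.sin(angle),
  };
}

// Chain two segments: shoulder -> elbow -> wrist.
function armPoints(
  shoulder: Point,
  angles: ArmAngles,
  lengths: ArmLengths
): { elbow: Point; wrist: Point } {
  const elbow = limbEnd(shoulder, lengths.upper, angles.upper);
  const wrist = limbEnd(elbow, lengths.lower, angles.lower);
  return { elbow, wrist };
}
```

Because the pose is stored as angles, the same move can be redrawn for a different player just by swapping in that player's limb lengths.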
Accomplishments that we're proud of
This was our first time using TensorFlow, so we had a huge learning curve when it came to doing the pose analysis. As a result, we were incredibly proud to have implemented all of the machine learning features that we needed; the rest of the image analysis for scoring is simple vector distance calculations that we can quickly write using the built-in math functions. Consequently, we’re really happy that everything from the initialization and calibration to the pose generation and matching is fully implemented, and deployed for anyone to try.
Likewise, the login and profile system is fully fleshed out and robust, so while it isn’t necessarily the flashiest feature of the app, we’re proud that it was fully implemented (again, so anyone can try it and user login is not a sticking point for demoing the rest of the app).
While there is still more we would have liked to do, both of us have a habit of trying to bite off a 2-4 week programming project in 24 hours, so we’re very proud of everything that is working, and that a minimum viable product has been completed for anyone to use.
What we learned
As discussed in more detail in the challenges section, we learned a lot about how TensorFlow works, and we’re really proud of what we have working.
Finally, this is the first time we have really done anything with live video manipulation, and it was cool to go into more detail about how the media streams were created, provisioned, and then controlled throughout their lifespan. This was also something that took some time to figure out, but we learned a lot in the process, and the app now only uses the camera when appropriate, and does not keep the camera provisioned and running when the display is not visible or required.
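Releasing the camera comes down to stopping every track on the media stream. The sketch below shows that step in isolation; `releaseCamera` and the two small interfaces are hypothetical names used so the example does not depend on browser types, and a real `MediaStream` satisfies the same shape.

```typescript
// Minimal structural types; a browser MediaStream and its tracks fit these shapes.
interface StoppableTrack {
  stop(): void;
}
interface StoppableStream {
  getTracks(): StoppableTrack[];
}

// Stop every track so the browser releases the camera, then return null
// so the caller can clear its reference to the stream.
function releaseCamera(stream: StoppableStream | null): null {
  if (stream) {
    for (const track of stream.getTracks()) {
      track.stop();
    }
  }
  return null;
}
```

In the browser, the stream would typically come from `navigator.mediaDevices.getUserMedia({ video: true })`, and a component could call `stream = releaseCamera(stream)` when the video display is hidden or unmounted.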
What's next for Dance 'Til You Drop
When we came up with this hack idea, we pretty much wanted to try for all of the features that we could. Therefore, what’s next is just mostly working on finishing up features that were only partially completed (like the high score/scoring, and the phone video streaming), refining what we have to work better (making our human-ish poses into human poses), and just adding more variety to the game play (music and sound effects, recording your dance so you can see it and dance to it later, etc.).
Our domain is DanceTilYouDrop.online.