Inspiration
We wanted a way for video game developers, casual users, and artists to create custom pose references without extensive research or expensive equipment.
What it does
In its current state, the app takes in images and video streams, generates pose keypoints, and records them on the website for users to view and reuse.
How we built it
We used OpenCV to read and stream images for analysis. For the analysis itself, we used Google MediaPipe's Pose Landmark Detection model for 3D pose estimation. FastAPI and basic HTML/JS were used to test the functionality in a web browser. Since this project had a very specific end goal and we wanted to keep developing it after the event, we wanted to understand close to everything about what we were creating, so AI was mostly not used to generate code. Initially it produced some template code, but that was overwritten shortly after we decided to switch the data analysis technology to MediaPipe. Beyond that, we limited AI to troubleshooting and gaining insight into how to implement features that would help us keep moving forward. The goal was to have fun and learn to work with various new technologies, especially Git and GitHub, rather than to build the most impressive app.
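MediaPipe's pose landmarker reports each landmark with x/y normalized to [0, 1] relative to the frame, so drawing them with OpenCV means scaling back to pixel coordinates. A minimal sketch of that conversion, assuming a simplified stand-in for the landmark object (the `Landmark` tuple and the visibility threshold here are illustrative, not our actual code):

```python
from typing import NamedTuple

class Landmark(NamedTuple):
    # Illustrative stand-in for a MediaPipe landmark:
    # x and y are normalized to [0, 1] relative to the frame.
    x: float
    y: float
    visibility: float = 1.0

def to_pixels(landmarks, frame_w, frame_h, min_visibility=0.5):
    """Convert normalized landmarks to integer pixel coordinates,
    dropping points the model is not confident about."""
    points = []
    for lm in landmarks:
        if lm.visibility < min_visibility:
            continue
        points.append((int(lm.x * frame_w), int(lm.y * frame_h)))
    return points

# Example: a landmark near the top-center of a 640x480 frame.
print(to_pixels([Landmark(0.5, 0.25)], 640, 480))  # [(320, 120)]
```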
Challenges we ran into
Time management and finding the right model: We spent a significant amount of time researching what would be a good solution for our end goal. Initially, we looked into datasets like SMPL and SMPL-X to train a model ourselves, but these require an amount of storage that would be too costly for a short development timeframe. We also evaluated pre-trained models from Ultralytics and MediaPipe.
Processing on the front-end or back-end: Later in development, we started to brainstorm how to deploy the app to the cloud. At that point, camera streaming was handled by the backend, which would make the app crash whenever someone accessed the website remotely, since the server cannot open a camera on the client's computer. This led to some refactoring so that all camera and data-gathering functionality runs in JavaScript on the front-end.
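With capture moved to the browser, one common way for the front-end to hand frames to FastAPI is as base64-encoded image data URLs, which the server strips and decodes before analysis. A sketch of that server-side step, assuming a data-URL transport (the format here is an assumption for illustration, not necessarily what our code does):

```python
import base64

def decode_frame(data_url: str) -> bytes:
    """Turn a 'data:image/jpeg;base64,...' string from the browser
    into raw image bytes ready for decoding on the server."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:image/"):
        raise ValueError("expected an image data URL")
    return base64.b64decode(payload)

# Round trip: encode some placeholder bytes the way the browser would.
fake = b"\xff\xd8fake-jpeg\xff\xd9"
url = "data:image/jpeg;base64," + base64.b64encode(fake).decode()
print(decode_frame(url) == fake)  # True
```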
Model limitation: MediaPipe has to generate three-dimensional keypoints from a single webcam with no physical depth-tracking capability like LiDAR or 3D triangulation. Without that extra information, the output can be unreliable and noisy. In the future, we may adopt a vision transformer (ViT) model with more promising accuracy, or a multi-camera setup that captures more reliable data through image triangulation.
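Until a better model or a second camera lands, one cheap way to tame the noise is temporal smoothing of the keypoints, for example an exponential moving average per coordinate. A sketch of that idea (the smoothing factor 0.3 is an illustrative choice, not a tuned value):

```python
def smooth(prev, current, alpha=0.3):
    """Exponentially smooth one frame of (x, y, z) keypoints.
    Lower alpha = smoother but laggier; 0.3 is illustrative."""
    if prev is None:
        return current
    return [
        tuple(alpha * c + (1 - alpha) * p for p, c in zip(pp, cc))
        for pp, cc in zip(prev, current)
    ]

# Two frames of a single jittery keypoint: the smoothed value
# moves only partway toward the new measurement.
state = smooth(None, [(0.0, 0.0, 0.0)])
state = smooth(state, [(1.0, 1.0, 1.0)])
print(state)
```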
Accomplishments that we're proud of
CoCo: Before starting this project, I had never used a Git repository or collaborated on a large-scale programming project. It was incredibly fun to work across front- and back-end development to create a website application that could help with real-world problems. I learned quite a bit of HTML, C, and Python along the way. I underestimated the work it would take to run a 3D pose-skeleton tracking model on top of a webcam feed, serve it from a web server, and store captured images. This was a rewarding experience, and I look forward to applying this knowledge to future coding projects.
Sebastian: This was the best project I've worked on so far because I used my superior black belt Kung Fu moves to test the app in various stages of development.
What we learned
What's next for hacklytics
We plan to continue the project by exporting keypoints and mapping them onto characters. We may also develop a multi-camera triangulation system (based on the ESP32-CAM) that feeds its data to a more reliable model like YOLOv8 instead of MediaPipe.
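As a first step toward mapping keypoints onto characters, captured frames could be exported in a simple JSON format that an animation tool can ingest. A sketch of one possible schema (the field names `fps`, `frames`, and `points` are hypothetical, not an existing export format):

```python
import json

def export_keypoints(frames, fps=30):
    """Serialize per-frame (x, y, z) keypoints into a small JSON
    document; the schema here is a hypothetical example."""
    doc = {
        "fps": fps,
        "frames": [
            {"index": i, "points": [list(p) for p in pts]}
            for i, pts in enumerate(frames)
        ],
    }
    return json.dumps(doc)

# Two frames of a single keypoint drifting slightly in y.
data = export_keypoints([[(0.1, 0.2, 0.3)], [(0.1, 0.25, 0.3)]])
print(json.loads(data)["frames"][1]["points"][0])  # [0.1, 0.25, 0.3]
```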
Built With
- fastapi
- javascript
- mediapipe
- opencv
- python
- yolo