About the Project
Our team originally wanted to build this as a hardware-based motion boxing system. The goal was to create something physical that could track movement and punches in real time. However, due to time and resource constraints, the hardware direction fell through. Instead of dropping the idea completely, we decided to pivot and build a software version in the browser.
This project became a way for us to demonstrate how capable modern computer vision has become, especially using just a webcam.
What Inspired Us
We wanted to show that computer vision is no longer just experimental; it is practical enough to drive real-time interaction. While brainstorming, we were inspired by the game Rock 'Em Sock 'Em Robots, and our original plan was to use computer vision to move physical robots. Boxing felt like a good test case because it requires both continuous movement (leaning) and fast, discrete actions (punching).
How We Built It
We used MediaPipe Pose for body tracking, Three.js for the 3D arena, and WebRTC for peer-to-peer multiplayer. The system reads webcam pose landmarks and converts torso lean into movement and arm motion into punches.
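As a rough illustration of that mapping, here is a minimal sketch, not our exact game code. It assumes a landmark array in MediaPipe Pose's 33-point layout with normalized `x` and `y` fields. The helper names (`torsoLean`, `armExtension`, `isPunch`) and the speed threshold of 1.5 are hypothetical, chosen only for this example.

```javascript
// Landmark indices from MediaPipe Pose's 33-point body model.
const LEFT_SHOULDER = 11, RIGHT_SHOULDER = 12;
const LEFT_WRIST = 15, RIGHT_WRIST = 16;
const LEFT_HIP = 23, RIGHT_HIP = 24;

function midpoint(a, b) {
  return { x: (a.x + b.x) / 2, y: (a.y + b.y) / 2 };
}

// Continuous control: horizontal offset of the shoulders relative to the hips,
// divided by torso length so the result does not depend on distance from the
// camera. The value lies in [-1, 1]; the sign is in image space, so a mirrored
// webcam feed flips it.
function torsoLean(landmarks) {
  const shoulders = midpoint(landmarks[LEFT_SHOULDER], landmarks[RIGHT_SHOULDER]);
  const hips = midpoint(landmarks[LEFT_HIP], landmarks[RIGHT_HIP]);
  const torsoLength = Math.hypot(shoulders.x - hips.x, shoulders.y - hips.y);
  if (torsoLength < 1e-6) return 0; // degenerate pose, treat as upright
  return (shoulders.x - hips.x) / torsoLength;
}

// Distance from shoulder to wrist in the image plane.
function armExtension(landmarks, shoulderIdx, wristIdx) {
  const s = landmarks[shoulderIdx];
  const w = landmarks[wristIdx];
  return Math.hypot(w.x - s.x, w.y - s.y);
}

// Discrete control: report a punch when the arm extends faster than a
// threshold (in normalized units per second). The default is an assumed
// placeholder and would need tuning.
function isPunch(prev, curr, shoulderIdx, wristIdx, dtSeconds, speedThreshold = 1.5) {
  const growth = armExtension(curr, shoulderIdx, wristIdx) -
                 armExtension(prev, shoulderIdx, wristIdx);
  return growth / dtSeconds > speedThreshold;
}
```

Calling `torsoLean(landmarks)` each frame gives a steering value, and `isPunch(prevLandmarks, landmarks, LEFT_SHOULDER, LEFT_WRIST, dt)` checks one arm. This 2D version would miss punches aimed straight at the camera, where the arm foreshortens instead of extending in the image, so a real detector would likely need the landmark depth as well.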
For the unfinished hardware side, we ran the same pose-tracking library (MediaPipe) in Python and sent serial commands to an Arduino Uno controlling servos.
A lot of the work went into smoothing noisy pose data and tuning thresholds so the controls feel responsive without triggering accidentally.
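One common way to approach this kind of smoothing and thresholding is sketched below. It is not necessarily the filtering we shipped. It pairs an exponential moving average with a two-level (hysteresis) trigger, and the names `createSmoother` and `createTrigger`, along with every default value, are assumptions made for the example.

```javascript
// Exponential moving average: each new sample is blended with the running
// value. A smaller alpha smooths more but adds lag; 0.3 is an assumed default.
function createSmoother(alpha = 0.3) {
  let value = null;
  return function smooth(sample) {
    value = value === null ? sample : alpha * sample + (1 - alpha) * value;
    return value;
  };
}

// Hysteresis trigger: fires once when the signal rises above onThreshold,
// then stays silent until the signal has dropped below offThreshold.
// This keeps one fast motion from registering as several punches.
function createTrigger(onThreshold = 1.5, offThreshold = 0.5) {
  let armed = true;
  return function update(signal) {
    if (armed && signal > onThreshold) {
      armed = false;
      return true;
    }
    if (!armed && signal < offThreshold) {
      armed = true;
    }
    return false;
  };
}
```

In a frame loop, a continuous value such as torso lean would pass through `smooth(...)` before driving movement, while a punch-speed signal would go to `update(...)`. Raising `onThreshold` trades responsiveness for fewer accidental triggers, which is the balance described above.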
What We Learned
We learned that the biggest challenge with computer vision input is not detection itself, but stability and usability. Raw pose data is noisy, so making the game feel consistent required a lot of filtering and tuning.
Overall, this project showed us how powerful browser-based computer vision has become and how it can support real-time interactive experiences without dedicated hardware.
Basic Hardware Demo: https://www.youtube.com/shorts/3ooOWy0oPb4
Built With
- css3
- html5
- javascript
- mediapipe
- python
- websockets
