Inspiration
Metaverse, VR, and games today lack true immersion. Even in the Metaverse, you exist as a phantom from the waist down. The movement of your elbows is predicted by an algorithm and can look unstable and jittery. Worst of all, you have to clutch handheld controllers to do something like waterbend or spawn a fireball in your open palm.
What it does
We built an iPhone-powered full-body 3D tracking system that captures the way you move, and it costs practically nothing. By combining MediaPipe Pose's precise body-part tracking with a dynamic digital environment in Unity, it lets users embody a virtual avatar that mirrors their real-life movements. Python sockets handle real-time communication, so physical actions translate into the virtual world immediately, deepening immersion for users in virtual experiences.
How we built it
To create our full-body tracking avatar, we integrated MediaPipe Pose with Unity using Python sockets. First, we used MediaPipe Pose's computer vision pipeline to capture precise body-part coordinates, forming the basis of the avatar. In parallel, we built a dynamic digital environment within Unity to house the avatar. The critical link between the two was a Python socket connection that streams landmarks to Unity in real time, translating users' physical movements onto their virtual avatars with minimal delay. A minimal sketch of that capture-and-stream loop is below.
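This sketch assumes a webcam at index 0, a UDP socket, and a Unity listener on port 5052; the port and transport are illustrative assumptions, not necessarily what the project shipped with.

```python
import socket

import cv2
import mediapipe as mp

UNITY_ADDR = ("127.0.0.1", 5052)  # assumed host/port for the Unity-side listener
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # UDP: low-latency, fire-and-forget

cap = cv2.VideoCapture(0)
mp_pose = mp.solutions.pose

with mp_pose.Pose(min_detection_confidence=0.5, min_tracking_confidence=0.5) as pose:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV captures BGR.
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_world_landmarks:
            # Flatten the 33 pose landmarks into "x,y,z;x,y,z;..." so the
            # C# side can split on ';' and ',' to rebuild Vector3s.
            payload = ";".join(
                f"{lm.x:.4f},{lm.y:.4f},{lm.z:.4f}"
                for lm in results.pose_world_landmarks.landmark
            )
            sock.sendto(payload.encode("utf-8"), UNITY_ADDR)

cap.release()
```

Sending one compact string per frame keeps each packet small, which matters once latency becomes the bottleneck (see the challenges below).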
Challenges we ran into
There were a number of issues. We first used MediaPipe Holistic, but then realized it is a legacy solution and we couldn't get 3D coordinates for the hands. So we transitioned to running MediaPipe Pose on the person's body, cropping out small regions of the image where hands were detected, and running MediaPipe Hands on those sub-images to capture both the position of the body and the position of the hands. The math required to map the local coordinate system of the hand tracking into the global coordinate system of the full-body pose was difficult (see the sketch after this paragraph). There were latency issues in the Python-to-Unity link, which we resolved by reducing the number of data points sent per frame, and we applied techniques like an exponential moving average to smooth the movements. And naturally, there were hundreds of bugs to squash while parsing, moving, storing, and working with the data from these deep learning CV models.
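Two of those fixes lend themselves to short sketches: remapping hand coordinates from the cropped sub-image back into the full frame, and the exponential moving average. Everything here is illustrative; the variable names and the 0.3 smoothing factor are assumptions, not our exact values.

```python
def hand_local_to_global(u, v, crop_x, crop_y, crop_w, crop_h, img_w, img_h):
    """Map a normalized MediaPipe Hands coordinate (u, v), measured inside a
    cropped sub-image whose top-left corner sits at pixel (crop_x, crop_y),
    back into the full frame's normalized coordinate system."""
    global_x = (crop_x + u * crop_w) / img_w
    global_y = (crop_y + v * crop_h) / img_h
    return global_x, global_y


ALPHA = 0.3  # hypothetical smoothing factor: lower = smoother but laggier


def ema(prev_xyz, new_xyz, alpha=ALPHA):
    """Blend the previous smoothed point with the new raw point to damp
    frame-to-frame jitter in each landmark."""
    if prev_xyz is None:
        return new_xyz  # first frame: nothing to smooth against
    return tuple(alpha * n + (1 - alpha) * p for n, p in zip(new_xyz, prev_xyz))
```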
Accomplishments that we're proud of
We're proud of achieving full-body tracking, which meaningfully deepens immersion. Equally satisfying is streaming the body's movements with very little latency, which keeps the experience fluid.
What we learned
For one, we learned how to integrate MediaPipe Pose with Unity, picking up socket programming for server-based data transfer in Python along the way. We learned how to bridge C# and Python, since Unity scripts are written in C# while MediaPipe runs in Python. We also got to know OpenCV and computer vision fairly intimately, since we had to work around a number of library limitations and old legacy code lurking around Google's repositories. There was also an element of asynchronous code handling via queues (sketched below). Cool to see the data structure in action!
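For the queue part, the pattern we mean is a classic bounded producer/consumer: the capture thread pushes the latest pose payload and the sender thread drains it. This is a simplified, self-contained sketch, with a print standing in for the actual socket send:

```python
import queue
import threading
import time

pose_queue = queue.Queue(maxsize=2)  # tiny buffer: drop stale poses rather than add latency

def capture_loop():
    """Producer: stands in for the OpenCV/MediaPipe capture thread."""
    for i in range(100):
        payload = f"pose frame {i}"
        try:
            pose_queue.put_nowait(payload)
        except queue.Full:
            try:
                pose_queue.get_nowait()  # evict the stale pose
            except queue.Empty:
                pass
            pose_queue.put_nowait(payload)
        time.sleep(0.01)  # stand-in for per-frame capture time

def send_loop():
    """Consumer: stands in for the socket sender feeding Unity."""
    while True:
        payload = pose_queue.get()
        if payload is None:  # sentinel: shut down cleanly
            break
        print("sending", payload)

sender = threading.Thread(target=send_loop)
sender.start()
capture_loop()
pose_queue.put(None)
sender.join()
```

Keeping `maxsize` small is the key design choice here: for live avatar tracking, a fresh pose is worth more than a complete history.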
What's next for Full Body Fusion!
Optimizing the hand mappings, implementing gesture recognition, and adding a real avatar in Unity instead of white dots.