Static Contact is All You Need: Nimble Hands for VR Avatars

Static contacts allow the model to infer stable, plausible, and realistic poses
The forward kinematic chain of the hand, showing how the thumb tip position is computed
Coarse occupancy
Fine-detail proximity sensing
Trajectory history

Inspiration

Humans use their hands so constantly that we take for granted how nimbly we can use them. However, VR applications used to train people for high-stakes jobs (e.g. surgeons, firefighters, astronauts) cannot accurately model human hands performing nimble tasks, which hurts both the effectiveness and the immersion of the simulation. The core difficulty is that generating physically plausible hand motions is a heavily underconstrained problem, especially given the limited inputs a user can provide. As a result, most VR applications simply attach grasped objects rigidly to the user's hand, as if the two were "welded" together.

What it does

The model performs two tasks:

Intent prediction

First, a deep learning model predicts the user's intent from a window of past hand motion. It shapes the hand in anticipation of the object being approached while avoiding collisions with other clutter in the scene.

Static contact optimization

Once the user touches an object they are trying to manipulate, the model finds all contact points and assumes a mostly static contact at each one. It then solves a modified inverse kinematics problem for the hand pose that best achieves the intended object motion, simultaneously maximizing realism according to the intent prediction model and minimizing fingertip slipping.
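
A minimal sketch of the contact-finding step is shown below. It assumes, purely for illustration, that the object is a sphere and that only fingertips can touch it; `find_contacts`, the `tol` threshold, and the sphere model are hypothetical and not the project's actual code. Each fingertip close enough to the surface is projected onto it, and that surface point is the contact that is then held (mostly) static.

```python
import math
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


def find_contacts(fingertips: Dict[str, Vec3], center: Vec3, radius: float,
                  tol: float = 0.005) -> Dict[str, Vec3]:
    """Map each fingertip within `tol` of the sphere surface to the surface point it touches."""
    contacts: Dict[str, Vec3] = {}
    for name, tip in fingertips.items():
        offset = [t - c for t, c in zip(tip, center)]
        dist = math.sqrt(sum(o * o for o in offset))
        if dist == 0.0 or abs(dist - radius) > tol:
            continue  # too far from the surface (or exactly at the center, with no surface normal)
        scale = radius / dist
        # Project the fingertip radially onto the sphere to get the contact point.
        contacts[name] = (center[0] + offset[0] * scale,
                          center[1] + offset[1] * scale,
                          center[2] + offset[2] * scale)
    return contacts
```

For example, with a 5 cm sphere at the origin, a thumb tip 2 mm outside the surface registers a contact, while an index fingertip 10 cm away does not.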

How we built it

Large motion capture datasets such as GRAB (2020) provide high-quality examples of humans interacting with varied objects. We augment the dataset with additional sensory features:

Coarse occupancy
Fine-detail proximity sensing
Hand trajectory history
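
The first two features could be computed roughly as follows. This is a minimal sketch under stated assumptions: the scene is available as a point cloud, coarse occupancy is a binary voxel grid of `dims`³ cells anchored at `origin`, and fine proximity is the distance from hypothetical sensor points on the hand to the nearest scene point, clipped at `max_range`. The function names and parameters are illustrative, not the project's actual feature pipeline.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def coarse_occupancy(points: Sequence[Point], origin: Point, cell: float, dims: int) -> List[int]:
    """Flattened dims^3 binary grid: 1 where at least one scene point falls in that cell."""
    grid = [0] * (dims ** 3)
    for p in points:
        idx = [math.floor((p[k] - origin[k]) / cell) for k in range(3)]
        if all(0 <= i < dims for i in idx):  # ignore points outside the grid around the hand
            grid[(idx[0] * dims + idx[1]) * dims + idx[2]] = 1
    return grid


def proximity(sensors: Sequence[Point], points: Sequence[Point], max_range: float) -> List[float]:
    """Distance from each hand sensor point to the nearest scene point, clipped at max_range."""
    readings = []
    for s in sensors:
        nearest = min((math.dist(s, p) for p in points), default=max_range)
        readings.append(min(nearest, max_range))
    return readings
```

The split mirrors the two feature names: the grid is cheap and captures where clutter is at a coarse scale, while the clipped distances only carry information close to the hand, where fine detail matters.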

Visualizations of sensory input are included in the gallery.

The augmented dataset is used to train the intent inference model, which predicts hand poses autoregressively: each predicted pose is fed back as input when predicting the next one.
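
A minimal sketch of an autoregressive rollout, assuming the model maps the most recent `window` poses (each a flat list of joint angles) to the next pose. `rollout` and `constant_velocity` are hypothetical names; the constant-velocity extrapolator only stands in for the trained network so that the loop can run.

```python
from typing import Callable, List, Sequence

Pose = List[float]  # hypothetical: a flat vector of joint angles


def rollout(model: Callable[[Sequence[Pose]], Pose], seed: Sequence[Pose],
            steps: int, window: int) -> List[Pose]:
    """Predict `steps` future poses, feeding each prediction back in as model input."""
    if window < 1 or not seed:
        raise ValueError("need window >= 1 and at least one seed pose")
    history = [list(p) for p in seed]
    predicted: List[Pose] = []
    for _ in range(steps):
        next_pose = model(history[-window:])  # the model only sees the latest `window` poses
        predicted.append(next_pose)
        history.append(next_pose)  # autoregression: the output becomes future input
    return predicted


def constant_velocity(recent: Sequence[Pose]) -> Pose:
    """Stand-in for the learned model: extrapolate the last step linearly."""
    if len(recent) < 2:
        return list(recent[-1])
    last, prev = recent[-1], recent[-2]
    return [2.0 * a - b for a, b in zip(last, prev)]
```

For example, seeding with the single-joint poses `[0.0]` and `[0.1]` and rolling out three steps yields approximately `[0.2]`, `[0.3]`, `[0.4]`.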

When static contact begins, we formulate the modified inverse kinematics problem as follows:

We want to simultaneously:

minimize finger slipping
maximize realism according to the intent prediction model

Formally,

$$\text{joints}^* = \arg\min_{\text{joints}} \left[ \text{slip}(\text{joints}) - \text{realism}(\text{joints}) \right]$$

Here the hand is modeled as a forward kinematic chain FK, a function of its joint angles joints, as shown in the gallery.
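
To make FK concrete, here is a minimal planar sketch of a single finger (for example, the thumb) as a serial chain. It assumes 2D revolute joints whose angles are measured relative to the previous link, and the link lengths used below are made up; a real hand model is 3D, with several degrees of freedom at some joints. `fk_planar` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def fk_planar(joints: Sequence[float], lengths: Sequence[float],
              base: Tuple[float, float] = (0.0, 0.0)) -> List[Tuple[float, float]]:
    """Positions of the base, each joint, and the tip of a planar serial chain.

    Link k points along the cumulative angle joints[0] + ... + joints[k].
    """
    x, y = base
    angle = 0.0
    points = [(x, y)]
    for theta, length in zip(joints, lengths):
        angle += theta  # each joint rotates relative to the previous link
        x += length * math.cos(angle)
        y += length * math.sin(angle)
        points.append((x, y))
    return points
```

With all joints at zero and link lengths of 4, 3, and 2 cm, the tip lies straight out at (0.09, 0.0); the last entry of the returned list is the tip position that enters the slip term.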

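The slip objective can be expressed in terms of the joint angles joints and the contact positions C as

$$\text{slip}(\text{joints}) = \sum_i \left\| \text{FK}_i(\text{joints}) - C_i \right\|^2$$

where FK_i(joints) is the position of the hand point touching contact i, and C_i is where that contact point must be for the object to follow the intended motion.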
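
The slip term could be computed as below. This sketch assumes the intended object motion is a pure translation, so each target C_i is the original contact point plus the object's displacement, and it takes FK as any function returning one hand point per contact. `carry_contacts` and `slip` are hypothetical names, not the project's API.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def carry_contacts(contacts: Sequence[Point], displacement: Point) -> List[Point]:
    """Targets C_i: contact points moved rigidly with the object (translation only, by assumption)."""
    return [tuple(c + d for c, d in zip(p, displacement)) for p in contacts]


def slip(joints: Sequence[float], fk: Callable[[Sequence[float]], Sequence[Point]],
         targets: Sequence[Point]) -> float:
    """Sum over contacts of the squared distance from each contacting hand point to its target."""
    points = fk(joints)
    if len(points) != len(targets):
        raise ValueError("fk must return one hand point per contact")
    return sum(sum((a - b) ** 2 for a, b in zip(p, c)) for p, c in zip(points, targets))
```

A perfectly static contact gives a slip of zero; any drift of a fingertip away from where the moving object carries its contact point is penalized quadratically.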

The realism term comes directly from evaluating the intent inference model on the candidate pose.
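
One possible way to turn the intent model's output into a realism score, assuming the model predicts a per-joint mean and standard deviation, is to use the Gaussian log-likelihood of the candidate pose with the constant term dropped. This is an illustrative choice, not necessarily what the project does; `realism` and `objective` are hypothetical names, and `objective` is the quantity inside the argmin above.

```python
from typing import Callable, Sequence


def realism(joints: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> float:
    """Log-density of `joints` under independent per-joint Gaussians, without the constant term."""
    if not (len(joints) == len(mean) == len(std)):
        raise ValueError("joints, mean and std must have the same length")
    return -0.5 * sum(((j - m) / s) ** 2 for j, m, s in zip(joints, mean, std))


def objective(joints: Sequence[float], slip_fn: Callable[[Sequence[float]], float],
              mean: Sequence[float], std: Sequence[float]) -> float:
    """slip(joints) - realism(joints), minimized over the joint angles."""
    return slip_fn(joints) - realism(joints, mean, std)
```

Joints the model is confident about (small `std`) are pulled strongly toward its prediction, while uncertain joints are left mostly to the slip term.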

Because the problem is fully differentiable, a quasi-Newton solver (e.g. BFGS) can solve it for the hand pose that best produces the object motion the user intends.
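
For reference, a minimal dependency-free sketch of BFGS with a backtracking (Armijo) line search is shown below; in practice a library routine such as SciPy's `scipy.optimize.minimize(..., method="BFGS")` would normally be used instead. `numeric_grad` uses central finite differences only as a stand-in for the automatic differentiation a fully differentiable model would provide; both function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def numeric_grad(f: Callable[[Vector], float], x: Sequence[float], h: float = 1e-6) -> Vector:
    """Central finite differences; a stand-in for an autodiff gradient."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g


def bfgs(f: Callable[[Vector], float], grad: Callable[[Vector], Vector],
         x0: Sequence[float], iters: int = 200, tol: float = 1e-8) -> Vector:
    """Minimize f with BFGS, keeping a dense approximation H of the inverse Hessian."""
    n = len(x0)
    x = list(x0)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = grad(x)
    for _ in range(iters):
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break
        p = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]  # quasi-Newton direction
        fx = f(x)
        slope = sum(gi * pi for gi, pi in zip(g, p))
        step = 1.0
        # Backtrack until the Armijo sufficient-decrease condition holds.
        while f([xi + step * pi for xi, pi in zip(x, p)]) > fx + 1e-4 * step * slope and step > 1e-12:
            step *= 0.5
        x_new = [xi + step * pi for xi, pi in zip(x, p)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = sum(a * b for a, b in zip(s, y))
        if sy > 1e-12:  # curvature condition; skipping the update keeps H positive definite
            rho = 1.0 / sy
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(a * b for a, b in zip(y, Hy))
            # H <- (I - rho s y^T) H (I - rho y s^T) + rho s s^T, expanded using H's symmetry.
            for i in range(n):
                for j in range(n):
                    H[i][j] += (-rho * (Hy[i] * s[j] + s[i] * Hy[j])
                                + (rho * rho * yHy + rho) * s[i] * s[j])
        x, g = x_new, g_new
    return x
```

In the hand solver, `f` would be slip minus realism over the joint angles, and `grad` would come from differentiating through FK and the intent model. BFGS fits because it needs only gradients, not second derivatives, yet builds up curvature information across iterations.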

Challenges we ran into

Formulating the optimization problem and writing the solver code were the trickiest parts.

What's next for Static Contact is All You Need: Dexterous Manipulation in VR

Static contact makes the problem tractable in this setting, but fully realistic simulation would also need to let some contacts slip. Deep learning techniques might learn (a) whether a contact is slipping or not, and (b) how to interpolate between contacts.