Inspiration
Humans use their hands so constantly that we take their dexterity for granted. However, VR applications used to train people for high-stakes jobs (e.g. surgeons, firefighters, astronauts) cannot accurately model human hands performing dexterous tasks, which hurts both the effectiveness and the immersion of the simulation. The core difficulty is that generating physically plausible hand motion is a heavily underconstrained problem, especially given the limited input a user can provide. As a result, most VR applications simply stick objects to the user's hand, as if the two were "welded" together.
What it does
The model effectively performs two tasks:
Intent prediction
First, a deep learning model learns to predict the user's intentions from a window of past actions. It then shapes the hand to anticipate incoming objects while avoiding collisions with other clutter in the scene.
Static contact optimization
Once the user makes contact with an object that they are attempting to manipulate, the model finds all contact points and assumes a mostly-static contact at each one. A modified inverse kinematics problem is solved to find the best hand pose to achieve the intended object motion, simultaneously maximizing realism according to the intent prediction model, and minimizing fingertip slipping.
How we built it
Large motion capture databases like GRAB (2020) provide high-quality examples of humans interacting with varied objects. We augment the dataset with additional sensory features, including:
- Coarse occupancy
- Fine detail proximity sensing
- Hand trajectory history
Visualizations of sensory input are included in the gallery.
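As a rough illustration of what two of these features could look like, the sketch below computes a coarse occupancy grid and a per-fingertip proximity value from an object point cloud expressed in the hand frame. The function names, grid size, and extent are hypothetical choices made for this example and are not taken from the project code.

```python
import numpy as np

def coarse_occupancy(points, half_extent=0.15, resolution=4):
    """Boolean voxel grid centred on the hand origin (all sizes are assumed values).

    points: (N, 3) object points in the hand frame, in metres.
    """
    grid = np.zeros((resolution,) * 3, dtype=bool)
    cell = 2.0 * half_extent / resolution
    idx = np.floor((points + half_extent) / cell).astype(int)
    # Keep only points that fall inside the cube around the hand.
    inside = np.all((idx >= 0) & (idx < resolution), axis=1)
    grid[tuple(idx[inside].T)] = True
    return grid

def fingertip_proximity(fingertips, points):
    """Distance from each fingertip (M, 3) to its nearest object point."""
    diffs = fingertips[:, None, :] - points[None, :, :]
    return np.linalg.norm(diffs, axis=2).min(axis=1)

# Toy inputs: two nearby points and one far outside the grid.
points = np.array([[0.01, 0.01, 0.01], [1.0, 0.0, 0.0], [-0.14, -0.14, -0.14]])
fingertips = np.array([[0.0, 0.0, 0.0]])
grid = coarse_occupancy(points)
distances = fingertip_proximity(fingertips, points)
```

The coarse grid gives a cheap summary of nearby clutter, while the proximity values carry the fine detail near the fingers.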
The augmented dataset is used to train the intent inference model, which predicts the hand pose in an autoregressive way.
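Autoregressive prediction means each predicted pose is fed back in as part of the history for the next step. A minimal sketch of that loop follows. The `toy_predictor` function is a hypothetical stand-in for the trained network, and the window length and the use of a target pose as the "features" are assumptions made only for illustration.

```python
from collections import deque
import numpy as np

WINDOW = 4  # hypothetical history length

def toy_predictor(history, features):
    """Stand-in for the trained model: move the last pose halfway toward a target."""
    last = history[-1]
    return last + 0.5 * (features - last)

def rollout(predict, initial_poses, feature_stream):
    """Predict one pose per feature frame, feeding each prediction back as history."""
    history = deque(initial_poses, maxlen=WINDOW)
    outputs = []
    for features in feature_stream:
        next_pose = predict(np.stack(history), features)
        outputs.append(next_pose)
        history.append(next_pose)  # oldest pose drops out of the window
    return np.stack(outputs)

initial = [np.zeros(3) for _ in range(WINDOW)]
poses = rollout(toy_predictor, initial, [np.ones(3)] * 3)
```

A real model would replace `toy_predictor` and would consume the full sensory feature vector rather than a target pose.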
When static contact is initiated, we formulate a modified inverse kinematics problem that simultaneously aims to:
- minimize finger slipping
- maximize realism according to the intent prediction model
Formally,
argmin(joints) slip(joints) - realism(joints)
Here the hand is modeled as a forward kinematic chain FK, a function of its joint angles joints, as shown in the gallery.
The slip objective can be written in terms of the joint angles joints and the contact points C as
slip(joints) = |FK(joints) - C|^2
The realism objective realism comes directly from evaluating the intent inference model.
We can use a quasi-Newton solver (e.g. BFGS) to solve this fully differentiable optimization problem for the best hand pose to create the desired object motion as dictated by the user.
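The sketch below shows the shape of this optimization on a toy planar finger with three joints and a single contact point, using SciPy's BFGS implementation. It is an illustration under stated assumptions, not the project's solver: the link lengths are made up, `realism` is a hypothetical quadratic stand-in for the intent model's score (higher when the pose is close to a predicted pose), and gradients are left to SciPy's finite differences for brevity.

```python
import numpy as np
from scipy.optimize import minimize

LINK_LENGTHS = np.array([4.5, 2.5, 2.0])  # hypothetical phalanx lengths, in cm

def fk(joints):
    """Fingertip position of a planar chain; each joint angle adds to the heading."""
    headings = np.cumsum(joints)
    x = np.sum(LINK_LENGTHS * np.cos(headings))
    y = np.sum(LINK_LENGTHS * np.sin(headings))
    return np.array([x, y])

def slip(joints, contact):
    """Squared distance between the fingertip and the fixed contact point."""
    return float(np.sum((fk(joints) - contact) ** 2))

def realism(joints, predicted, weight=0.01):
    """Stand-in for the intent model: penalize deviation from the predicted pose."""
    return -weight * float(np.sum((joints - predicted) ** 2))

def solve_pose(contact, predicted):
    """Minimize slip(joints) - realism(joints), starting from the predicted pose."""
    def objective(joints):
        return slip(joints, contact) - realism(joints, predicted)
    result = minimize(objective, x0=predicted, method="BFGS")
    return result.x

# Toy problem: the contact is reachable, and the predicted pose is slightly off.
contact = fk(np.array([0.3, 0.4, 0.2]))
predicted = np.array([0.2, 0.5, 0.3])
joints = solve_pose(contact, predicted)
```

Starting the solver from the predicted pose keeps the solution close to what the intent model expects, which matters because a redundant chain has many poses that reach the same contact.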
Challenges we ran into
Formulating the optimization problem and writing the solver code were the trickiest parts.
What's next for Static Contact is All you Need: Dexterous Manipulation in VR
While the static-contact assumption makes the problem solvable in this setting, fully realistic simulation would require allowing some slipping contacts. Deep learning techniques could perhaps learn a) when a contact is slipping or non-slipping, and b) how to interpolate between contacts.

