siggraph mano

SIGGRAPH Asia 2017

[siggraph-mano] Embodied Hands: Modeling and Capturing Hands and Bodies Together [PDF] [project page] [notes]

Javier Romero, Dimitrios Tzionas, Michael J. Black

read 30/10/2017

Objective

Create a full body + hand model parametrized by shape and pose vectors. They add MANO (hand Model with Articulated and Non-rigid defOrmations) for the hands to the existing SMPL body model. As in SMPL, the model factors the final mesh changes into those inherent to the subject's anatomy and those induced by the pose.

Synthesis

The hand contains a large number of joints with restricted degrees of freedom. The goal is to learn low-dimensional representations of pose and shape separately and to use them to produce the final mesh of the hand.

Model

The pose and shape values determine the positions of the mesh vertices through a skinning function that depends on the shape, the joint locations, the kinematic tree, and the pose. It also depends on blend weights that allow the skinning function to be parametric. The skinning function used is Linear Blend Skinning (LBS). The shape contribution is a linear combination of principal component analysis (PCA) components extracted from pose-normalized hand meshes.

The pose contributes as vertex offsets from a zero pose.
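For reference, this is the SMPL-style mesh formation that MANO inherits; the notation below (template, blend shapes, skinning weights) is assumed from the SMPL paper rather than taken from these notes:

```latex
% SMPL-style mesh formation (notation assumed from the SMPL paper):
% \bar{T}: mean template mesh, B_S(\beta): shape blend shapes (PCA),
% B_P(\theta): pose-dependent vertex offsets from the zero pose,
% W: linear blend skinning driven by joints J(\beta), pose \theta,
% and blend weights \mathcal{W}.
M(\beta, \theta) = W\!\left(\bar{T} + B_S(\beta) + B_P(\theta),\; J(\beta),\; \theta,\; \mathcal{W}\right)
```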

Data

The dataset consists of 3D hand scans with fixed wrist position for 31 subjects performing a subset of 51 poses, for a total of 2018 scans.

The scanning units are composed of 1 RGB and 2 grayscale cameras. They are used to produce a texture map and a mesh of 50,000 vertices. All left hands are mirrored to train a right-hand model; the left-hand model is then the mirror of the right-hand model.
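A minimal sketch of the mirroring step (the choice of mirror axis is an assumption; flipping one axis also requires reversing face winding to keep outward normals):

```python
import numpy as np

def mirror_hand_mesh(verts, faces):
    """Mirror a left-hand mesh across the x = 0 plane to obtain a right hand.

    verts: (V, 3) vertex positions, faces: (F, 3) triangle indices.
    The x axis as mirror axis is an assumption for illustration.
    """
    mirrored_verts = verts * np.array([-1.0, 1.0, 1.0])
    # a reflection inverts orientation, so reverse winding to fix normals
    mirrored_faces = faces[:, ::-1].copy()
    return mirrored_verts, mirrored_faces
```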

Hand-object interaction scans are included, with objects painted green for easy segmentation of object vs. hand.
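A naive chroma-key sketch of what that segmentation could look like (the thresholds are made up; the paper's exact rule is not given in these notes):

```python
import numpy as np

def green_object_mask(rgb):
    """Flag pixels whose green channel clearly dominates red and blue.

    rgb: (H, W, 3) uint8 image; returns a boolean (H, W) object mask.
    The thresholds below are hypothetical.
    """
    r = rgb[..., 0].astype(np.int32)
    g = rgb[..., 1].astype(np.int32)
    b = rgb[..., 2].astype(np.int32)
    return (g > 100) & (g > r + 30) & (g > b + 30)
```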

Training

Register scans

A simple model is used to register a template to the hand scans, bringing them into correspondence. The registration is solved by minimizing an energy composed of four components (see the sketch after this list):

  • geometry (mesh point-to-point distance between true and predicted mesh vertices)
  • coupling term (encourages proximity to the model by minimizing differences between the edges of the model and of the registered mesh)
  • pose prior
  • shape prior (penalizes the Mahalanobis distance between the optimized shape and the distribution of hand shapes in another dataset (CAESAR); the Mahalanobis distance measures how many standard deviations a point lies from the distribution's mean)
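A minimal NumPy sketch of such a four-term energy; the term weights, the quadratic pose prior, and all array layouts are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def registration_energy(reg_verts, scan_verts, model_verts, model_edges,
                        pose, shape, shape_mean, shape_cov_inv,
                        w_geo=1.0, w_couple=1.0, w_pose=0.1, w_shape=0.1):
    """Hypothetical registration energy with the four terms listed above.

    reg_verts:   (V, 3) registered vertices being optimized
    scan_verts:  (V, 3) corresponding scan points
    model_verts: (V, 3) vertices produced by the current model parameters
    model_edges: (E, 2) vertex index pairs defining mesh edges
    """
    # geometry: point-to-point distance between scan and registration
    e_geo = np.sum((reg_verts - scan_verts) ** 2)

    # coupling: edges of the registration should stay close to model edges
    reg_e = reg_verts[model_edges[:, 0]] - reg_verts[model_edges[:, 1]]
    mod_e = model_verts[model_edges[:, 0]] - model_verts[model_edges[:, 1]]
    e_couple = np.sum((reg_e - mod_e) ** 2)

    # pose prior: a simple quadratic stand-in (the notes leave it unspecified)
    e_pose = np.sum(pose ** 2)

    # shape prior: squared Mahalanobis distance to the shape distribution
    d = shape - shape_mean
    e_shape = d @ shape_cov_inv @ d

    return w_geo * e_geo + w_couple * e_couple + w_pose * e_pose + w_shape * e_shape
```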

Model parameters

  • 15 joints (3 per finger) + global orientation
  • each joint is treated as a ball joint for simplicity (although most articulations are actually restricted to one degree of freedom)
  • this over-parametrization imposes the use of regularization: the dependency of vertex movement on far-away joints is penalized (see the sketch below)
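One plausible reading of that regularizer, as a sketch (the function and threshold are hypothetical): penalize the influence a joint has on vertices that lie far from it.

```python
import numpy as np

def far_joint_penalty(influences, vert_joint_dist, dist_threshold=0.1):
    """Hypothetical regularizer: discourage vertex motion driven by far joints.

    influences:      (V, J) per-vertex, per-joint influence being learned
    vert_joint_dist: (V, J) distance from each vertex to each joint
    """
    far = vert_joint_dist > dist_threshold  # (vertex, joint) pairs far apart
    return np.sum(influences[far] ** 2)     # quadratic penalty on far influences
```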

Integration with SMPL body model

Objective

Full registration of body scans.

In order to limit the number of dimensions of the full model, a 6-dimensional linear embedding based on the first 6 principal components is used for each hand.

Therefore, SMPL+H(ands) has 78 degrees of freedom for pose (66 for the SMPL body plus 2 × 6 for the hands) vs 66 for SMPL.
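A minimal sketch of this linear embedding; the mean pose and component matrix are placeholders standing in for quantities learned from the MANO training poses:

```python
import numpy as np

N_JOINT_PARAMS = 45  # 15 joints x 3 rotation parameters per hand

# hypothetical placeholders for the learned PCA mean and basis
mean_pose = np.zeros(N_JOINT_PARAMS)
components = np.zeros((6, N_JOINT_PARAMS))  # would hold the first 6 PCs

def hand_pose_from_pca(coeffs):
    """Map a 6-D PCA coefficient vector to the full 45-D hand articulation."""
    return mean_pose + coeffs @ components

full_pose = hand_pose_from_pca(np.zeros(6))  # zero coefficients -> mean pose
```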

Experiments

Dataset

They capture 3 datasets of 4D sequences of body scans (41 sequences for a total of 27k frames):

  • 5 male and 5 female subjects performing 11 improvised and unconstrained actions
  • 1 person performing 28 sequences focusing on hand motion (finger counting, keyboard typing, ...)
  • 2 actors improvising movements expressing fear

Model

The final model is composed of a triangulated mesh of 7k vertices. Registration is cast as an optimization problem: the goal is to minimize the discrepancy between ground-truth scans and the model (minimizing the vertex distances); see the sketch after the list below.

  • the first 50 frames are used to compute a subject-specific template
  • the template is then aligned to all scans
  • optimization for each frame is initialized with the pose of the previous frame in both stages
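A sketch of that two-stage pipeline; every callable here is a hypothetical stand-in for the paper's optimizers:

```python
def register_sequence(scans, build_template, fit_pose, init_pose):
    """Two-stage sequential registration (all callables are stand-ins).

    scans:          list of per-frame scan meshes
    build_template: callable(scans) -> subject-specific template
    fit_pose:       callable(template, scan, init) -> pose for one frame
    init_pose:      pose used to initialize the very first frame
    """
    # stage 1: subject-specific template from the first 50 frames
    template = build_template(scans[:50])

    # stage 2: align the template to all scans, initializing each frame's
    # optimization with the pose found for the previous frame
    poses, pose = [], init_pose
    for scan in scans:
        pose = fit_pose(template, scan, init=pose)
        poses.append(pose)
    return template, poses
```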

Future work

They plan to use MANO to generate synthetic textured data to train pose and shape estimators.

Notes

LBS

Rigid skinning associates each mesh vertex with a single bone and links the mesh displacements to those of the bones; this produces discontinuities around regions where the skin is associated with different bones.

Linear Blend Skinning, on the other hand, assigns each skin vertex to more than one bone and removes the discontinuity by blending vertices linearly near the joint. The final position of a vertex is obtained by a weighted average of each bone's contribution to the skin offset. Normals and tangent vectors are blended as well. The blending weights should be smooth (for instance, normalized relative distance to the linked bones) and convex (positive and summing to one for each skin vertex).
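A minimal NumPy sketch of generic LBS (a textbook formulation, not the paper's code); the bone transforms are assumed to map rest-pose space to posed space:

```python
import numpy as np

def linear_blend_skinning(rest_verts, bone_transforms, weights):
    """Blend per-bone rigid transforms with convex per-vertex weights.

    rest_verts:      (V, 3) vertices in the rest pose
    bone_transforms: (B, 4, 4) homogeneous rest-to-posed transform per bone
    weights:         (V, B) convex blend weights (non-negative, rows sum to 1)
    """
    V = rest_verts.shape[0]
    rest_h = np.concatenate([rest_verts, np.ones((V, 1))], axis=1)  # (V, 4)
    # per-vertex blended transform: weighted sum of bone transforms
    blended = np.einsum('vb,bij->vij', weights, bone_transforms)    # (V, 4, 4)
    # apply each vertex's blended transform to its rest position
    posed_h = np.einsum('vij,vj->vi', blended, rest_h)              # (V, 4)
    return posed_h[:, :3]
```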

The Pinocchio software solves a Poisson equation for each bone, with appropriate boundary conditions, to obtain smooth weights.

Limits

The formulation does not lead to a rigid body transform (there is no volume conservation); this can produce unnatural shrinking effects for extreme bending or twisting transformations.