1606.06854

Xingyi Zhou, Qingfu Wan, Zhang Wei, Xiangyang Xue, Yichen Wei

read 30/05/2017

Objective

Integrate a kinematic pose prior into learning as a neural network layer

integrate joint-angle constraints into the loss
add a layer that maps pose parameters to joint positions
constraining the angles is effective at filtering out anatomically absurd poses (adding the physical loss prevents almost all implausible angles)

26 DoF :

each rotation has upper/lower bounds; these could be taken from anatomical studies but, in practice, are estimated from the ground-truth annotations of the training data
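
A minimal sketch of how such bounds and the angle penalty could fit together, assuming the bounds are simply the per-parameter min/max over the training annotations and the penalty is a hinge on each bound. The function names (`estimate_bounds`, `physical_loss`) and the toy 2-parameter poses are hypothetical, and the exact form used in the paper may differ.

```python
def estimate_bounds(train_poses):
    """Per-parameter (lower, upper) bounds as min/max over training poses."""
    columns = list(zip(*train_poses))  # one tuple of values per pose parameter
    lower = [min(c) for c in columns]
    upper = [max(c) for c in columns]
    return lower, upper


def physical_loss(pose, lower, upper):
    """Sum of hinge penalties; zero when every parameter is within its bounds."""
    return sum(
        max(lo - p, 0.0) + max(p - hi, 0.0)
        for p, lo, hi in zip(pose, lower, upper)
    )


# Toy data: three training poses with two parameters each (hypothetical values).
train_poses = [[0.0, 0.1], [0.5, -0.2], [0.3, 0.4]]
lower, upper = estimate_bounds(train_poses)

inside = physical_loss([0.2, 0.0], lower, upper)    # within bounds, no penalty
outside = physical_loss([0.8, -0.5], lower, upper)  # both parameters violate a bound
```

A pose inside the bounds contributes nothing to the loss, so the penalty only acts on anatomically out-of-range predictions.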

Bone lengths are known and fixed, set from the GT of the NYU training dataset

forward kinematic function :

maps pose to 3D joint coordinates
differentiable
highly non-linear
tree-structured kinematic chain: each joint position is derived by traversing the chain from the root in the forward kinematic pass (hence the function contains sines and cosines)
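
To make the sines and cosines concrete, here is a simplified sketch of forward kinematics for a single planar (2D) chain with fixed bone lengths. This is an illustration only: the actual hand model is 3D, tree-structured and has 26 parameters, and the name `forward_kinematics` and the bone lengths are hypothetical.

```python
import math


def forward_kinematics(angles, bone_lengths, root=(0.0, 0.0)):
    """Map relative joint angles (radians) to 2D joint positions along one chain."""
    x, y = root
    total_angle = 0.0          # accumulated rotation from the root to the current bone
    joints = [(x, y)]
    for angle, length in zip(angles, bone_lengths):
        total_angle += angle   # each angle is relative to the parent bone
        x += length * math.cos(total_angle)
        y += length * math.sin(total_angle)
        joints.append((x, y))
    return joints


bones = [4.0, 3.0, 2.0]  # fixed bone lengths (hypothetical units)
straight = forward_kinematics([0.0, 0.0, 0.0], bones)
bent = forward_kinematics([0.0, math.pi / 2, 0.0], bones)
```

With all angles at zero the chain lies straight along the x axis; bending the second joint by 90 degrees moves every joint further down the chain, which is why each position depends on all of its ancestors' angles. Every operation is a sum or product of sines, cosines and constants, so the mapping is differentiable.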

input : depth image

output: 3D hand joints and pose parameters

extract a cube around the hand, resized to 128x128, with depth values normalized to [-1, 1]
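
One plausible way to do the depth normalization, sketched under the assumption that the cube is centered at a known hand depth and that values outside the cube are clipped to its faces. The function name, `center_depth` and `half_size` are hypothetical and the millimeter values are made up for illustration.

```python
def normalize_depth(depth_values, center_depth, half_size):
    """Clip depths to the cube around center_depth and rescale to [-1, 1]."""
    near = center_depth - half_size
    far = center_depth + half_size
    normalized = []
    for d in depth_values:
        clipped = min(max(d, near), far)  # anything outside the cube sits on its face
        normalized.append((clipped - center_depth) / half_size)
    return normalized


# Hypothetical depths in mm, hand centered at 850 mm, cube half-size 150 mm.
result = normalize_depth([700.0, 850.0, 1000.0, 1200.0], 850.0, 150.0)
```

The background pixel at 1200 mm is clipped to the far face of the cube and maps to 1, like the pixel exactly on that face.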

3 conv layers with kernel sizes 5, 5, 3 (one per layer), 8 channels each
each followed by max-pooling with stride 4, 2, 1 respectively
conv feature map : 12x12x8
2 fc layers with 1024 neurons each, followed by dropout with ratio 0.3 (2 dropouts or 1 dropout ?)
1 fc layer outputting the 26-dimensional pose parameters
hand model layer that predicts 3D joint locations
ReLUs for all conv and fc
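
A quick sanity check of the 12x12x8 feature map size, reading the kernel sizes and pooling strides as per-layer values (5, 5, 3 and 4, 2, 1). The sketch assumes unpadded stride-1 convolutions, pooling windows equal to the stride, and Caffe-style ceil rounding for pooling; these are assumptions, not details confirmed in these notes.

```python
import math


def conv_out(size, kernel):
    """Output width of an unpadded, stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, stride):
    """Output width of max-pooling with window == stride and ceil rounding."""
    return math.ceil((size - stride) / stride) + 1


size = 128                      # input depth patch is 128x128
sizes = []
for kernel, stride in zip([5, 5, 3], [4, 2, 1]):
    size = pool_out(conv_out(size, kernel), stride)
    sizes.append(size)

channels = 8
flattened = size * size * channels  # number of inputs to the first fc layer
```

Under these assumptions the spatial size goes 128 -> 31 -> 14 -> 12, matching the 12x12x8 map noted above, so the first fc layer sees 1152 inputs. With floor rounding in the pooling the result would be 11x11 instead.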

training settings : batch size 512, learning rate 0.003 and momentum 0.9
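
For reference, a sketch of what one update with these settings looks like, assuming plain SGD with classical momentum (the notes only give the learning rate and momentum, not the exact update rule). The function name and the toy objective f(w) = w^2 are hypothetical.

```python
def sgd_momentum_step(w, velocity, grad, lr=0.003, momentum=0.9):
    """One classical-momentum update: accumulate velocity, then move the weight."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity


# First step from w = 1 on the toy objective f(w) = w**2 (gradient 2 * w).
first_w, first_v = sgd_momentum_step(1.0, 0.0, 2.0)

# Run 200 steps on the same objective.
w, v = 1.0, 0.0
for _ in range(200):
    grad = 2.0 * w
    w, v = sgd_momentum_step(w, v, grad)
```

With momentum 0.9 the effective step size is roughly ten times the nominal learning rate once the velocity has built up, which is worth keeping in mind when comparing against setups without momentum.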

Compared with :

Deep Prior from Hands Deep in Deep Learning for Hand Pose Estimation (worst of the compared methods)
Training a Feedback Loop for Hand Pose Estimation (comparable results; the feedback method actually seems a bit better, see Fig 5)

canonical pose : corresponds to pose with all parameters set to 0

forward kinematic function : maps pose (angles and translations) to 3D joint locations