1606.06854 - hassony2/inria-research-wiki GitHub Wiki

IJCAI

[arxiv 1606.06854] Model-based deep hand pose estimation

Xingyi Zhou, Qingfu Wan, Wei Zhang, Xiangyang Xue, Yichen Wei

read 30/05/2017

Objective

Integrate a kinematic hand pose prior into learning as a neural network layer

Synthesis

  • Integrates joint-angle constraints into the loss
  • Adds a layer that maps pose parameters to 3D joint positions
  • Constraining the angles effectively filters out anatomically absurd poses (adding the physical loss prevents almost all inconceivable angles)

Problem Description

26 DoF:

  • 3 DoF global hand rotation
  • 3 DoF global hand translation
  • rotation angles on the joints (the remaining 20 DoF)

Each rotation angle has lower/upper bounds. These could come from anatomical studies; in practice, they are estimated from the ground-truth annotations of the training data.

Bone lengths are known and kept fixed, set from the ground truth of the NYU training dataset.

Forward kinematic function:

  • maps the pose to 3D joint coordinates
  • differentiable
  • highly nonlinear
  • the hand is a tree-structured kinematic chain, so each joint position is deduced from its parent's during the forward kinematic process (hence the function contains sines and cosines)
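A minimal sketch of a forward kinematic function on a tree-structured chain, showing where the sines and cosines come from. Everything here is a simplifying assumption, not the paper's hand model: each joint has a single rotation about its local z axis, the global rotation is reduced to the single angle at the root, and the 3-joint skeleton is hypothetical.

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix about the local z axis: the source of the sines and cosines."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])

def forward_kinematics(angles, parents, bone_offsets, root_pos):
    """Map joint angles to 3D joint positions along a tree kinematic chain.

    angles[j]       -- rotation (radians) at joint j; hypothetical 1 DoF per joint
    parents[j]      -- parent index of joint j, -1 for the root; parents come before children
    bone_offsets[j] -- fixed bone vector from the parent to joint j, in the parent's frame
    root_pos        -- global translation of the root joint
    """
    n = len(parents)
    rotations = [None] * n              # accumulated global rotation of each joint's frame
    positions = np.zeros((n, 3))
    for j in range(n):
        p = parents[j]
        if p < 0:
            positions[j] = root_pos
            rotations[j] = rot_z(angles[j])
        else:
            # Child = parent position + parent's global rotation applied to the fixed bone.
            positions[j] = positions[p] + rotations[p] @ bone_offsets[j]
            rotations[j] = rotations[p] @ rot_z(angles[j])
    return positions

# Hypothetical 3-joint chain: a root plus two unit-length bones along x.
parents = [-1, 0, 1]
bones = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(forward_kinematics(np.zeros(3), parents, bones, np.zeros(3)))  # all angles 0
```

With all angles set to 0 the joints simply follow the fixed bone offsets, i.e. the canonical pose. Because the output is a composition of smooth rotations and translations, it is differentiable with respect to the angles, which is what allows it to sit inside the network as a layer.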

Input: depth image

Output: 3D hand joint locations and pose parameters

Pipeline

Preprocess

Extract a cube centered on the hand, resize it to 128x128, and normalize depth values to [-1, 1]
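The cropping step can be sketched as follows. This is a hypothetical illustration: the hand centre `center_uvd`, the 300 mm cube side, the pinhole projection with focal lengths `fx`/`fy`, nearest-neighbour resizing, and pushing missing depth to the back face of the cube are all assumptions, not details given in these notes.

```python
import numpy as np

def preprocess(depth, center_uvd, fx, fy, cube_mm=300.0, out_size=128):
    """Crop a cube centred on the hand, resize to out_size x out_size, map depth to [-1, 1].

    depth      -- (H, W) depth map in millimetres, 0 meaning no measurement
    center_uvd -- hypothetical hand centre: (u, v) in pixels, d in millimetres
    fx, fy     -- focal lengths in pixels; cube_mm is an assumed cube side length
    """
    u, v, d = center_uvd
    half = cube_mm / 2.0
    # Project the cube's half-width at depth d into pixel units (pinhole model).
    du, dv = half * fx / d, half * fy / d
    # Nearest-neighbour sampling of the window [u-du, u+du] x [v-dv, v+dv].
    xs = np.round(np.linspace(u - du, u + du, out_size)).astype(int)
    ys = np.round(np.linspace(v - dv, v + dv, out_size)).astype(int)
    h, w = depth.shape
    valid_x = (xs >= 0) & (xs < w)
    valid_y = (ys >= 0) & (ys < h)
    patch = np.zeros((out_size, out_size))
    patch[np.ix_(valid_y, valid_x)] = depth[np.ix_(ys[valid_y], xs[valid_x])]
    # Missing or too-far depth goes to the back face, too-near depth to the front face.
    patch[(patch == 0) | (patch > d + half)] = d + half
    patch[patch < d - half] = d - half
    # Front face -> -1, hand centre depth -> 0, back face -> +1.
    return (patch - d) / half
```

Normalising relative to the centre depth makes the input invariant to how far the hand is from the camera, which is the usual motivation for this kind of cube crop.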

Network structure

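  • 3 conv layers with kernel sizes 5, 5 and 3 respectively, 8 channels each

  • each followed by max pooling, with strides 4, 2 and 1 respectively

  • resulting conv feature map: 12x12x8

  • 2 fc layers with 1024 neurons each, followed by dropout with ratio 0.3 (open question: one dropout after each fc layer, or a single one?)

  • 1 fc layer outputting the 26-dimensional pose parameter vector

  • hand model layer that predicts 3D joint locations from the pose parameters

  • ReLU activations for all conv and fc layers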
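As a sanity check on the layer sizes, the sketch below propagates a 128-pixel input side through the three conv + pool stages. It relies on interpretations not stated explicitly above: "valid" convolutions (stride 1, no padding), pooling kernels equal to their strides, and ceil-mode output rounding as used by Caffe's pooling layer. Under these assumptions the side comes out at 12, matching the 12x12x8 feature map; floor rounding would give 11.

```python
import math

def conv_out(size, kernel):
    """Output side of a 'valid' convolution with stride 1 and no padding."""
    return size - kernel + 1

def pool_out(size, stride, kernel=None, ceil_mode=True):
    """Output side of max pooling; the kernel is assumed equal to the stride by default."""
    kernel = stride if kernel is None else kernel
    frac = (size - kernel) / stride
    return (math.ceil(frac) if ceil_mode else math.floor(frac)) + 1

def feature_map_side(input_side=128, kernels=(5, 5, 3), strides=(4, 2, 1), ceil_mode=True):
    """Propagate the spatial side length through each conv + pool stage."""
    side = input_side
    for k, s in zip(kernels, strides):
        side = pool_out(conv_out(side, k), s, ceil_mode=ceil_mode)
    return side

print(feature_map_side())                 # ceil-mode pooling
print(feature_map_side(ceil_mode=False))  # floor-mode pooling, for comparison
```

With ceil rounding the sides go 128 → 124 → 31, then 27 → 14, then 12 → 12, so the first fc layer sees 12 × 12 × 8 = 1152 inputs.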

Training settings: batch size 512, learning rate 0.003, momentum 0.9
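To make the optimizer settings concrete, here is a minimal sketch of one classical momentum SGD update with the learning rate and momentum above, in the form v ← μv − η∇L, w ← w + v (the form used, for example, by Caffe's SGD solver). The notes do not name the solver, so this update rule is an assumption, and the toy gradient is illustrative only.

```python
import numpy as np

LEARNING_RATE = 0.003
MOMENTUM = 0.9
BATCH_SIZE = 512  # listed for completeness; the toy update below does not sample batches

def sgd_momentum_step(w, velocity, grad, lr=LEARNING_RATE, mu=MOMENTUM):
    """One momentum SGD update: v <- mu * v - lr * grad, then w <- w + v."""
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity

w, v = np.array([1.0]), np.zeros(1)
for _ in range(2):
    w, v = sgd_momentum_step(w, v, grad=np.array([2.0]))  # constant toy gradient
print(w, v)
```

With momentum 0.9, a constant gradient makes the effective step grow towards lr / (1 − μ), i.e. ten times the raw learning rate.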

Loss

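  • Euclidean loss between predicted and ground-truth 3D joint locations, $L_{joint}$

  • angle constraint added as a physical loss: $L_{phy}(\Theta) = \sum_i \left[\max(\theta_i^{\min} - \theta_i, 0) + \max(\theta_i - \theta_i^{\max}, 0)\right]$

  • total loss: weighted sum of the two terms, $L = L_{joint} + \lambda L_{phy}$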
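The loss terms can be sketched in NumPy as below, including the estimation of the angle bounds from training ground truth. Assumptions: the 0.5 factor on the Euclidean loss is the usual convention rather than something stated in these notes, the weight `lam` is a hypothetical value, and parameters without bounds (e.g. translation) can be given infinite bounds so they contribute nothing to $L_{phy}$.

```python
import numpy as np

def angle_bounds(train_poses):
    """Per-parameter lower/upper bounds from ground-truth training poses of shape (N, D)."""
    return train_poses.min(axis=0), train_poses.max(axis=0)

def physical_loss(theta, theta_min, theta_max):
    """sum_i max(theta_min_i - theta_i, 0) + max(theta_i - theta_max_i, 0)."""
    below = np.maximum(theta_min - theta, 0.0)  # penalty when an angle is under its lower bound
    above = np.maximum(theta - theta_max, 0.0)  # penalty when an angle is over its upper bound
    return float(np.sum(below + above))

def joint_loss(pred_joints, gt_joints):
    """Euclidean loss: 0.5 * squared L2 distance between predicted and ground-truth joints."""
    return 0.5 * float(np.sum((pred_joints - gt_joints) ** 2))

def total_loss(pred_joints, gt_joints, theta, theta_min, theta_max, lam=1.0):
    """Weighted sum L_joint + lam * L_phy; lam is a hypothetical weight."""
    return joint_loss(pred_joints, gt_joints) + lam * physical_loss(theta, theta_min, theta_max)

lo, hi = np.zeros(2), np.ones(2)
print(physical_loss(np.array([0.5, 0.5]), lo, hi))   # both angles in range
print(physical_loss(np.array([-0.5, 1.2]), lo, hi))  # one below, one above
```

The hinge form is zero inside the allowed range, so it leaves anatomically valid poses untouched and only pushes out-of-range angles back towards their bounds.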

Results

Metrics

  • mean joint error over test frames
  • proportion of frames whose maximum joint error is below threshold
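Both metrics can be computed from predicted and ground-truth joints as sketched below. The array layout (frames, joints, 3), the strict "below" comparison and the function names are assumptions made for illustration.

```python
import numpy as np

def mean_joint_error(pred, gt):
    """Mean Euclidean joint error over all test frames and joints; arrays of shape (F, J, 3)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def fraction_within(pred, gt, threshold):
    """Proportion of frames whose maximum (worst) joint error is below the threshold."""
    max_err = np.linalg.norm(pred - gt, axis=-1).max(axis=1)  # worst joint per frame
    return float(np.mean(max_err < threshold))

gt = np.zeros((2, 2, 3))
pred = np.zeros((2, 2, 3))
pred[0, 0, 0], pred[0, 1, 0] = 1.0, 3.0  # frame 0: errors 1 and 3; frame 1: perfect
print(mean_joint_error(pred, gt), fraction_within(pred, gt, threshold=2.0))
```

The second metric is stricter than the first: a single badly placed joint makes the whole frame fail, which is why it is usually reported as a curve over thresholds.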

Baselines

  • direct joint: regresses the 3D joint coordinates directly
  • direct parameter: regresses the DoF (such as angles) directly

Comparison

Compared with:

Definitions

Canonical pose: the pose with all parameters set to 0

Forward kinematic function: maps the pose (angles and translation) to 3D joint locations