1606.06854 - hassony2/inria-research-wiki GitHub Wiki
IJCAI
[arxiv 1606.06854] Model-based deep hand pose estimation [PDF] [code] [notes]
Xingyi Zhou, Qingfu Wan, Wei Zhang, Xiangyang Xue, Yichen Wei
read 30/05/2017
Objective
Integrate a kinematic pose prior into learning as a neural network layer
Synthesis
- integrates angle constraints into the loss
- adds a layer that maps from pose to joint positions
- the constraint on angles is effective for filtering out anatomically absurd poses (adding the physical loss prevents almost all inconceivable angles)
Problem Description
26 DoF:
- 3 DoF hand rotation
- 3 DoF hand translation
- rotation angles on joints (the remaining 20 DoF)
Each rotation angle has upper/lower bounds. These could be set according to anatomic studies; in practice they are estimated from the ground truth annotation on the training data.
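One simple way to read "estimated from ground truth annotation" is to take the per-angle minimum and maximum over the training poses. The sketch below assumes exactly that; the function name and the toy array are hypothetical, and the paper may use a different estimator.

```python
import numpy as np

def estimate_angle_bounds(train_angles):
    """Per-angle lower and upper bounds from training poses of shape (N, num_angles)."""
    train_angles = np.asarray(train_angles, dtype=float)
    return train_angles.min(axis=0), train_angles.max(axis=0)

# Toy data: 3 training poses with 2 joint angles each (radians, made up).
train_angles = np.array([[0.1, -0.3],
                         [0.8,  0.2],
                         [0.4,  0.0]])
theta_min, theta_max = estimate_angle_bounds(train_angles)
```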
Bone lengths are known and fixed according to the ground truth of the NYU training dataset
forward kinematic function :
- maps pose to 3D joint coordinates
- differentiable
- highly non linear
- tree-structured kinematic chain: each joint position can be deduced from its parent in the forward kinematic process (so the function contains sines and cosines)
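To make the forward kinematic function concrete, here is a minimal sketch for a single serial chain (think of one finger). It is a toy under stated assumptions, not the paper's hand model: every joint rotates about the z-axis only, every bone lies along the local x-axis, and the bone lengths are made up. The function names are hypothetical.

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix about the z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def forward_kinematics_chain(angles, bone_lengths, root=(0.0, 0.0, 0.0)):
    """Return the 3D positions of all joints of one serial chain.

    Rotations accumulate from the root outward, so each joint position
    depends on the sines and cosines of all angles above it in the chain.
    """
    positions = [np.asarray(root, dtype=float)]
    rotation = np.eye(3)
    for theta, length in zip(angles, bone_lengths):
        rotation = rotation @ rot_z(theta)
        bone = rotation @ np.array([length, 0.0, 0.0])
        positions.append(positions[-1] + bone)
    return np.stack(positions)

bone_lengths = [4.0, 3.0, 2.0]  # hypothetical fixed bone lengths
# Canonical pose: all angles set to 0, the chain lies along the x-axis.
canonical = forward_kinematics_chain([0.0, 0.0, 0.0], bone_lengths)
# Bending the first joint by 90 degrees swings the whole chain onto the y-axis.
bent = forward_kinematics_chain([np.pi / 2, 0.0, 0.0], bone_lengths)
```

Because the function is built only from matrix products, sines and cosines, it is differentiable with respect to the angles, which is what allows it to sit inside a network as a layer.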
input : depth image
output: 3D hand joints and pose parameter
Pipeline
Preprocess
Extract a cube from the depth image, resize it to 128x128, and normalize the depth values to [-1, 1]
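A rough sketch of this preprocessing step, assuming the hand center is already known. The notes do not give the cube size or the resampling method, so `half_side_px`, `half_depth`, the nearest-neighbour resize and the function name are all hypothetical choices.

```python
import numpy as np

def preprocess_depth(depth, center_uv, center_depth, half_side_px, half_depth,
                     out_size=128):
    """Crop a window around the hand, resize it, and map depth to [-1, 1]."""
    u, v = center_uv
    # Sample out_size positions across the crop window (nearest neighbour).
    rows = np.linspace(v - half_side_px, v + half_side_px, out_size)
    cols = np.linspace(u - half_side_px, u + half_side_px, out_size)
    # Clamp sampling positions to the image borders.
    r = np.clip(np.rint(rows).astype(int), 0, depth.shape[0] - 1)
    c = np.clip(np.rint(cols).astype(int), 0, depth.shape[1] - 1)
    patch = depth[np.ix_(r, c)].astype(np.float32)
    # Clamp depth to the front/back faces of the cube, then rescale.
    patch = np.clip(patch, center_depth - half_depth, center_depth + half_depth)
    return (patch - center_depth) / half_depth

# Synthetic frame: far background with a flat "hand" patch at 700 depth units.
depth = np.full((240, 320), 2000.0)
depth[100:140, 140:180] = 700.0
out = preprocess_depth(depth, center_uv=(160, 120), center_depth=700.0,
                       half_side_px=60, half_depth=150.0)
```

With this convention the cube center maps to 0 and anything behind the back face of the cube saturates at 1.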
Network structure
- 3 conv layers with kernel sizes 5, 5 and 3 respectively, 8 channels each
- each conv layer followed by max-pooling with strides 4, 2 and 1 respectively
- conv feature map: 12x12x8
- 2 fc layers with 1024 neurons, followed by dropout with ratio 0.3 (2 dropouts or 1 dropout?)
- 1 fc layer outputting the 26-dimensional pose parameters
- hand model layer that predicts 3D joint locations
- ReLUs for all conv and fc layers
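The 12x12x8 feature map can be checked with a little shape arithmetic. The sketch below reads the kernel sizes as 5, 5 and 3 for the three conv layers and the pooling strides as 4, 2 and 1. It further assumes unpadded stride-1 convolutions, a pooling window equal to its stride, and ceil rounding in the pooling layers; none of these three points is stated in the notes, they are simply one combination that reproduces the reported size.

```python
import math

def conv_out(size, kernel):
    """Output size of an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool_out(size, kernel, stride):
    """Output size of pooling with ceil rounding (assumed convention)."""
    return math.ceil((size - kernel) / stride) + 1

size = 128  # input resolution after preprocessing
for conv_kernel, pool_stride in zip((5, 5, 3), (4, 2, 1)):
    size = conv_out(size, conv_kernel)
    # Assumption: the pooling window has the same size as its stride.
    size = pool_out(size, pool_stride, pool_stride)

channels = 8
flattened = size * size * channels  # input dimension of the first fc layer
```

The spatial size goes 128, 124, 31, 27, 14, 12, 12 through the layers, so the first fc layer would see 12 * 12 * 8 = 1152 inputs under these assumptions.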
training settings : batch size 512, learning rate 0.003 and momentum 0.9
Loss
- trained with Euclidean loss
- add constraint on angles as a physical loss ~ sum_i [max(θmin_i − θ_i, 0) + max(θ_i − θmax_i, 0)]
- total loss is a weighted sum of the two contributions
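The two loss terms and their weighted sum can be sketched in NumPy as follows. The Euclidean term is applied here to the predicted joint locations with a 1/2 factor, the terms are averaged over the batch, and `weight` is a placeholder value; these details and all function names are assumptions, since the notes only give the form of the angle penalty.

```python
import numpy as np

def joint_loss(pred_joints, gt_joints):
    """Half squared Euclidean distance over joints, averaged over the batch.

    Inputs have shape (batch, num_joints, 3).
    """
    diff = pred_joints - gt_joints
    return 0.5 * np.mean(np.sum(diff ** 2, axis=(1, 2)))

def physical_loss(theta, theta_min, theta_max):
    """Penalty on angles outside [theta_min, theta_max], averaged over the batch.

    theta has shape (batch, num_angles); bounds have shape (num_angles,).
    """
    below = np.maximum(theta_min - theta, 0.0)
    above = np.maximum(theta - theta_max, 0.0)
    return np.mean(np.sum(below + above, axis=1))

def total_loss(pred_joints, gt_joints, theta, theta_min, theta_max, weight=1.0):
    """Weighted sum of the two contributions (weight value is hypothetical)."""
    return (joint_loss(pred_joints, gt_joints)
            + weight * physical_loss(theta, theta_min, theta_max))

# Toy check: the second angle is 0.2 below its bound, the third 0.5 above.
theta = np.array([[0.5, -0.2, 2.0]])
theta_min = np.zeros(3)
theta_max = np.full(3, 1.5)
joints = np.zeros((1, 4, 3))
penalty = physical_loss(theta, theta_min, theta_max)
loss = total_loss(joints, joints, theta, theta_min, theta_max, weight=2.0)
```

Angles inside their bounds contribute nothing, so the penalty only acts on poses that leave the allowed range.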
Results
Metrics
- mean joint error over test frames
- proportion of frames whose maximum joint error is below threshold
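Both metrics are easy to state in code. A minimal sketch, assuming predictions and ground truth are arrays of shape (frames, joints, 3); the function names and the toy numbers are hypothetical.

```python
import numpy as np

def per_joint_errors(pred, gt):
    """Euclidean distance for every frame and joint, shape (frames, joints)."""
    return np.linalg.norm(pred - gt, axis=2)

def mean_joint_error(pred, gt):
    """Mean joint error over all joints and test frames."""
    return per_joint_errors(pred, gt).mean()

def fraction_of_good_frames(pred, gt, threshold):
    """Proportion of frames whose maximum joint error is below threshold."""
    worst_per_frame = per_joint_errors(pred, gt).max(axis=1)
    return np.mean(worst_per_frame < threshold)

# Toy data: 2 frames, 2 joints, ground truth at the origin.
gt = np.zeros((2, 2, 3))
pred = np.array([[[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]],   # errors 5 and 0
                 [[0.0, 0.0, 1.0], [0.0, 0.0, 2.0]]])  # errors 1 and 2
mean_err = mean_joint_error(pred, gt)
good = fraction_of_good_frames(pred, gt, threshold=3.0)
```

The second metric is the stricter one: a single badly predicted joint is enough for a frame to count as a failure, as the first toy frame shows.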
Baselines
- direct joint: estimates 3D joint coordinates directly
- direct parameter: estimates the pose parameters (DoF such as angles) directly
Comparison
Compared with:
- Deep Prior, from "Hands Deep in Deep Learning for Hand Pose Estimation" (worst of the compared methods)
- "Training a Feedback Loop for Hand Pose Estimation" (comparable results, feedback actually seems a bit better, see Fig 5)
Definitions
canonical pose: the pose with all parameters set to 0
forward kinematic function: maps pose (angles and translations) to 3D joint locations