1502.06807 - hassony2/inria-research-wiki GitHub Wiki
CVWW 2016 (computer vision winter workshop)
[1502.06807] Hands Deep in Deep Learning for Hand Pose Estimation [PDF] [code] [notes]
Markus Oberweger, Paul Wohlhart, Vincent Lepetit
read 17/05/2017
Synthesis
Ideas:
- DeepPrior exploits the fact that a low-dimensional prior can capture the parametrization of the hand's 3D pose ==> impose a bottleneck in the network: fewer neurons than final parameters (< 3 * nb_joints) to force the network to learn the physical constraints of the hand. The bottleneck is added as an additional layer.
- Output: joint positions in 3D, so the final output has size 3 * nb_joints (regression). I think they predict pixel positions for u and v, and depth normalized to [-1, 1].
- Refine this first prediction:
  - input: several zooms centered on the predicted joint location (patches), i.e. various levels of spatial context
  - pooling size increases with patch size
  - several refinement iterations
- No special handling of occluded joints
- Noisy annotations ==> robust Huber loss
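A minimal numpy sketch of the two ideas above: a bottleneck layer with fewer units than 3 * nb_joints that is linearly expanded back to the full pose vector, plus a Huber loss that stays quadratic near zero and linear in the tails so noisy annotations do not dominate the gradient. All sizes, weights, and names here are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def huber(residual, delta=1.0):
    """Robust Huber loss: quadratic for |r| <= delta, linear beyond."""
    a = np.abs(residual)
    return np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta))

nb_joints = 14                # e.g. the NYU evaluation subset (assumption)
pose_dim = 3 * nb_joints      # (u, v, normalized depth) per joint
bottleneck = 30               # < pose_dim -> low-dimensional pose prior

rng = np.random.default_rng(0)
features = rng.standard_normal((1, 128))              # output of the conv/fc stack
W_embed = rng.standard_normal((128, bottleneck))      # projection into the prior space
W_pose = rng.standard_normal((bottleneck, pose_dim))  # linear expansion to full pose

embedding = features @ W_embed   # bottleneck activations (the learned prior)
pose = embedding @ W_pose        # predicted 3 * nb_joints pose vector
```

The point of the sketch is the shape constraint: every predicted pose is a linear combination of only 30 basis poses, so implausible joint configurations are hard for the network to express.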
Pipeline
Basic
2D hand detector ==> coarse bounding box of the hand
use bboxes as input to the CNN-based pose predictors
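A hedged sketch of this preprocessing step, assuming a depth map in millimeters and a detector-supplied bounding box (the function name, the median-based centering, and the 150 mm hand radius are my assumptions): crop the hand region and normalize depth to [-1, 1] around the hand's center depth before feeding it to the CNN regressor.

```python
import numpy as np

def crop_and_normalize(depth, bbox, depth_radius=150.0):
    """Crop the detected hand and map its depth values to [-1, 1].

    depth: (H, W) depth map in mm; bbox: (x0, y0, x1, y1) from the 2D
    hand detector; depth_radius: assumed half-extent of the hand in mm.
    """
    x0, y0, x1, y1 = bbox
    patch = depth[y0:y1, x0:x1].astype(np.float64)
    center = np.median(patch[patch > 0])          # robust hand center depth
    normalized = (patch - center) / depth_radius  # hand spans roughly [-1, 1]
    return np.clip(normalized, -1.0, 1.0)

depth = np.full((240, 320), 800.0)    # synthetic flat background at 800 mm
depth[100:140, 150:190] = 600.0       # synthetic "hand" at 600 mm
patch = crop_and_normalize(depth, (140, 90, 200, 150))
```

Clipping also discards background and the missing-depth pixels that are common in the NYU dataset.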
Advanced
Predict all joint locations simultaneously
Variant 1
Shallow network (1 conv, 1 max-pool, 1 fc)
Variant 2
Deeper network (3 conv + max-pool, 2 fc)
Variant 3
Multiscale
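The multiscale idea (patches of growing spatial context, with the pooling window growing with patch size so every branch produces an equally sized feature map before merging) can be sketched as follows; the patch sizes and pooling factors are illustrative assumptions.

```python
import numpy as np

def extract_patch(depth, center, half_size):
    """Crop a square (2 * half_size)^2 patch around center = (u, v)."""
    u, v = center
    return depth[v - half_size:v + half_size, u - half_size:u + half_size]

def max_pool(patch, pool):
    """Non-overlapping max pooling with a pool x pool window."""
    h, w = patch.shape
    return patch[:h - h % pool, :w - w % pool] \
        .reshape(h // pool, pool, w // pool, pool).max(axis=(1, 3))

depth = np.random.default_rng(0).standard_normal((240, 320))
center = (160, 120)  # predicted joint location (u, v)

# Larger patches get larger pooling windows, so every scale is reduced
# to the same 16x16 map before the multiscale branches are merged.
outputs = [max_pool(extract_patch(depth, center, half), pool)
           for half, pool in [(16, 2), (32, 4), (64, 8)]]
```

Tying the pooling factor to the patch size is what lets the branches share a common output resolution despite seeing different amounts of spatial context.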
Code
https://github.com/moberweger/deep-prior
Results
Tested on the NYU Hand Pose dataset (many missing depth values; depth is used to augment the 2D coordinates) and the ICVL Hand Posture dataset (less pose variation and sometimes inaccurate annotations, but almost no missing depth values)
Best result with a 30-neuron bottleneck and the overlapping refinement step