1502.06807 - hassony2/inria-research-wiki GitHub Wiki

CVWW 2015 (Computer Vision Winter Workshop)

[1502.06807] Hands Deep in Deep Learning for Hand Pose Estimation [PDF] [code] [notes]

Markus Oberweger, Paul Wohlhart, Vincent Lepetit

read 17/05/2017

Synthesis

Ideas :

  • DeepPrior exploits the fact that a low-dimensional prior can capture the parametrization of the hand's 3D pose ==> impose a bottleneck in the network: fewer neurons than final parameters (< 3 * nb joints) to force the network to learn the physical constraints of the hand. The bottleneck is added as an additional layer

  • Output: joint positions in 3D, so the final output has size 3 * nb_joints (regression). I think they predict pixel positions for u and v, and depth normalized to [-1, 1]

  • refine this first prediction

  • input: several zooms centered on the predicted joint location (patches), giving various levels of spatial context

  • pooling size increases with patch size

  • several iterations
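The bottleneck can be sketched as a linear pose prior stacked between the last feature layer and the joint outputs. The feature size and joint count below are illustrative assumptions (the 30-d bottleneck is the size the notes report as best):

```python
import numpy as np

rng = np.random.default_rng(0)

n_joints = 14        # assumed joint count (NYU evaluation uses a 14-joint subset)
bottleneck_dim = 30  # prior dimension, < 3 * n_joints = 42
feat_dim = 1024      # hypothetical size of the last fully connected layer

# Final two layers: features -> 30-d pose embedding -> 3*J joint coordinates.
W_embed = rng.normal(size=(feat_dim, bottleneck_dim))
W_prior = rng.normal(size=(bottleneck_dim, 3 * n_joints))  # learned pose basis

def predict_pose(features):
    """Map CNN features to 3D joints through the low-dimensional prior."""
    embedding = features @ W_embed  # (batch, 30): constrained parametrization
    return embedding @ W_prior      # (batch, 3*J): full joint coordinates

feats = rng.normal(size=(2, feat_dim))
pose = predict_pose(feats)
# Every prediction lies in the 30-d subspace spanned by W_prior's rows.
```

The rank of the composed mapping is at most 30, so predictions are forced onto a low-dimensional pose manifold regardless of the input features.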

No special handling of occluded joints

Noisy annotations ==> robust Huber loss
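A minimal sketch of the Huber loss, which is quadratic for small residuals and linear for large ones, so outlier annotations are down-weighted relative to an L2 loss (the delta value here is an assumption):

```python
import numpy as np

def huber_loss(pred, target, delta=1.0):
    """Huber loss: quadratic near zero, linear beyond delta, so annotation
    outliers contribute less than with an L2 loss. delta is the transition
    point (the value 1.0 is a hypothetical choice)."""
    r = np.abs(pred - target)
    return np.where(r <= delta, 0.5 * r**2, delta * (r - 0.5 * delta))

# Small residuals behave like L2, large ones like a scaled L1:
small = huber_loss(0.5, 0.0)   # 0.5 * 0.5**2 = 0.125
large = huber_loss(10.0, 0.0)  # 1.0 * (10 - 0.5) = 9.5
```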

Pipeline

Basic

2D hand detector ==> coarse bounding box of the hand

use the bounding boxes as input to the CNN-based pose predictors
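The cropping step might look like the following sketch: cut the depth map to the detected bbox and normalize depth to [-1, 1] around the crop's center depth. The output size, depth range, and median-based center are assumptions, not the paper's exact procedure:

```python
import numpy as np

def crop_hand(depth, bbox, out_size=128, max_depth_range=150.0):
    """Crop the depth map to the hand bbox and normalize depth to [-1, 1]
    around the crop's center depth. bbox = (x0, y0, x1, y1) in pixels;
    out_size and max_depth_range (mm) are hypothetical values."""
    x0, y0, x1, y1 = bbox
    patch = depth[y0:y1, x0:x1].astype(np.float64)
    # Use the median of valid (non-zero) depths as the reference depth.
    center = np.median(patch[patch > 0]) if np.any(patch > 0) else 0.0
    patch = np.clip((patch - center) / max_depth_range, -1.0, 1.0)
    # Nearest-neighbour resize to the fixed CNN input resolution.
    ys = np.linspace(0, patch.shape[0] - 1, out_size).astype(int)
    xs = np.linspace(0, patch.shape[1] - 1, out_size).astype(int)
    return patch[np.ix_(ys, xs)]
```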

Advanced

Predict all joint locations simultaneously

Variant 1

Shallow network (1 conv, 1 max-pool, 1 fc)

Variant 2

Deeper network (3 conv + max-pool, 2 fc)
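A quick sanity check on such an architecture is to trace the feature-map size through the conv and pooling stages. The kernel sizes and the 128x128 input below are assumptions for illustration, not the paper's exact configuration:

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

# Deeper variant: 3 conv + max-pool stages, then 2 fc layers.
size = 128
for kernel in (5, 5, 3):                # hypothetical conv kernel sizes
    size = conv_out(size, kernel)       # convolution, stride 1, no padding
    size = conv_out(size, 2, stride=2)  # 2x2 max-pooling
# size is now 13: the spatial resolution entering the first fc layer
```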

Variant 3

Multiscale
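The multiscale input can be sketched as several crops of increasing spatial context around a predicted joint location, each downsampled to a common resolution (crop sizes and nearest-neighbour resizing are assumptions):

```python
import numpy as np

def multiscale_patches(depth, center, sizes=(64, 96, 128), out=32):
    """Extract crops of increasing spatial context around a predicted joint
    location and resize each to the same resolution, mimicking 'pooling size
    increases with patch size'. sizes/out are hypothetical values."""
    cy, cx = center
    patches = []
    for s in sizes:
        half = s // 2
        y0, y1 = max(cy - half, 0), min(cy + half, depth.shape[0])
        x0, x1 = max(cx - half, 0), min(cx + half, depth.shape[1])
        p = depth[y0:y1, x0:x1]
        # Nearest-neighbour downsampling to a shared output size.
        ys = np.linspace(0, p.shape[0] - 1, out).astype(int)
        xs = np.linspace(0, p.shape[1] - 1, out).astype(int)
        patches.append(p[np.ix_(ys, xs)])
    return np.stack(patches)  # (n_scales, out, out)
```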

Code

https://github.com/moberweger/deep-prior

Results

Tested on the NYU Hand Pose dataset (many missing depth values; depth is used to augment the 2D coordinates) and the ICVL Hand Posture dataset (less pose variation and sometimes inaccurate annotations, but almost no missing depth values)

Best results with a 30-neuron bottleneck and the overlapping refinement step