Arxiv 2017
[arxiv 1704.02224] Hand3D: Hand Pose Estimation using 3D Neural Network [PDF] [project page] [notes]
Xiaoming Deng, Shuo Yang, Yinda Zhang, Ping Tan, Liang Chang, Hongan Wang
read 19/05/2017
Objective
From a single depth image, convert the depth map to a 3D volumetric representation and feed it into a 3D CNN that directly produces the hand pose in 3D
Synthesis
Train an FCN (fully convolutional network) to complete the hand shape from a single depth view.
Augment existing datasets: use their existing poses to render depth images
Pipeline
- build a reference coordinate frame at the COM (center of mass) of the foreground region
- convert the input depth to a truncated signed distance function (TSDF) of resolution 60^3
- TSDF refinement network: removes artifacts caused by noisy and missing depth
  - Hourglass-like structure:
    - 2 3D conv, pooling, ReLU * 3
    - 2 3D conv, up-pooling, ReLU * 3
    - with shortcuts between layers of the same scale
- the refined TSDF is fed to the 3D pose network to estimate the 3D location of each joint
  - Structure: 2 * 3D conv, pooling, ReLU, 3 FC layers
  - Output: joint positions as coordinates (vector of size 3 * number of joints)
  - L2 loss
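The first two pipeline steps (COM-centered reference frame, depth to TSDF) can be sketched with NumPy. This is only an illustration under several assumptions that are not taken from the paper: a pinhole camera with intrinsics `fx, fy, cx, cy`, a projective TSDF (signed distance measured along the viewing ray, not the true distance to the closest surface point), a hypothetical 300 mm cube and 3-voxel truncation band, and the sign convention from the Definitions section (positive behind the observed surface, negative in front of it). The function name `depth_to_tsdf` is made up for this sketch.

```python
import numpy as np

def depth_to_tsdf(depth, fx, fy, cx, cy, res=60, cube_mm=300.0, trunc_voxels=3.0):
    """Projective TSDF of a depth map (in mm, 0 = background) in a COM-centered cube."""
    h, w = depth.shape
    v, u = np.nonzero(depth > 0)                    # foreground pixel rows / columns
    z = depth[v, u].astype(np.float64)

    # Back-project foreground pixels to 3D and take their center of mass.
    pts = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=1)
    com = pts.mean(axis=0)

    # Voxel centers of a res^3 grid centered on the COM (assumed cube size).
    voxel = cube_mm / res
    offsets = (np.arange(res) + 0.5) * voxel - cube_mm / 2.0
    gx, gy, gz = np.meshgrid(offsets + com[0], offsets + com[1],
                             offsets + com[2], indexing="ij")

    # Project every voxel center back into the depth image.
    gz_safe = np.where(gz > 0, gz, 1.0)             # avoid division by zero
    pu = np.round(gx * fx / gz_safe + cx).astype(int)
    pv = np.round(gy * fy / gz_safe + cy).astype(int)
    valid = (gz > 0) & (pu >= 0) & (pu < w) & (pv >= 0) & (pv < h)

    d = np.zeros_like(gz)
    d[valid] = depth[pv[valid], pu[valid]]
    observed = valid & (d > 0)

    # Voxels with no depth observation are treated as free space (-1).
    trunc = trunc_voxels * voxel
    tsdf = np.full(gz.shape, -1.0)
    # Positive behind the surface (inside), negative in front (outside),
    # normalized by the truncation distance and clamped to [-1, 1].
    tsdf[observed] = np.clip((gz[observed] - d[observed]) / trunc, -1.0, 1.0)
    return tsdf, com
```

Calling `depth_to_tsdf(depth, fx, fy, cx, cy)` on a segmented hand depth map returns a `60 x 60 x 60` volume, which is the kind of input the refinement and pose networks above would consume.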
Results
Metric: percentage of samples whose predicted joints lie within a maximum distance from the ground truth
Evaluated on the NYU and ICVL hand pose datasets
==> state-of-the-art results: 74% (NYU) and 96% (ICVL) at a 50 mm threshold, 18% (NYU) and 40% (ICVL) at 20 mm
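A minimal sketch of how such a threshold metric could be computed. It assumes the common "worst joint" reading, where a sample counts as correct only if all of its joints are within the threshold; the notes do not say whether this is the exact variant used in the paper. The function name `fraction_within` and the array layout are hypothetical.

```python
import numpy as np

def fraction_within(pred, gt, thresholds_mm):
    """pred, gt: arrays of shape (n_samples, n_joints, 3), in mm.

    Returns, for each threshold, the fraction of samples whose largest
    per-joint error is at most that threshold.
    """
    err = np.linalg.norm(pred - gt, axis=2)   # Euclidean error per joint
    worst = err.max(axis=1)                   # worst joint of each sample
    return np.array([(worst <= t).mean() for t in thresholds_mm])
```

Evaluating it at `[20, 50]` gives the two operating points quoted in the results (as fractions rather than percentages).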
Definitions
SDF (signed distance function): function that assigns to each point its signed distance to the boundary of a set. Signed because the value is positive inside the set and negative outside it.
Truncated: the signed distance is clamped both above and below at a fixed value, so only distances close to the boundary are kept exactly.
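To make the two definitions concrete, here is a small sketch that uses a sphere as the set and follows the sign convention above (positive inside, negative outside). It assumes symmetric truncation at a threshold `tau`, which is the usual TSDF construction; the function names are hypothetical.

```python
import math

def sphere_sdf(point, center, radius):
    """Signed distance to a sphere's surface: positive inside, negative outside."""
    return radius - math.dist(point, center)

def truncate(value, tau):
    """Clamp a signed distance to the interval [-tau, tau]."""
    return max(-tau, min(tau, value))

# A point 3 units outside a radius-10 sphere keeps its exact value,
# while points far from the surface saturate at +/- tau.
near = truncate(sphere_sdf((13.0, 0.0, 0.0), (0.0, 0.0, 0.0), 10.0), 5.0)
deep = truncate(sphere_sdf((0.0, 0.0, 0.0), (0.0, 0.0, 0.0), 10.0), 5.0)
```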