1704.02201 - hassony2/inria-research-wiki GitHub Wiki

Arxiv 2017

[arxiv 1704.02201] Real-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor [PDF] [project page- code should appear here] [dataset]

Franziska Mueller, Dushyant Mehta, Oleksandr Sotnychenko, Srinath Sridhar, Dan Casas, Christian Theobalt

Objective

Estimate hand pose in real time, robustly and accurately, in cluttered environments from a moving egocentric RGB-D camera

Find the joint angles of a kinematic hand skeleton

Synthesis

Created the SynthHands photorealistic dataset

Pipeline

Two CNN steps, followed by kinematic pose optimization:

  • Hand localization
  • 3D joint position regression
  • temporal smoothing using a kinematic skeleton (26 DoF) with bone lengths adapted to each user

Hand localization

Compute a colored depth map by mapping each pixel of the color image plane onto the depth map

A CNN, HALNet (Hand Localization Network), estimates the root point of the hand (whether it is visible or not) and outputs a heatmap

If the heatmap maximum is low (< 0.1) or far from the previous maximum, the estimate is marked uncertain and updated accordingly

The heatmap gives the most probable hand location, around which a depth-dependent crop is extracted (see the sketch below)
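A minimal numpy sketch of this localization step, assuming the heatmap and depth map share the same resolution; the 0.1 confidence threshold is from the notes above, while `max_jump_px` and `crop_const` are made-up illustration values, not the paper's parameters:

```python
import numpy as np

def locate_and_crop(colored_depth, heatmap, depth, prev_root=None,
                    conf_thresh=0.1, max_jump_px=50, crop_const=300.0):
    """Pick the hand root from a HALNet-style heatmap and cut a
    depth-dependent crop around it (illustrative sketch only)."""
    # Most probable root location = heatmap maximum.
    v, u = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    conf = heatmap[v, u]

    # Mark the estimate uncertain if confidence is low or it jumped too far,
    # and fall back to the previous root in that case.
    if conf < conf_thresh or (prev_root is not None and
                              np.hypot(u - prev_root[0], v - prev_root[1]) > max_jump_px):
        if prev_root is not None:
            u, v = prev_root

    # Crop size shrinks with distance: a far hand covers fewer pixels.
    z = max(depth[v, u], 1e-3)          # depth at the root, in meters
    half = int(crop_const / z / 2)      # half crop size in pixels
    h, w = depth.shape
    y0, y1 = max(0, v - half), min(h, v + half)
    x0, x1 = max(0, u - half), min(w, u + half)
    return colored_depth[y0:y1, x0:x1], (u, v)
```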

Joint regression

JORNet regresses:

  • root-relative 3D joint positions (lifted to absolute coordinates as in the sketch after this list)

  • 2D position likelihood heatmaps, used to regularize 3D joint positions
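The root-relative predictions can be lifted to absolute camera-space coordinates by back-projecting the root pixel with its depth; a hedged sketch is below, where the intrinsics (fx, fy, cx, cy) are placeholder values, not the sensor's actual calibration:

```python
import numpy as np

def to_absolute_joints(joints_rel, root_uv, root_depth,
                       fx=475.0, fy=475.0, cx=315.0, cy=245.0):
    """joints_rel: (J, 3) root-relative 3D joint positions in meters.
    root_uv: (u, v) root pixel from the localization stage.
    root_depth: depth at the root pixel in meters.
    Intrinsics here are illustrative placeholders."""
    u, v = root_uv
    # Back-project the root pixel to a 3D point in camera coordinates.
    root_xyz = np.array([(u - cx) * root_depth / fx,
                         (v - cy) * root_depth / fy,
                         root_depth])
    # Absolute joint positions = root position + root-relative offsets.
    return joints_rel + root_xyz
```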

Post-processing: Hand Pose Optimization

Enforces bone length constraints and temporal smoothness
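A simplified stand-in for this optimization, assuming we refine joint positions directly instead of the paper's 26-DoF angle parameterization; the toy 4-joint chain, bone lengths, and weights are illustrative only:

```python
import numpy as np
from scipy.optimize import minimize

# Toy kinematic chain: (parent, child) index pairs and per-user bone lengths.
BONES = [(0, 1), (1, 2), (2, 3)]
BONE_LEN = np.array([0.04, 0.035, 0.03])   # meters, illustrative values

def energy(x, pred, prev, w_bone=10.0, w_smooth=1.0):
    joints = x.reshape(-1, 3)
    data = np.sum((joints - pred) ** 2)                            # fit CNN output
    bone = sum((np.linalg.norm(joints[c] - joints[p]) - l) ** 2    # bone lengths
               for (p, c), l in zip(BONES, BONE_LEN))
    smooth = np.sum((joints - prev) ** 2)                          # temporal term
    return data + w_bone * bone + w_smooth * smooth

def refine(pred, prev):
    """pred, prev: (4, 3) predicted and previous-frame joint positions."""
    res = minimize(energy, pred.ravel(), args=(pred, prev), method="L-BFGS-B")
    return res.x.reshape(-1, 3)
```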

Datasets

SynthHands

New photorealistic dataset created using merged reality:

a photo-realistic hand model is posed while interacting with virtual objects, which allows composition with various objects (see the compositing sketch at the end of this section)

hand movement is acquired using real-time hand tracking (takes advantage of the fact that hand tracking works well when there is no object interaction)

allows for variability in hand skin color, shape, pose, texture, background clutter, camera viewpoint...

To increase variability:

  • random variation along each dimension of the default hand mesh
  • 12 skin colors
  • hand aspect variability (size, hair, ...)
  • virtual cameras for 5 egocentric views
  • 7 different objects with random textures
  • real RGB-D captures as backgrounds

==> RGBD training data
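A hedged sketch of the merged-reality compositing idea (z-buffer style: the rendered hand replaces the real background wherever it is closer to the camera); the array layout and the zero-means-invalid depth convention are assumptions for illustration, not details from the paper:

```python
import numpy as np

def composite_rgbd(hand_rgb, hand_depth, bg_rgb, bg_depth):
    """Keep the synthetic hand pixel wherever the rendered hand is closer to
    the camera than the real background capture; otherwise keep the real
    pixel. A zero hand depth means 'no hand rendered here'."""
    hand_valid = hand_depth > 0
    use_hand = hand_valid & (hand_depth < bg_depth)
    rgb = np.where(use_hand[..., None], hand_rgb, bg_rgb)
    depth = np.where(use_hand, hand_depth, bg_depth)
    return rgb, depth
```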

Evaluation

3130 frames captured from moving egocentric viewpoints

Results

The two-step approach outperforms a single CNN regressing the final output directly

JORNet

Evaluated with 3D Euclidean distance error
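Two common ways to summarize this metric, sketched with numpy (not the authors' evaluation code):

```python
import numpy as np

def mean_3d_error(pred, gt):
    """Average per-joint 3D Euclidean distance (same unit as the inputs)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pck_3d(pred, gt, thresholds):
    """Fraction of joints whose 3D error falls under each threshold,
    as used for 'percentage of correct keypoints' style curves."""
    err = np.linalg.norm(pred - gt, axis=-1).ravel()
    return [(err < t).mean() for t in thresholds]
```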

Future work

Train jointly on synthetic and real data using domain adaptation