microsoft efficient hand tracking - hassony2/inria-research-wiki GitHub Wiki

SIGGRAPH 2016

[microsoft-efficient-hand-tracking] Efficient and precise interactive hand tracking through joint, continuous optimization of pose and correspondences [PDF] [video] [notes]

Jonathan Taylor, Benjamin Luff, Arran Topalian, Erroll Wood, Sameh Khamis, Pushmeet Kohli, Shahram Izadi, Richard Banks, Andrew Fitzgibbon, Jamie Shotton, Lucas Bordeaux, Thomas Cashman, Bob Corish, Cem Keskin, Toby Sharp, Eduardo Soto, David Sweeney, Julien Valentin

read 19/05/2017

Objective

Real-time inference of the hand's precise pose using a smooth model of the hand surface

Synthesis

28 DOF model (wrist included)

Pipeline

  • preprocessing

    • hand position detected by the Kinect tracker
    • segmentation deduced using a connected-components algorithm
    • sample pixels iteratively: at each step, draw 5 candidates and keep the one furthest from the current set of points
  • define smooth energy function as weighted sum of energy terms

    • consistency of data and model points
    • data point should be close to smooth surface
      • penalize projection of model outside of the silhouette
    • joint angles should be realistic
    • time consistency (Geman-McClure kernel)
    • penalize self-intersection by measuring interpenetration of sphere-modeled fingers
    • consistency with finger-tip detector
  • optimize correspondences and pose simultaneously

    • using Levenberg iterations
  • run the optimization from several initializations

    • time-aware, taking into account the estimated velocity
    • retrieval tree method to generate proposals from hand crops, using the method from "Learning to navigate the energy landscape" by Valentin et al.
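
Two pieces of the pipeline above are small enough to sketch in code: the iterative pixel subsampling and the robust kernel named for the time-consistency term. This is a minimal illustration, not the paper's implementation. The function names, the seed, and the sigma parameter are hypothetical. The distance from a candidate to the current set is assumed to be the distance to its nearest selected point. The kernel is the common r^2 / (r^2 + sigma^2) form, which may differ from the paper's parametrization.

```python
import math
import random


def subsample_pixels(points, n_samples, n_candidates=5, seed=0):
    """Greedy subsampling: at each step draw a few random candidates and keep
    the one whose nearest already-selected point is furthest away."""
    rng = random.Random(seed)
    remaining = list(points)
    # Start from one random point so the selected set is never empty.
    selected = [remaining.pop(rng.randrange(len(remaining)))]
    while remaining and len(selected) < n_samples:
        k = min(n_candidates, len(remaining))
        candidate_ids = rng.sample(range(len(remaining)), k)
        # Assumption: distance to the set = distance to its nearest member.
        best = max(
            candidate_ids,
            key=lambda i: min(math.dist(remaining[i], s) for s in selected),
        )
        selected.append(remaining.pop(best))
    return selected


def geman_mcclure(residual, sigma=1.0):
    """Common form of the Geman-McClure robust kernel: r^2 / (r^2 + sigma^2).
    Grows like r^2 near zero and saturates towards 1 for large residuals."""
    r2 = residual * residual
    return r2 / (r2 + sigma * sigma)
```

Drawing only a handful of candidates per step keeps the sampling cheap compared with exact farthest-point sampling while still spreading the samples over the segmented hand. The saturating kernel limits how much a single large frame-to-frame jump can contribute to the energy.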

Results

The cost of a more complex model (mesh instead of 3D primitives) can be offset by needing fewer iterations of the algorithm

Still a need to improve robustness and latency

metrics

  • error between predicted and annotated 3D markers, such as fingertips or joints

    • average or max error below threshold
  • classification error for hand parts (FingerPaint dataset)

    • average or max error below threshold (must accurately predict the class of all pixels)
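
The marker metric above can be sketched as the fraction of frames whose average or max marker error falls below a threshold. The function names and the data layout are assumptions made for illustration: each frame is a pair of predicted and annotated 3D marker lists. The paper's evaluation code may differ.

```python
import math


def frame_errors(predicted, annotated):
    """Euclidean distance between each predicted marker and its annotation."""
    return [math.dist(p, a) for p, a in zip(predicted, annotated)]


def mean(values):
    return sum(values) / len(values)


def fraction_below(frames, threshold, reduce=max):
    """Fraction of frames whose reduced marker error (max or mean over the
    markers of the frame) is strictly below the threshold."""
    ok = sum(
        1 for predicted, annotated in frames
        if reduce(frame_errors(predicted, annotated)) < threshold
    )
    return ok / len(frames)
```

With `reduce=max`, a frame counts only if every marker is within the threshold, which makes it the stricter of the two variants. `reduce=mean` tolerates one poorly placed marker when the others are accurate.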