1704.00529 - hassony2/inria-research-wiki GitHub Wiki

ICCV 2015

[1704.00529] 3D object reconstruction from hand-object interactions [PDF] [notes]

Dimitrios Tzionas, Juergen Gall

read 11/04/2018

Objective

Improve in-hand scanning of textureless and symmetric objects with an RGB-D camera by directly using hand motion information

Synthesis

Hand motion capture

  • Remove irrelevant regions of the RGB-D image by thresholding the depth values

  • Segment skin color using a Gaussian Mixture Model

  • Obtain masked RGB-D images for the hand and for the object (? by removing the hand and keeping the rest ?)

  • Estimate hand pose using a skeleton model and linear blend skinning

  • Minimize an objective function that fits the model to the depth cues and penalizes collisions; this gives a hand tracking accuracy of about 17 mm
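
The skinning step above can be illustrated with a minimal NumPy sketch of linear blend skinning. The function name, argument names and array layout are assumptions made for illustration, not the paper's implementation; the real hand model and its parametrization are not described in these notes.

```python
import numpy as np

def linear_blend_skinning(rest_vertices, skin_weights, bone_transforms):
    """Deform rest-pose vertices with per-bone rigid transforms.

    rest_vertices:   (N, 3) vertex positions in the rest pose
    skin_weights:    (N, B) weights, each row summing to 1
    bone_transforms: (B, 4, 4) homogeneous transform of each bone
    """
    n = rest_vertices.shape[0]
    homogeneous = np.hstack([rest_vertices, np.ones((n, 1))])  # (N, 4)
    # Position of every vertex as if it followed each bone rigidly: (B, N, 4)
    per_bone = np.einsum("bij,nj->bni", bone_transforms, homogeneous)
    # Blend the per-bone positions with the skinning weights: (N, 4)
    blended = np.einsum("nb,bni->ni", skin_weights, per_bone)
    return blended[:, :3]
```

A vertex that is fully attached to one bone follows that bone exactly, while a vertex shared between two bones ends up at the weighted average of the two candidate positions.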

Object reconstruction

Infer contact points

  • For each hand vertex, compute the closest point of the object point cloud
  • Count the number of vertices with a closest distance of less than 1 mm; if a bone has more than 40 contact vertices, it is considered a contact bone
  • If fewer than 2 bones are selected, increase the distance threshold iteratively until at least 2 end-effectors are selected (in general, less than 2.5 mm is enough)
  • Obtain contact correspondences between pairs of frames by pairing contact vertices from the source and target frames
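
The contact selection rule above can be sketched as follows. The 1 mm start threshold, the 40-vertex count and the minimum of 2 bones come from the notes; the step size `step_mm`, the safety cap `max_mm`, the brute-force nearest-neighbour search and all names are assumptions made for illustration.

```python
import numpy as np

def find_contact_bones(hand_vertices, vertex_bone, object_points,
                       start_mm=1.0, step_mm=0.5, max_mm=10.0,
                       min_vertices=40, min_bones=2):
    """Select contact bones from hand vertices and an object point cloud.

    hand_vertices: (N, 3) hand model vertices, in mm
    vertex_bone:   (N,) integer bone index of each vertex
    object_points: (M, 3) object point cloud, in mm
    """
    # Distance from each hand vertex to its closest object point (brute force)
    diffs = hand_vertices[:, None, :] - object_points[None, :, :]
    closest = np.linalg.norm(diffs, axis=2).min(axis=1)
    n_bones = int(vertex_bone.max()) + 1
    threshold = start_mm
    while True:
        in_contact = closest < threshold
        # Number of contact vertices per bone
        counts = np.bincount(vertex_bone[in_contact], minlength=n_bones)
        bones = np.flatnonzero(counts > min_vertices)
        # Stop once enough bones are found, or at the assumed safety cap
        if len(bones) >= min_bones or threshold >= max_mm:
            return bones, in_contact, threshold
        threshold += step_mm
```

Running this on two frames gives two sets of contact vertices; how exactly they are paired into correspondences is not detailed in these notes, so it is left out of the sketch.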

Reconstruction

  • Align the currently observed point cloud to the previous frame, then align the transformed source by ICP (iterative closest point)
  • For pairing, combine object depth and contact points
  • Minimize a visual energy term based on object depth and a contact term based on hand depth and model, to match incoming and processed data (current and previous frame)

Seek a rigid transformation matrix that minimizes some energy between the source and target frames

  • Use 2D SIFT feature correspondences back-projected to 3D as an additional loss term
  • Use 3D feature correspondences based on a sparse set of depth-based correspondences augmented with texture information
  • Use a contact term that depends on the current hand pose estimate and the depth point cloud to estimate paired contact points; this gives a loss that also takes the rigid object transformation into account (contact points should move with the object)

This last contact term, unlike the feature-based terms, avoids slipping for textureless and symmetric objects, and therefore guides the reconstruction

  • A refinement stage aligns the current frame with all previously aligned frames
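
To make the search for a rigid transformation concrete, here is a minimal sketch that solves a weighted point-to-point least-squares alignment in closed form (Kabsch algorithm with an SVD). This is a simplification and not the paper's energy: the actual terms, distance metrics and solver are not given in these notes. The function name and the idea of stacking feature and contact pairs with different weights are assumptions for illustration.

```python
import numpy as np

def weighted_rigid_alignment(source, target, weights):
    """Find R, t minimizing sum_i w_i * ||R @ source_i + t - target_i||^2.

    source, target: (N, 3) paired 3D points (e.g. feature and contact pairs)
    weights:        (N,) non-negative weight of each pair
    """
    w = weights / weights.sum()
    src_centroid = (w[:, None] * source).sum(axis=0)
    tgt_centroid = (w[:, None] * target).sum(axis=0)
    # Weighted cross-covariance between the centered point sets (3 x 3)
    H = (w[:, None] * (source - src_centroid)).T @ (target - tgt_centroid)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_centroid - R @ src_centroid
    return R, t
```

Giving contact pairs a larger weight than feature pairs is one simple way to let the hand motion dominate when the object offers few reliable features.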

Surface model

  • Use a signed distance function to get a volumetric representation (voxels)
  • Use marching cubes to generate a mesh from the voxels
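
As a rough illustration of the volumetric step, the sketch below samples a signed distance on a voxel grid from an oriented point cloud, using the distance along the normal of the nearest point. This particular estimate, the brute-force search and all names are assumptions; the notes do not say how the paper computes its signed distance function.

```python
import numpy as np

def signed_distance_grid(points, normals, grid_min, grid_max, resolution):
    """Sample an approximate signed distance on a regular voxel grid.

    points:  (N, 3) surface samples
    normals: (N, 3) unit outward normals at those samples
    Returns a (resolution, resolution, resolution) array, negative inside.
    """
    axes = [np.linspace(grid_min[i], grid_max[i], resolution) for i in range(3)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    voxels = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    # Index of the nearest surface sample for every voxel center (brute force)
    d2 = ((voxels[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    # Signed distance: offset to the nearest sample projected on its normal
    offset = voxels - points[nearest]
    sdf = (offset * normals[nearest]).sum(axis=1)
    return sdf.reshape(resolution, resolution, resolution), axes
```

The zero level set of the returned grid approximates the surface; a marching cubes routine (for example the one in scikit-image) could then turn it into a triangle mesh.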