1704.00529 - hassony2/inria-research-wiki GitHub Wiki

ICCV 2015

[1704.00529] 3D object reconstruction from hand-object interactions [PDF] [notes]

Dimitrios Tzionas, Juergen Gall

read 11/04/2018

Objective

Improve in-hand scanning of textureless and symmetric objects with an RGB-D camera by directly using hand motion information

Synthesis

Hand motion capture

Remove irrelevant depth values by thresholding
Segment skin color using a Gaussian Mixture Model
Obtain masked RGB-D images for object (? by removing the hand and keeping the rest ?) and hand
Estimate hand pose using a skeleton model and linear blend skinning
Minimize an objective function that fits the model to the depth cues and penalizes collisions; this gives a hand tracking accuracy of about 17 mm
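The skinning step above can be sketched with the standard linear blend skinning formula, where each posed vertex is the weighted sum of its rest position transformed by every bone. This is a minimal sketch, not the paper's implementation: the function name, array layouts and the toy two-bone example are assumptions made for illustration.

```python
import numpy as np

def linear_blend_skinning(rest_vertices, weights, bone_transforms):
    """Deform rest-pose vertices with linear blend skinning.

    rest_vertices:   (V, 3) vertex positions in the rest pose
    weights:         (V, B) skinning weights, each row sums to 1
    bone_transforms: (B, 4, 4) transform of each bone (rest -> posed)
    """
    n_vertices = rest_vertices.shape[0]
    homogeneous = np.hstack([rest_vertices, np.ones((n_vertices, 1))])  # (V, 4)
    # Position of every vertex under every bone transform: (B, V, 4)
    per_bone = np.einsum("bij,vj->bvi", bone_transforms, homogeneous)
    # Blend the per-bone positions with the skinning weights: (V, 4)
    blended = np.einsum("vb,bvi->vi", weights, per_bone)
    return blended[:, :3]

# Toy example (hypothetical data): bone 0 stays put, bone 1 shifts by 2 along x.
identity = np.eye(4)
shift = np.eye(4)
shift[0, 3] = 2.0
verts = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
w = np.array([[1.0, 0.0],   # fully attached to bone 0
              [0.5, 0.5]])  # shared equally between both bones
posed = linear_blend_skinning(verts, w, np.stack([identity, shift]))
```

The second vertex moves by half of the bone 1 translation, which is the blending behaviour the pose optimization relies on when fitting the model to depth.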

Object reconstruction

Infer contact points

For each hand vertex, compute the closest point of the object point cloud
Count the number of vertices with a closest distance of less than 1 mm; if a bone has more than 40 contact vertices, it is considered a contact bone
If fewer than 2 bones are selected, increase the distance iteratively until at least 2 end-effectors are selected (in general < 2.5 mm is enough)
Obtain contact correspondences between pairs of frames by pairing contact vertices from the source and target frames
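The contact-bone selection above can be sketched as follows. The 1 mm starting distance, the 40-vertex count and the "at least 2 bones" rule come from the notes; the step size, the upper bound on the distance, the brute-force nearest-neighbour search and all names are assumptions made for illustration.

```python
import numpy as np

def find_contact_bones(hand_vertices, vertex_bone, object_points,
                       start_mm=1.0, step_mm=0.5, min_vertices=40,
                       min_bones=2, max_mm=10.0):
    """Return (contact bone ids, distance threshold used), units in mm.

    hand_vertices: (V, 3) hand model vertices
    vertex_bone:   (V,) integer bone id of each vertex
    object_points: (N, 3) object point cloud
    step_mm and max_mm are assumed values, not taken from the paper.
    """
    # Distance from each hand vertex to its closest object point (brute force).
    diff = hand_vertices[:, None, :] - object_points[None, :, :]
    closest = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    n_bones = int(vertex_bone.max()) + 1
    threshold = start_mm
    while True:
        in_contact = closest < threshold
        counts = np.bincount(vertex_bone[in_contact], minlength=n_bones)
        contact_bones = np.flatnonzero(counts > min_vertices)
        # Stop once enough bones are found, or give up at the assumed upper bound.
        if len(contact_bones) >= min_bones or threshold >= max_mm:
            return contact_bones, threshold
        threshold += step_mm

# Synthetic check: bone 0 is 0.5 mm from the object, bone 1 is 2 mm away,
# bone 2 is far from everything.
xs = 10.0 * np.arange(50)
zeros = np.zeros(50)
hand = np.concatenate([
    np.stack([xs, zeros, zeros], axis=1),          # bone 0
    np.stack([xs, zeros + 100.0, zeros], axis=1),  # bone 1
    np.stack([xs, zeros + 500.0, zeros], axis=1),  # bone 2
])
bone_ids = np.repeat(np.arange(3), 50)
obj = np.concatenate([
    np.stack([xs, zeros, zeros + 0.5], axis=1),
    np.stack([xs, zeros + 100.0, zeros + 2.0], axis=1),
])
bones, used_mm = find_contact_bones(hand, bone_ids, obj)
```

With these inputs only bone 0 qualifies at 1 mm, so the threshold grows until bone 1 is also selected. For real point clouds a k-d tree would replace the brute-force distance matrix.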

Reconstruction

Align the currently observed point cloud to the previous frame, then align the transformed source by ICP (iterative closest point)
For pairing, combine object depth and contact points
Minimize a visual energy term based on object depth and a contact term based on hand depth and model, to match incoming and processed data (current and previous frame)
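One plausible way to write the objective described above is given here. The notation is assumed for illustration and is not copied from the paper: $(R, t)$ is the rigid object motion between the source and target frames, $\gamma_c$ is a hypothetical weight, and $\mathcal{C}$ is the set of paired contact points.

```latex
E(R, t) = E_{\text{visual}}(R, t) + \gamma_c \, E_{\text{contact}}(R, t),
\qquad
E_{\text{contact}}(R, t) = \sum_{(x_s,\, x_t) \in \mathcal{C}} \left\| R\, x_s + t - x_t \right\|^2
```

The contact term is small only when the contact points move rigidly with the object, which is the constraint the notes describe.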

Seek a rigid transformation matrix that minimizes some energy between the source and target frames

Use 2D SIFT feature correspondences back-projected to 3D as an additional loss term
Use 3D feature correspondences based on a sparse set of depth-based correspondences augmented with texture information
Use a contact term that depends on the current hand pose estimate and the depth point cloud to estimate paired contact points. This gives a loss that also takes the rigid object transformation into account (contact points should move with the object)

This last contact term, unlike the feature-based terms, avoids slipping for textureless and symmetric objects and thereby guides the reconstruction
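To illustrate how paired points constrain the rigid transformation, the sketch below solves the least-squares alignment of corresponding points in closed form with the SVD-based Kabsch method. This is a swapped-in simplification: the paper minimizes a combined energy over several terms, whereas this handles a single set of correspondences (for example paired contact points). The function name and test data are hypothetical.

```python
import numpy as np

def rigid_from_correspondences(source, target):
    """Least-squares rigid transform (R, t) such that R @ s + t ~ target.

    source, target: (N, 3) arrays of corresponding 3D points.
    """
    src_mean = source.mean(axis=0)
    tgt_mean = target.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (source - src_mean).T @ (target - tgt_mean)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_mean - R @ src_mean
    return R, t

# Synthetic check: rotate by 30 degrees about z and translate.
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([1.0, 2.0, 3.0])
src = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
tgt = src @ R_true.T + t_true
R_est, t_est = rigid_from_correspondences(src, tgt)
```

For a symmetric, textureless object the depth and feature pairings alone can be satisfied by many rotations, while contact correspondences of this kind pin the motion down.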

A refinement stage aligns the current frame with all previously aligned frames

Surface model

Use a signed distance function to get a volumetric representation (voxels)
Use marching cubes to generate a mesh from the voxels
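The volumetric representation can be illustrated with a signed distance grid and the sign changes that marching cubes turns into triangles. This is a minimal sketch under stated assumptions: an analytic sphere stands in for the fused depth data, the grid size is arbitrary, and the mesh extraction itself is left to an off-the-shelf marching cubes implementation.

```python
import numpy as np

def sphere_sdf_grid(resolution=32, radius=0.5, extent=1.0):
    """Signed distance to a sphere sampled on a cubic voxel grid.

    Negative inside the surface, positive outside.
    """
    axis = np.linspace(-extent, extent, resolution)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.sqrt(x ** 2 + y ** 2 + z ** 2) - radius

def surface_voxels(sdf):
    """Mask of voxels whose sign differs from their +x, +y or +z neighbour."""
    inside = sdf < 0
    crossing = np.zeros_like(inside)
    crossing[:-1, :, :] |= inside[:-1, :, :] != inside[1:, :, :]
    crossing[:, :-1, :] |= inside[:, :-1, :] != inside[:, 1:, :]
    crossing[:, :, :-1] |= inside[:, :, :-1] != inside[:, :, 1:]
    return crossing

sdf = sphere_sdf_grid()
mask = surface_voxels(sdf)
spacing = 2.0 / 31  # distance between neighbouring samples for the defaults
```

Marching cubes places mesh vertices along exactly these sign changes, interpolating the zero crossing between neighbouring voxel values.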