1704.00529 - hassony2/inria-research-wiki GitHub Wiki
ICCV 2015
[1704.00529] 3D object reconstruction from hand-object interactions [PDF] [notes]
Dimitrios Tzionas, Juergen Gall
read 11/04/2018
Objective
Improve in-hand scanning of textureless and symmetric objects with an RGB-D camera by directly using hand motion information
Synthesis
Hand motion capture
- Discard irrelevant regions by thresholding the depth values
- Segment skin color using a Gaussian Mixture Model
- Obtain masked RGB-D images for object (? by removing the hand and keeping the rest ?) and hand
- Estimate hand pose using a skeleton model and linear blend skinning
- Minimize an objective function that aligns the model with depth cues and penalizes collisions --> this produces a hand tracking accuracy of about 17 mm
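As a rough illustration of the linear blend skinning step, here is a minimal NumPy sketch. The function name, array layouts, and the toy two-bone example are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def linear_blend_skinning(rest_vertices, bone_transforms, weights):
    """Deform rest-pose vertices with a weighted blend of per-bone transforms.

    rest_vertices:   (V, 3) vertex positions in the rest pose
    bone_transforms: (B, 4, 4) homogeneous rigid transform of each bone
    weights:         (V, B) skinning weights, each row summing to 1
    """
    n_vertices = rest_vertices.shape[0]
    homogeneous = np.hstack([rest_vertices, np.ones((n_vertices, 1))])  # (V, 4)
    # Position of every vertex as if it were rigidly attached to each bone: (B, V, 4)
    per_bone = np.einsum("bij,vj->bvi", bone_transforms, homogeneous)
    # Weighted average over bones: (V, 4)
    blended = np.einsum("vb,bvi->vi", weights, per_bone)
    return blended[:, :3]

# Toy example (hypothetical data): bone 0 stays fixed, bone 1 moves 2 units along z.
rest = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
transforms = np.stack([np.eye(4), np.eye(4)])
transforms[1, :3, 3] = [0.0, 0.0, 2.0]
skin_weights = np.array([[1.0, 0.0], [0.5, 0.5]])
posed = linear_blend_skinning(rest, transforms, skin_weights)
```

The second vertex is influenced equally by both bones, so it moves by half of the translation applied to bone 1.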
Object reconstruction
Infer contact points
- For each hand vertex, compute the closest point of the object point cloud
- Count the vertices with a closest distance of less than 1 mm; if a bone has more than 40 contact vertices, it is considered a contact bone
- If fewer than 2 bones are selected, increase the distance threshold iteratively until at least 2 end-effectors are selected (in general < 2.5 mm is enough)
- Obtain contact correspondences between pairs of frames by pairing contact vertices from the source and target frames
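The contact selection steps above can be sketched as follows. This is a simplified sketch under stated assumptions: the function name, the 0.5 mm step, and the 10 mm upper bound are hypothetical, all bones are treated alike (no distinction for end-effectors), and a brute-force nearest-neighbor search stands in for whatever the authors use.

```python
import numpy as np

def select_contact_bones(hand_vertices, vertex_bone, object_points,
                         start_mm=1.0, step_mm=0.5, min_vertices=40,
                         min_bones=2, max_mm=10.0):
    """Return (contact bone ids, final threshold, per-vertex contact mask).

    hand_vertices: (V, 3) hand model vertices in mm
    vertex_bone:   (V,) integer bone id of each vertex
    object_points: (P, 3) object point cloud in mm
    step_mm and max_mm are assumed values, not from the notes.
    """
    # Distance from every hand vertex to its closest object point (brute force).
    diffs = hand_vertices[:, None, :] - object_points[None, :, :]
    nearest = np.sqrt((diffs ** 2).sum(-1)).min(axis=1)

    n_bones = int(vertex_bone.max()) + 1
    threshold = start_mm
    while True:
        close = nearest < threshold
        # Number of contact vertices per bone at the current threshold.
        counts = np.bincount(vertex_bone[close], minlength=n_bones)
        contact_bones = np.flatnonzero(counts > min_vertices)
        # Stop once enough bones are found, or give up at the upper bound.
        if len(contact_bones) >= min_bones or threshold >= max_mm:
            return contact_bones, threshold, close
        threshold += step_mm

# Toy example (hypothetical data): three bones with 50 vertices each,
# at 0.5 mm, 1.8 mm and 50 mm from a single object point.
hand = np.vstack([np.tile([0.5, 0.0, 0.0], (50, 1)),
                  np.tile([0.0, 1.8, 0.0], (50, 1)),
                  np.tile([0.0, 0.0, 50.0], (50, 1))])
bone_ids = np.repeat([0, 1, 2], 50)
cloud = np.zeros((1, 3))
contact, final_threshold, contact_mask = select_contact_bones(hand, bone_ids, cloud)
```

In this toy setup only bone 0 qualifies at 1 mm, so the threshold grows until bone 1 is included as well.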
Reconstruction
- Align the currently observed point cloud to the previous frame, then align the transformed source by ICP (iterative closest point)
- For pairing, combine object depth and contact points
- Minimize a visual energy term based on object depth and a contact term based on hand depth and model, to match incoming and processed data (current and previous frame)
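For reference, a bare-bones point-to-point ICP loop might look like the sketch below. It is a generic textbook variant (closest-point pairing followed by a closed-form SVD solve), not the paper's formulation, which also mixes in contact points; all function names and the toy data are assumptions.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares R, t such that R @ src_i + t ~= dst_i (SVD-based solve)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection being returned instead of a rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def icp(source, target, iterations=20):
    """Align source (N, 3) to target (M, 3); returns R, t and the moved source."""
    R_total, t_total = np.eye(3), np.zeros(3)
    current = source.copy()
    for _ in range(iterations):
        # Pair every source point with its closest target point (brute force).
        d2 = ((current[:, None, :] - target[None, :, :]) ** 2).sum(-1)
        matched = target[d2.argmin(axis=1)]
        R, t = best_rigid_transform(current, matched)
        current = current @ R.T + t
        # Accumulate the incremental transform into the total one.
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total, current

# Toy example (hypothetical data): the source is the target after a small motion.
target_cloud = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 3.0, 0.0],
                         [0.0, 0.0, 4.0], [2.0, 3.0, 0.0], [1.0, 1.0, 5.0]])
angle = np.radians(2.0)
R_motion = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                     [np.sin(angle), np.cos(angle), 0.0],
                     [0.0, 0.0, 1.0]])
source_cloud = target_cloud @ R_motion.T + np.array([0.05, -0.03, 0.02])
R_est, t_est, aligned = icp(source_cloud, target_cloud)
```

ICP of this kind only converges from a good initial guess, which is why the notes describe a coarse alignment to the previous frame first.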
Seek a rigid transformation matrix that minimizes some energy between the source and target frames
- Use 2D SIFT feature correspondences back-projected to 3D as an additional loss term
- Use 3D feature correspondences based on a sparse set of depth-based correspondences augmented with texture information
- Use a contact term that depends on the current hand pose estimate and the depth point cloud to estimate paired contact points; this gives a loss that also takes the rigid object transformation into account (contact points should move with the object)
This last contact term, unlike the feature-based terms, avoids slipping in the case of textureless and symmetric objects, and therefore guides the reconstruction
- A refinement stage aligns the current frame with all previously aligned frames
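To make the role of the contact term concrete, the toy sketch below evaluates a simplified two-term energy on a rotationally symmetric object. The energy definitions, the weight `gamma`, and all names are illustrative assumptions; the paper's actual terms are more elaborate.

```python
import numpy as np

def rot_z(degrees):
    """Rotation matrix about the z axis."""
    a = np.radians(degrees)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def visual_energy(R, t, source, target):
    """Sum of squared distances from moved source points to closest target points."""
    moved = source @ R.T + t
    d2 = ((moved[:, None, :] - target[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).sum()

def contact_energy(R, t, contacts_src, contacts_dst):
    """Sum of squared distances between moved source contacts and paired target contacts."""
    moved = contacts_src @ R.T + t
    return ((moved - contacts_dst) ** 2).sum()

def total_energy(R, t, source, target, contacts_src, contacts_dst, gamma=1.0):
    # gamma is a hypothetical weight balancing the two terms.
    return (visual_energy(R, t, source, target)
            + gamma * contact_energy(R, t, contacts_src, contacts_dst))

# Symmetric object: 36 points on a unit circle, rotated by 30 degrees between frames.
angles = np.radians(np.arange(0, 360, 10))
source_pts = np.stack([np.cos(angles), np.sin(angles), np.zeros_like(angles)], axis=1)
R_true, no_shift = rot_z(30.0), np.zeros(3)
target_pts = source_pts @ R_true.T
# Two fingertip contacts that move together with the object.
contacts_a = source_pts[[0, 18]]
contacts_b = contacts_a @ R_true.T

e_visual_slip = visual_energy(np.eye(3), no_shift, source_pts, target_pts)
e_visual_true = visual_energy(R_true, no_shift, source_pts, target_pts)
e_total_slip = total_energy(np.eye(3), no_shift, source_pts, target_pts, contacts_a, contacts_b)
e_total_true = total_energy(R_true, no_shift, source_pts, target_pts, contacts_a, contacts_b)
```

The visual term alone cannot distinguish "no motion" from the true rotation, because both map the circle onto itself; only the contact term penalizes the slipping solution.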
Surface model
- Use a signed distance function to get a volumetric representation (voxels)
- Use marching cubes to generate a mesh from the voxels
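A minimal sketch of the volumetric step, assuming oriented points (positions plus outward normals) are available. It approximates a truncated signed distance by projecting each voxel's offset to its nearest surface point onto that point's normal; this is a simplified stand-in chosen for illustration, not the fusion scheme used in the paper, and every name and parameter value here is hypothetical.

```python
import numpy as np

def signed_distance_volume(points, normals, grid_min, grid_max, resolution, truncation):
    """Approximate truncated signed distance on a cubic voxel grid.

    points, normals: (N, 3) surface samples with outward unit normals
    Returns an array of shape (resolution, resolution, resolution):
    negative inside the surface, positive outside.
    """
    axis = np.linspace(grid_min, grid_max, resolution)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    voxels = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    # Closest surface sample for every voxel center (brute force).
    d2 = ((voxels[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)
    # Signed distance along the normal of the closest sample.
    offset = voxels - points[nearest]
    signed = (offset * normals[nearest]).sum(-1)
    clipped = np.clip(signed, -truncation, truncation)
    return clipped.reshape(resolution, resolution, resolution)

# Toy example (hypothetical data): 200 samples on a sphere of radius 0.5.
n_samples = 200
k = np.arange(n_samples) + 0.5
phi = np.arccos(1.0 - 2.0 * k / n_samples)
theta = np.pi * (1.0 + 5.0 ** 0.5) * k
sphere_normals = np.stack([np.cos(theta) * np.sin(phi),
                           np.sin(theta) * np.sin(phi),
                           np.cos(phi)], axis=1)
sphere_points = 0.5 * sphere_normals
volume = signed_distance_volume(sphere_points, sphere_normals, -1.0, 1.0, 17, 0.3)
```

A mesh could then be extracted from the zero level set with a marching cubes implementation, for example `skimage.measure.marching_cubes(volume, level=0.0)` from scikit-image, which is not imported here.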