
CVPR 2017

[arxiv 1701.02468] Unite the People: Closing the Loop Between 3D and 2D Human Representations arXiv:1701.02468v2 [PDF] [project page] [code to be published]

Christoph Lassner, Javier Romero, Martin Kiefel, Federica Bogo, Michael J. Black, Peter V. Gehler

Objective

Scalable method to produce 3D body model fits for 2D images

Improve body pose and shape optimization

Synthesis

The optimization includes a term that makes the body silhouettes match in 2D and 3D: it minimizes the distance between the silhouette projected from the 3D model and the silhouette in the original image, alongside the other fitting terms.

Creates the UP-3D dataset with 3D pose correspondences

Pipeline

Mechanical Turk segmentation annotation (foreground and 6 body parts).

Fit with SMPLify, extended with an additional term that accounts for silhouette matching

The silhouette term minimizes the sum of distances from each point on one silhouette to the closest point on the other silhouette. The distance is bidirectional: the direction from the projected model to the image silhouette uses an L1 distance (because the annotations are noisy), while the other direction uses an L2 distance. This term is added to the existing SMPLify terms.
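A minimal sketch of such a bidirectional silhouette term, using brute-force nearest-point search on small 2D point sets. The function names are hypothetical, and reading "L1" as an unsquared distance and "L2" as a squared distance is an assumption made for illustration; this is not the authors' implementation.

```python
import math


def closest_distance(point, others):
    """Euclidean distance from `point` to the nearest point in `others`."""
    return min(math.dist(point, q) for q in others)


def silhouette_energy(model_pts, image_pts):
    """Bidirectional silhouette term (illustrative only).

    model -> image: sum of unsquared distances (assumed "L1" direction).
    image -> model: sum of squared distances (assumed "L2" direction).
    """
    model_to_image = sum(closest_distance(p, image_pts) for p in model_pts)
    image_to_model = sum(closest_distance(q, model_pts) ** 2 for q in image_pts)
    return model_to_image + image_to_model


# Toy example: the image silhouette has one extra point far from the model.
model_pts = [(0, 0), (1, 0)]
image_pts = [(0, 0), (1, 0), (4, 0)]
energy = silhouette_energy(model_pts, image_pts)
```

Because the two directions are penalized differently, swapping the arguments changes the value: the stray point costs 9 when measured with the squared direction and 3 with the unsquared one.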

Use human annotators to judge the quality of the fit proposed by the optimization (binary accept or reject)

==> Creates the UP-3D dataset of 5,569 training images and 1,203 testing images

The segmentation dataset is obtained by projecting the segmentation defined on the 3D model onto the original images

Training is performed with a ResNet-101 network and a pixel-wise cross-entropy loss
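For reference, a pixel-wise cross-entropy loss averages the negative log-probability of the correct class over all pixels. The sketch below is a plain-Python illustration on nested lists; the function names and the height x width x classes layout are assumptions, and a real training setup would use a deep learning framework instead.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pixelwise_cross_entropy(logits, labels):
    """Mean cross entropy over all pixels.

    logits: H x W x C nested lists of raw class scores.
    labels: H x W nested lists of integer class indices.
    """
    total, count = 0.0, 0
    for row_logits, row_labels in zip(logits, labels):
        for pixel_logits, label in zip(row_logits, row_labels):
            probs = softmax(pixel_logits)
            total += -math.log(probs[label])
            count += 1
    return total / count


# A 1 x 2 image with 2 classes and uninformative (uniform) scores:
# every pixel contributes log(2) to the loss.
uniform_loss = pixelwise_cross_entropy([[[0.0, 0.0], [0.0, 0.0]]], [[0, 1]])
```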

Direct 3D pose and shape prediction

Directly estimates the 3D pose from 91 2D landmarks (2D to 3D)

Two separately trained random forests: one predicts the rotation vectors of the joints, the other predicts the depth
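To make the "rotation vector" output concrete: assuming the vectors follow the axis-angle convention commonly used with SMPL (direction is the rotation axis, norm is the angle in radians), each one can be turned into a rotation matrix with Rodrigues' formula. The helper name below is hypothetical and the sketch covers only this conversion, not the random forests themselves.

```python
import math


def rotation_vector_to_matrix(rvec):
    """Convert an axis-angle rotation vector to a 3x3 rotation matrix."""
    rx, ry, rz = rvec
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        # Zero rotation: return the identity.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = rx / theta, ry / theta, rz / theta  # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    t = 1.0 - c
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]


# A quarter turn about the z axis maps the x axis onto the y axis.
quarter_turn_z = rotation_vector_to_matrix((0.0, 0.0, math.pi / 2))
```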

Definitions

silhouette: all pixels belonging to the body projection