1912.02923.md - hassony2/inria-research-wiki GitHub Wiki

Generating 3D People in Scenes without People, ArXiv'19

Yan Zhang, Mohamed Hassan, Heiko Neumann, Michael J. Black, Siyu Tang

Objective

Given a 3D scene, how do people position themselves?

Produce a generative model of human poses conditioned on a scene: automatically generate realistic human meshes in the scene.

Enforce physical constraints to favour plausible human poses.

Datasets

PROX-Qualitative dataset --> 3D people moving in 3D scenes

+ Augmentation using synthetic renderings

Method

Conditional Variational Auto-Encoder (CVAE) that directly regresses SMPL-X body parameters

scene encoding
- capture depth map and semantic segmentation from diverse views
- stack of projections of depth, normalized to [-1, 1] for different views
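
The depth part of the scene encoding can be sketched as below. This is only an illustration: the linear min-max mapping, the clipping, the depth range `d_min`/`d_max`, and the function names `normalize_depth` and `encode_scene` are assumptions, not details taken from the paper. Semantic segmentation channels are left out.

```python
import numpy as np

def normalize_depth(depth, d_min, d_max):
    """Linearly map depth values from [d_min, d_max] to [-1, 1].

    Values outside the range are clipped first (assumed behaviour).
    """
    depth = np.clip(np.asarray(depth, dtype=np.float64), d_min, d_max)
    return 2.0 * (depth - d_min) / (d_max - d_min) - 1.0

def encode_scene(depth_views, d_min=0.0, d_max=10.0):
    """Stack normalized depth maps of several views into (num_views, H, W)."""
    return np.stack(
        [normalize_depth(d, d_min, d_max) for d in depth_views], axis=0
    )
```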

body encoding
- root-relative pose and shape
  - SMPL-X parameters with 32 body and 24 hand pose features, 10 body shape features
  - body pose features in latent VPoser space
- global 6DoF
  - rotation as 6D prediction in continuous rotation space
  - translation: 3 parameters
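
A minimal sketch of the body representation, assuming the 6D rotation is decoded with the usual Gram-Schmidt construction for continuous rotation representations. The column convention, the ordering of the concatenated vector, and the helper names `rotation_6d_to_matrix` and `body_vector` are assumptions made for illustration. With the dimensions listed above, the vector has 3 + 6 + 10 + 32 + 24 = 75 entries.

```python
import numpy as np

def rotation_6d_to_matrix(r6):
    """Turn a 6D rotation vector into a 3x3 rotation matrix (Gram-Schmidt)."""
    r6 = np.asarray(r6, dtype=np.float64)
    a1, a2 = r6[:3], r6[3:]
    b1 = a1 / np.linalg.norm(a1)            # first basis vector
    a2_orth = a2 - np.dot(b1, a2) * b1      # remove the component along b1
    b2 = a2_orth / np.linalg.norm(a2_orth)  # second basis vector
    b3 = np.cross(b1, b2)                   # third vector completes the frame
    return np.stack([b1, b2, b3], axis=1)   # basis vectors as columns

def body_vector(transl, rot6d, shape, body_pose_latent, hand_pose):
    """Concatenate the body features into one flat vector (assumed ordering)."""
    parts = [transl, rot6d, shape, body_pose_latent, hand_pose]
    return np.concatenate([np.asarray(p, dtype=np.float64).ravel() for p in parts])
```

For example, `rotation_6d_to_matrix([1, 0, 0, 0, 1, 0])` gives the identity matrix, and any 6D input with linearly independent halves yields an orthonormal matrix with determinant 1.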

Experiments

Evaluation
- Report "reconstruction error"? "We test the models using samples from real cameras in test scenes. For quantitative evaluation, we feed individual test samples to our models, and report the mean of the reconstruction errors, and the negative evidence lower bound (ELBO), i.e. −logP(X), which is the sum of the reconstruction error and the KL divergence."
- Report the loss of the variational auto-encoder
- user study to evaluate plausibility of generated poses
  - "for each scene and each model we generate 100 bodies, and ask Turkers to give a score between 1 (strongly not natural) and 5 (strongly natural) to each individual result."
- Diversity metric: evaluates how diverse the generated human bodies are. K-means clusters the SMPL-X parameters of all generated bodies into 20 clusters; the entropy of the cluster-ID histogram over all samples is reported, together with the average size of the clusters.
- Physical scores
  - non-collision score: number of body mesh vertices with positive SDF values divided by the number of all body mesh vertices
  - contact ratio: number of body meshes with contact (at least one vertex with a negative SDF value) divided by the number of all generated body meshes
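
The diversity and physical metrics above can be sketched as follows. This is a hedged illustration rather than the authors' evaluation code: the small NumPy K-means, the natural-log entropy, and the reading of "cluster size" as the mean distance of members to their cluster center are assumptions, as are all function names. SDF values are assumed to be already sampled at the body vertices.

```python
import numpy as np

def kmeans(x, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:             # keep old center if cluster is empty
                centers[j] = members.mean(axis=0)
    # Final assignment so labels match the returned centers.
    dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers

def diversity_metrics(params, k=20, seed=0):
    """Entropy of the cluster-ID histogram and average cluster size."""
    x = np.asarray(params, dtype=np.float64)
    labels, centers = kmeans(x, k, seed=seed)
    counts = np.bincount(labels, minlength=k)
    probs = counts[counts > 0] / counts.sum()
    entropy = float(-np.sum(probs * np.log(probs)))
    # Assumed definition: mean distance of members to their cluster center.
    sizes = [
        np.linalg.norm(x[labels == j] - centers[j], axis=1).mean()
        for j in range(k) if counts[j] > 0
    ]
    return entropy, float(np.mean(sizes))

def non_collision_score(vertex_sdf):
    """Fraction of body vertices with a positive SDF value."""
    return float(np.mean(np.asarray(vertex_sdf) > 0))

def contact_ratio(sdf_per_body):
    """Fraction of bodies with at least one vertex of negative SDF value."""
    return float(np.mean([np.any(np.asarray(s) < 0) for s in sdf_per_body]))
```

With 20 clusters the entropy is at most log(20), reached when the generated bodies spread evenly over all clusters. A higher non-collision score means less interpenetration, and a higher contact ratio means more generated bodies touch the scene.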