1901.05103 - hassony2/inria-research-wiki GitHub Wiki

2019 arXiv

[arxiv 1901.05103] DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation [PDF] [notes]

Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, Steven Lovegrove

read 2019/01/22

Objective

Represent and learn shapes of arbitrary topology using signed distance functions (SDFs).

Synthesis

Method

SDF "decoder"

Input: shape features concatenated with the xyz coordinates of the point at which the signed distance is to be evaluated.

Output: value of the signed distance function (a scalar) at the given point

DeepSDF: a neural network that, given a shape-specific latent code, learns to approximate the signed distance function

They predict a truncated signed distance function: at distances from the surface larger than a threshold, the SDF value is clamped to that threshold. Small truncation values focus the expressiveness of the network on the surface, while large truncation values have practical benefits (for instance, faster ray-tracing rendering).
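A minimal sketch of this clamped objective in PyTorch (the L1 cost and the clamping of both prediction and target are confirmed under Design choices below); the value of the truncation threshold `delta` here is illustrative:

```python
import torch

def clamped_sdf_l1(pred_sdf, gt_sdf, delta=0.1):
    """L1 loss between truncated predicted and ground-truth SDF values.

    clamp(x, delta) = min(delta, max(-delta, x)); delta is the truncation
    threshold (0.1 here is an illustrative value).
    """
    return torch.mean(
        torch.abs(
            torch.clamp(pred_sdf, -delta, delta)
            - torch.clamp(gt_sdf, -delta, delta)
        )
    )
```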

They use two variants: a single-shape SDF (no shape encoding) and a code-conditioned SDF for embedding the SDFs of many shapes in a single network

Note that such a representation cannot be used as-is for tasks like single-view reconstruction, for which an ad-hoc encoder is needed, since an image cannot be expressed as point inputs with matching SDF values

Training an encoder-less decoder

They train an encoder-less decoder for generative-model purposes. For this, they initialize the shape codes randomly, assign one to each shape, and update them during training while learning to approximate the SDF(s)

They assume a multivariate Gaussian prior with zero mean and $\sigma^2 I$ covariance over the shape codes; this amounts to adding a regularization term $\frac{\|z_i\|_2^2}{\sigma^2}$ to the loss
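A sketch of this encoder-less ("auto-decoder") training loop under stated assumptions: `loader` is a hypothetical iterator yielding (shape index, sampled points, SDF values) batches, `DeepSDFDecoder` is the network sketched in the Architecture section below, and `clamped_sdf_l1` is reused from the earlier sketch; sizes and learning rate are illustrative.

```python
import torch
import torch.nn as nn

num_shapes, code_dim, sigma = 1000, 256, 1e-2   # illustrative values

# One learnable code per training shape, randomly initialized
codes = nn.Embedding(num_shapes, code_dim)
nn.init.normal_(codes.weight, mean=0.0, std=0.01)

decoder = DeepSDFDecoder(code_dim)  # defined in the Architecture sketch below

# Decoder weights and shape codes are optimized jointly
opt = torch.optim.Adam(
    list(decoder.parameters()) + list(codes.parameters()), lr=1e-4
)

for shape_idx, xyz, gt_sdf in loader:            # (B,), (B, N, 3), (B, N)
    z = codes(shape_idx)                          # (B, code_dim)
    z_exp = z.unsqueeze(1).expand(-1, xyz.shape[1], -1)
    pred = decoder(torch.cat([z_exp, xyz], dim=-1)).squeeze(-1)
    loss = clamped_sdf_l1(pred, gt_sdf)
    loss = loss + (z.pow(2).sum(dim=-1) / sigma**2).mean()  # Gaussian prior
    opt.zero_grad()
    loss.backward()
    opt.step()
```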

The SDF for a shape with code $z$ can be estimated by Maximum-a-Posteriori (MAP) estimation given pairs $\{(x_i, s_i)\}$ of points in space and the SDF values at those points, by finding the latent code that minimizes the (regularized) loss. This allows finding shape codes from an arbitrary number of measurements (flexibly accommodating, for instance, any number of depth-map values as input)!

This objective is minimized by gradient descent over the latent code, keeping the decoder weights fixed.
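A sketch of this MAP inference under the same assumptions (the decoder is frozen and only the code is optimized; zero initialization, step count, and learning rate are illustrative):

```python
import torch

def infer_code(decoder, xyz, gt_sdf, code_dim=256, sigma=1e-2,
               steps=800, lr=5e-3):
    """MAP-estimate a latent code from any number of (point, SDF value)
    observations; only the code is updated, the decoder stays fixed."""
    z = torch.zeros(1, code_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        pred = decoder(
            torch.cat([z.expand(xyz.shape[0], -1), xyz], dim=-1)
        ).squeeze(-1)
        loss = clamped_sdf_l1(pred, gt_sdf) + z.pow(2).sum() / sigma**2
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```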

Architecture

  • 8 fully connected layers of 512 neurons each
  • ReLU non-linearities
  • tanh final activation
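A sketch of this decoder in PyTorch, folding in the weight normalization and the fourth-layer skip connection described under Design choices below; the layer sizing follows these notes, the rest is an assumption:

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

class DeepSDFDecoder(nn.Module):
    """8 weight-normalized 512-unit FC layers, ReLU activations, tanh
    output, and the latent+xyz re-concatenation at the fourth layer."""

    def __init__(self, code_dim=256, hidden=512, num_layers=8):
        super().__init__()
        in_dim = code_dim + 3
        layers = []
        for i in range(num_layers):
            d_in = in_dim if i == 0 else hidden
            d_out = hidden
            if i == 3:
                d_out = hidden - in_dim  # make room for the skip concat
            if i == num_layers - 1:
                d_out = 1                # scalar SDF value
            layers.append(weight_norm(nn.Linear(d_in, d_out)))
        self.layers = nn.ModuleList(layers)
        self.relu = nn.ReLU()

    def forward(self, code_xyz):          # (..., code_dim + 3)
        x = code_xyz
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i == 3:                    # re-inject latent code and xyz
                x = torch.cat([self.relu(x), code_xyz], dim=-1)
            elif i < len(self.layers) - 1:
                x = self.relu(x)
        return torch.tanh(x)              # truncated SDF in [-1, 1]
```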

Experiments

Shows substantial improvements over AtlasNet (sphere and 25-patch variants) on the auto-encoding task, in mean and median Chamfer distance and in EMD, especially on classes such as Chair and Sofa.

Details

Pre-processing

  • Normalized each mesh to the unit sphere
  • Sampled 500,000 spatial points x, biased towards points near the surface (note that in Occupancy Networks, for a similar task, no improvement was observed from such a bias over uniform sampling)
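A rough sketch of that near-surface bias: jitter points sampled on the mesh surface with Gaussian noise at two scales (the helper name and noise scales are assumptions, not the paper's exact values):

```python
import numpy as np

def sample_near_surface(surface_points, n=500_000,
                        noise_stds=(0.05, 0.005)):
    """Perturb surface samples so that most spatial samples land near the
    surface. `surface_points` is an (M, 3) array of points sampled on the
    mesh; the two noise scales are illustrative."""
    pts = surface_points[np.random.randint(0, len(surface_points), size=n)]
    std = np.where(np.random.rand(n, 1) < 0.5,
                   noise_stds[0], noise_stds[1])
    return pts + np.random.randn(n, 3) * std
```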

Design choices

  • Found instabilities with batch normalization and used weight normalization instead
  • They clamp the signed distance explicitly (not only the supervision targets; the predicted output is clamped as well)
  • Use an L1 cost
  • Size of the latent vector: 256 or 128
  • Feeding the xyz coordinates and latent vector again at the fourth layer through concatenation significantly improved the results; they reduce the size of the fourth layer's output accordingly (512 - (256 + 3) = 253 for a latent vector of size 256). This skip connection allows training deeper networks (without it, the training loss plateaus at 4 layers); see the decoder sketch above
  • Observe that an increased truncation distance indeed increases the Chamfer distance of the reconstructions
  • Sampling points half with positive and half with negative SDF values was crucial according to them; a minimal sketch of this balanced sampling follows
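A rough sketch of that balanced sampling, assuming precomputed per-shape arrays of points `xyz` and SDF values `sdf` (the helper name is hypothetical):

```python
import numpy as np

def balanced_batch(xyz, sdf, batch_size):
    """Draw half of a batch from points with positive SDF (outside the
    surface) and half from points with negative SDF (inside)."""
    pos = np.flatnonzero(sdf > 0)
    neg = np.flatnonzero(sdf < 0)
    half = batch_size // 2
    idx = np.concatenate([
        np.random.choice(pos, half),
        np.random.choice(neg, batch_size - half),
    ])
    return xyz[idx], sdf[idx]
```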