ICCV 2019
[arxiv 1905.03304] Deep Closest Point: Learning Representations for Point Cloud Registration [PDF] [code] [notes]
Yue Wang, Justin M. Solomon
read 06/28/2019
Objective
Leverage deep encoding of shape to perform global matching between point clouds
Motivation
ICP gives different results depending on initialization and can get stuck in local minima
Notes on ICP
If exact point-to-point correspondences between the two clouds are known, the optimal rigid transform has a closed-form solution via SVD
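As a reminder of that closed form (the Kabsch/Umeyama solution), here is a minimal NumPy sketch, assuming two (N, 3) clouds with known one-to-one correspondences:

```python
import numpy as np

def rigid_transform_svd(x, y):
    """Closed-form least-squares rigid transform (R, t) such that R @ x_i + t ~ y_i.

    x, y: (N, 3) arrays of corresponding points.
    """
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    x_c, y_c = x - x_mean, y - y_mean
    # Cross-covariance between the centered clouds
    H = x_c.T @ y_c  # (3, 3)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    # Reflection correction: ensure det(R) = +1
    if np.linalg.det(R) < 0:
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = y_mean - R @ x_mean
    return R, t
```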
Synthesis
Outline
- embed the input point clouds with permutation/rigid-invariant embeddings (compare PointNet and DGCNN for this step)
- use them to find soft point matches with an attention-based pointer module
- use differentiable SVD to predict the rigid transformation
- train and test the model end-to-end
- show that it outperforms ICP and PointNetLK, and that it generalizes to unseen data
Method
Embedding
- For matching, we want one feature per point in the cloud --> use the representation generated before the last aggregation function
- PointNet: input points are embedded individually, then an order-independent max operation aggregates them, with some additional transforms on top (e.g. a small neural network)
- DGCNN: works on graphs of local neighborhoods that are created on the fly using k-nearest neighbors
- Attention model that maps the two embeddings to new, task-specific embeddings
  - expressed in residual form: the new embeddings are Phi_x = F_x + phi(F_x, F_y) and Phi_y = F_y + phi(F_y, F_x), where phi is a learned function of both point cloud embeddings (a Transformer in the paper); note that phi is asymmetric in its arguments
- Deep Closest Point v1 (DCP-v1) simply uses Phi_x = F_x and Phi_y = F_y (so no attention is used in this case); see the sketch after this list
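A compact PyTorch sketch of both pieces above: per-point features (the representation kept before the final aggregation) and the residual co-attention. Layer sizes, emb_dim, and the use of nn.MultiheadAttention in place of the paper's Transformer are illustrative assumptions, not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class PointFeatures(nn.Module):
    """PointNet-style per-point embedding: shared MLP, *no* final max-pooling,
    so we keep one feature vector per point (needed for matching)."""
    def __init__(self, emb_dim=512):  # emb_dim is an illustrative choice
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, emb_dim),
        )

    def forward(self, pts):            # pts: (B, N, 3)
        return self.mlp(pts)           # (B, N, emb_dim), one feature per point

class ResidualAttention(nn.Module):
    """Residual co-attention: Phi_x = F_x + phi(F_x, F_y), and symmetrically
    for y. Here phi is cross-attention; the paper uses a Transformer."""
    def __init__(self, emb_dim=512, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(emb_dim, heads, batch_first=True)

    def forward(self, f_x, f_y):       # (B, N, D) each
        phi_xy, _ = self.attn(f_x, f_y, f_y)  # queries from x, keys/values from y
        phi_yx, _ = self.attn(f_y, f_x, f_x)
        return f_x + phi_xy, f_y + phi_yx     # DCP-v1 skips this module entirely
```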
Matching
- The matching point for x_i is generated as y_i_hat = Y^T softmax(Phi_y Phi_xi^T), where Y is the matrix containing the coordinates of point cloud Y, Phi_xi is the embedding of point x_i, and Phi_y contains the per-point embeddings of cloud Y. This effectively creates a soft matching point y_i_hat as a weighted average of the points of Y, where the averaging weights are determined by the similarity between the embeddings of Y's points and the embedding of x_i
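A minimal sketch of this soft-matching step, assuming per-point embeddings phi_x of shape (B, N, D), phi_y of shape (B, M, D), and the coordinates y of cloud Y of shape (B, M, 3):

```python
import torch

def soft_match(phi_x, phi_y, y):
    """For each point x_i, build a soft match y_hat_i as an attention-weighted
    average of the points of Y.

    phi_x: (B, N, D) embeddings of cloud X
    phi_y: (B, M, D) embeddings of cloud Y
    y:     (B, M, 3) coordinates of cloud Y
    """
    scores = phi_x @ phi_y.transpose(1, 2)    # (B, N, M) similarities
    weights = torch.softmax(scores, dim=-1)   # each row sums to 1 over Y
    return weights @ y                        # (B, N, 3) soft-matched points
```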
Rigid transform computation
- Once the matches are computed, the SVD closed form produces the optimal rigid transform between the point clouds
- They backpropagate through this module as well, using PyTorch's built-in gradients for SVD
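The same closed form as in the ICP note above, written this time with differentiable torch ops (a sketch; torch.linalg.svd provides the backward pass):

```python
import torch

def svd_head(x, y_hat):
    """Differentiable rigid transform aligning X to its soft matches Y_hat.

    x, y_hat: (B, N, 3); returns R (B, 3, 3) and t (B, 3).
    """
    x_mean = x.mean(dim=1, keepdim=True)
    y_mean = y_hat.mean(dim=1, keepdim=True)
    H = (x - x_mean).transpose(1, 2) @ (y_hat - y_mean)   # (B, 3, 3)
    U, _, Vt = torch.linalg.svd(H)
    # Fix possible reflections so that det(R) = +1
    det = torch.det(Vt.transpose(1, 2) @ U.transpose(1, 2))
    D = torch.diag_embed(torch.stack(
        [torch.ones_like(det), torch.ones_like(det), det], dim=-1))
    R = Vt.transpose(1, 2) @ D @ U.transpose(1, 2)
    t = y_mean.squeeze(1) - (R @ x_mean.transpose(1, 2)).squeeze(-1)
    return R, t
```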
Loss
- Rotation loss: ||R_pred^T R_gt - I_3||^2
- Translation loss: ||t_pred - t_gt||^2
- Regularization: lambda * ||theta||^2 where theta are the parameters of the network
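A sketch of the full loss; in practice the lambda * ||theta||^2 term can be handled through the optimizer's weight decay:

```python
import torch

def dcp_loss(R_pred, t_pred, R_gt, t_gt):
    """Rotation + translation losses; R_*: (B, 3, 3), t_*: (B, 3)."""
    I = torch.eye(3, device=R_pred.device).expand_as(R_pred)
    rot_loss = ((R_pred.transpose(1, 2) @ R_gt - I) ** 2).sum(dim=(1, 2)).mean()
    trans_loss = ((t_pred - t_gt) ** 2).sum(dim=1).mean()
    return rot_loss + trans_loss  # + lambda * ||theta||^2 via weight decay
```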
Training
- train on a synthetic dataset (ModelNet40 shapes with random rigid transforms applied)
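A hedged sketch of how such a synthetic training pair can be built from a ModelNet40 cloud; the sampling ranges below are illustrative assumptions, not necessarily the paper's exact values:

```python
import numpy as np

def make_pair(points, max_angle=np.pi / 4, max_trans=0.5):
    """Given one (N, 3) cloud, create a (source, target, R, t) training tuple
    by applying a random rotation and translation."""
    angles = np.random.uniform(-max_angle, max_angle, size=3)
    cx, cy, cz = np.cos(angles)
    sx, sy, sz = np.sin(angles)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    R = Rz @ Ry @ Rx
    t = np.random.uniform(-max_trans, max_trans, size=3)
    target = points @ R.T + t
    return points, target, R, t
```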
Details
- use LayerNorm
- learning rate decayed at hand-picked milestones (the schedule looks somewhat arbitrary)
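For instance, a milestone schedule like the following (the optimizer choice and milestone epochs are assumptions for illustration):

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 3)  # stand-in for the full DCP network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
# Milestone decay: divide the learning rate by 10 at hand-picked epochs
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[75, 150, 200], gamma=0.1)

for epoch in range(250):
    # ... one training epoch (forward / backward / optimizer.step()) ...
    scheduler.step()
```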
Experiments
- experiments use a setting where very large rotations and translations are observed --> worst case for vanilla ICP, which fails dramatically
- comparison to other methods: FGR works quite well on unseen categories, and Go-ICP works quite well under Gaussian noise; however, both methods are outperformed by DCP
- Using attention (DCP-v2 vs DCP-v1) has a minor impact on translation accuracy but a major impact on rotation quality
- Show the major impact of using an explicit SVD head versus an MLP in their experiments --> the SVD works significantly better (but how much did they explore the MLP architecture and its parameter space? Is the comparison fair?)