ICCV 2019
[arxiv 1905.03304] Deep Closest Point: Learning Representations for Point Cloud Registration [PDF] [code] [notes]
Yue Wang, Justin M. Solomon
read 06/28/2019
Objective
Leverage deep encoding of shape to perform global matching between point clouds
Motivation
ICP gives different results depending on initialization and can get stuck in local minima
Notes on ICP
If exact point-to-point correspondences between the two clouds are known, the optimal rigid transform has a closed-form solution via SVD
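As a reminder of that closed form (the Kabsch/Umeyama solution), here is a minimal NumPy sketch, assuming two (N, 3) clouds with known one-to-one correspondences:

```python
import numpy as np

def rigid_transform_svd(x, y):
    """Closed-form least-squares rigid transform (R, t) such that R @ x_i + t ~ y_i.

    x, y: (N, 3) arrays of corresponding points.
    """
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    x_c, y_c = x - x_mean, y - y_mean
    # Cross-covariance between the centered clouds
    H = x_c.T @ y_c  # (3, 3)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    # Reflection correction: ensure det(R) = +1
    if np.linalg.det(R) < 0:
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = y_mean - R @ x_mean
    return R, t
```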
Synthesis
Outline
- embed the input point clouds with permutation/rigid-invariant embeddings (compare PointNet and DGCNN for this step)
- use them to find soft point matches with an attention-based pointer module
- use differentiable SVD to predict the rigid transformation
- train and test the model end-to-end
- show that it outperforms ICP and PointNetLK, and that it generalizes to unseen data
Method
Embedding
- For matching, we want one feature per point in the cloud --> use the representation generated before the last aggregation function
- PointNet: input points are embedded individually, then an order-independent max operation aggregates them, with some additional transforms on top (e.g. a small neural network)
- DGCNN: works on graphs of local neighborhoods that are created on the fly using k-nearest neighbors
- Attention model that maps the two embeddings to new, task-specific embeddings
  - expressed in residual form: the new embeddings are Phi_x = F_x + phi(F_x, F_y) and Phi_y = F_y + phi(F_y, F_x), where phi is a learned function of both point cloud embeddings (a Transformer in the paper); note that phi is asymmetric in its arguments
- Deep Closest Point v1 (DCP-v1) simply uses Phi_x = F_x and Phi_y = F_y (so no attention is used in this case); see the sketch after this list
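A compact PyTorch sketch of both pieces above: per-point features (the representation kept before the final aggregation) and the residual co-attention. Layer sizes, emb_dim, and the use of nn.MultiheadAttention in place of the paper's Transformer are illustrative assumptions, not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class PointFeatures(nn.Module):
    """PointNet-style per-point embedding: shared MLP, *no* final max-pooling,
    so we keep one feature vector per point (needed for matching)."""
    def __init__(self, emb_dim=512):  # emb_dim is an illustrative choice
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, emb_dim),
        )

    def forward(self, pts):            # pts: (B, N, 3)
        return self.mlp(pts)           # (B, N, emb_dim), one feature per point

class ResidualAttention(nn.Module):
    """Residual co-attention: Phi_x = F_x + phi(F_x, F_y), and symmetrically
    for y. Here phi is cross-attention; the paper uses a Transformer."""
    def __init__(self, emb_dim=512, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(emb_dim, heads, batch_first=True)

    def forward(self, f_x, f_y):       # (B, N, D) each
        phi_xy, _ = self.attn(f_x, f_y, f_y)  # queries from x, keys/values from y
        phi_yx, _ = self.attn(f_y, f_x, f_x)
        return f_x + phi_xy, f_y + phi_yx     # DCP-v1 skips this module entirely
```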
Matching
- The matching point for x_i is generated as y_i_hat = Y^T softmax(Phi_y Phi_xi^T), where Y is the matrix containing the coordinates of point cloud Y, Phi_xi is the embedding of point x_i, and Phi_y contains the per-point embeddings of cloud Y. This effectively creates a soft matching point y_i_hat as a weighted average of the points of Y, where the averaging weights are determined by the similarity between the embeddings of Y's points and the embedding of x_i
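A minimal sketch of this soft-matching step, assuming per-point embeddings phi_x of shape (B, N, D), phi_y of shape (B, M, D), and the coordinates y of cloud Y of shape (B, M, 3):

```python
import torch

def soft_match(phi_x, phi_y, y):
    """For each point x_i, build a soft match y_hat_i as an attention-weighted
    average of the points of Y.

    phi_x: (B, N, D) embeddings of cloud X
    phi_y: (B, M, D) embeddings of cloud Y
    y:     (B, M, 3) coordinates of cloud Y
    """
    scores = phi_x @ phi_y.transpose(1, 2)    # (B, N, M) similarities
    weights = torch.softmax(scores, dim=-1)   # each row sums to 1 over Y
    return weights @ y                        # (B, N, 3) soft-matched points
```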
Rigid transform computation
- Once the matches are computed, the SVD closed form produces the optimal rigid transform between the point clouds
- They backpropagate through this module as well, using PyTorch's built-in gradients for SVD
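The same closed form as in the ICP note above, written this time with differentiable torch ops (a sketch; torch.linalg.svd provides the backward pass):

```python
import torch

def svd_head(x, y_hat):
    """Differentiable rigid transform aligning X to its soft matches Y_hat.

    x, y_hat: (B, N, 3); returns R (B, 3, 3) and t (B, 3).
    """
    x_mean = x.mean(dim=1, keepdim=True)
    y_mean = y_hat.mean(dim=1, keepdim=True)
    H = (x - x_mean).transpose(1, 2) @ (y_hat - y_mean)   # (B, 3, 3)
    U, _, Vt = torch.linalg.svd(H)
    # Fix possible reflections so that det(R) = +1
    det = torch.det(Vt.transpose(1, 2) @ U.transpose(1, 2))
    D = torch.diag_embed(torch.stack(
        [torch.ones_like(det), torch.ones_like(det), det], dim=-1))
    R = Vt.transpose(1, 2) @ D @ U.transpose(1, 2)
    t = y_mean.squeeze(1) - (R @ x_mean.transpose(1, 2)).squeeze(-1)
    return R, t
```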
Loss
- Rotation loss: ||R_pred^T R_gt - I_3||^2
- Translation loss: ||t_pred - t_gt||^2
- Regularization: lambda * ||theta||^2 where theta are the parameters of the network
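A sketch of the full loss; in practice the lambda * ||theta||^2 term can be handled through the optimizer's weight decay:

```python
import torch

def dcp_loss(R_pred, t_pred, R_gt, t_gt):
    """Rotation + translation losses; R_*: (B, 3, 3), t_*: (B, 3)."""
    I = torch.eye(3, device=R_pred.device).expand_as(R_pred)
    rot_loss = ((R_pred.transpose(1, 2) @ R_gt - I) ** 2).sum(dim=(1, 2)).mean()
    trans_loss = ((t_pred - t_gt) ** 2).sum(dim=1).mean()
    return rot_loss + trans_loss  # + lambda * ||theta||^2 via weight decay
```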
Training
- train on a synthetic dataset (ModelNet40 shapes with random rigid transforms applied)
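A hedged sketch of how such a synthetic training pair can be built from a ModelNet40 cloud; the sampling ranges below are illustrative assumptions, not necessarily the paper's exact values:

```python
import numpy as np

def make_pair(points, max_angle=np.pi / 4, max_trans=0.5):
    """Given one (N, 3) cloud, create a (source, target, R, t) training tuple
    by applying a random rotation and translation."""
    angles = np.random.uniform(-max_angle, max_angle, size=3)
    cx, cy, cz = np.cos(angles)
    sx, sy, sz = np.sin(angles)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    R = Rz @ Ry @ Rx
    t = np.random.uniform(-max_trans, max_trans, size=3)
    target = points @ R.T + t
    return points, target, R, t
```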
Details
- use LayerNorm
- learning rate decayed at hand-picked milestones (the schedule looks somewhat arbitrary)
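For instance, a milestone schedule like the following (the optimizer choice and milestone epochs are assumptions for illustration):

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 3)  # stand-in for the full DCP network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
# Milestone decay: divide the learning rate by 10 at hand-picked epochs
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[75, 150, 200], gamma=0.1)

for epoch in range(250):
    # ... one training epoch (forward / backward / optimizer.step()) ...
    scheduler.step()
```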
Experiments
- experiments use a setting where very large rotations and translations are observed --> worst case for vanilla ICP, which fails dramatically
- comparison to other methods: FGR works quite well on unseen categories, and Go-ICP works quite well under Gaussian noise; however, both methods are outperformed by DCP
- Using attention (DCP-v2 vs DCP-v1) has a minor impact on translation accuracy but a major impact on rotation quality
- Show the major impact of using an explicit SVD head versus an MLP in their experiments --> the SVD works significantly better (but how much did they explore the MLP architecture and its parameter space? Is the comparison fair?)