1612.01925 - hassony2/inria-research-wiki GitHub Wiki
CVPR 2017
[arxiv 1612.01925] FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks [PDF] [code] [pytorch-code] [notes]
Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, Thomas Brox
read 22/01/2018
Objective
Improve upon FlowNet to provide accurate estimation of optical flow
Performs on par with SOTA methods, while being fast (roughly 8 fps)
Trained on static backgrounds?!
Synthesis
Different networks
FlowNetS(imple)
Encoder-decoder network that takes a stack of two RGB images (input with 6 = 2*3 channels) and produces flow estimates after downsampling and upsampling with skip connections (as in the hourglass network)
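A minimal sketch of that input stacking (NumPy arrays as a stand-in for the actual tensors; shapes are illustrative):

```python
import numpy as np

# Two RGB frames of the same scene, channels-first (C, H, W).
img1 = np.random.rand(3, 64, 64)
img2 = np.random.rand(3, 64, 64)

# FlowNetS input: both frames concatenated along the channel
# dimension, giving 6 = 2 * 3 input channels.
x = np.concatenate([img1, img2], axis=0)
assert x.shape == (6, 64, 64)
```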
FlowNetC
Explicitly expresses correlation between feature maps.
The correlation layer was introduced in the first version of FlowNet and compares feature-map patches from the two images (this layer therefore has no trainable weights)
It does so by summing the scalar products of feature map values (each scalar product spans the channel dimension at a single spatial position) across the matching locations in two square patches (one in each feature map). In practice the correlations are computed only up to a limited maximum displacement d, so only patches that are close in the two images are compared. For each displacement a correlation value is output; arranging these values along the channel dimension gives (2d+1)^2 channels in the output, which are concatenated with convolved features of one of the original images.
Two images are separately processed by convolutional layers before merging to one feature map using correlation layer
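A rough NumPy sketch of this parameter-free layer, using single-pixel patches for brevity (the function name and the normalization by C are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np

def correlation(f1, f2, d):
    """Correlation layer sketch (no trainable weights).

    f1, f2: feature maps of shape (C, H, W) from the two images.
    d: maximum displacement; output has (2*d + 1)**2 channels.
    """
    C, H, W = f1.shape
    D = 2 * d + 1
    out = np.zeros((D * D, H, W))
    # Zero-pad f2 so displaced positions stay in bounds.
    f2p = np.pad(f2, ((0, 0), (d, d), (d, d)))
    k = 0
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            shifted = f2p[:, d + dy : d + dy + H, d + dx : d + dx + W]
            # Scalar product over the channel dimension at each
            # spatial position, for this displacement (dy, dx).
            out[k] = (f1 * shifted).sum(axis=0) / C
            k += 1
    return out
```

Each output channel holds the correlation for one displacement, so stacking them gives the (2d+1)^2-channel map described above.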
Stacking networks
Stacking of two networks
The idea is to refine the first prediction. Optionally, the following inputs are used for the second network:
- the second input image, warped toward the first using the flow estimate produced by the first network
- the error (brightness difference) between the first image and the warped second image
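The warping step can be sketched as bilinear backward sampling (a hypothetical NumPy helper; FlowNet 2.0 implements this as a differentiable layer):

```python
import numpy as np

def warp(img2, flow):
    """Backward-warp img2 toward img1 using the estimated flow.

    img2: (H, W) image (single channel for brevity).
    flow: (2, H, W) flow from image 1 to image 2, as (dx, dy) per pixel.
    Bilinear sampling; out-of-bounds coordinates are clamped.
    """
    H, W = img2.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Where each pixel of image 1 lands in image 2.
    x = np.clip(xs + flow[0], 0, W - 1)
    y = np.clip(ys + flow[1], 0, H - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    # Bilinear interpolation of the four neighbouring samples.
    return ((1 - wx) * (1 - wy) * img2[y0, x0]
            + wx * (1 - wy) * img2[y0, x1]
            + (1 - wx) * wy * img2[y1, x0]
            + wx * wy * img2[y1, x1])

# Brightness error fed to the second network:
# err = np.abs(img1 - warp(img2, flow))
```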
Experiments
FlowNetC and FlowNetS comparison
FlowNetC outperforms FlowNetS on KITTI dataset
2 Stacked networks
Best results are obtained by fixing the weights of first network and training only the second one
Warping improves the results in all scenarios
Stacking more networks with identical weights did not improve the results, neither did finetuning such a stack.
However, stacking (potentially different) networks with different weights and training them one at the time (freeze early networks and train only the one on top) does improve the final results.
Multiple small networks (obtained by reducing the size of the channel dimension) outperform a single larger network (FlowNet2-ss > FlowNet2-C or FlowNet2-S)