1612.01925 - hassony2/inria-research-wiki GitHub Wiki
CVPR 2017
[arxiv 1612.01925] FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks [PDF] [code] [pytorch-code] [notes]
Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, Thomas Brox
read 22/01/2018
Objective
Improve upon FlowNet to provide accurate estimation of optical flow
Performs on par with SOTA methods, while being fast (roughly 8 fps)
Trained on static backgrounds?!
Synthesis
Different networks
FlowNetS(imple)
Encoder-decoder network that takes a stack of two RGB images (input with 6 = 2*3 channels) and produces flow estimates after downsampling and upsampling with skip connections (as in the hourglass network)
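A minimal sketch of that input stacking (NumPy arrays as a stand-in for the actual tensors; shapes are illustrative):

```python
import numpy as np

# Two RGB frames of the same scene, channels-first (C, H, W).
img1 = np.random.rand(3, 64, 64)
img2 = np.random.rand(3, 64, 64)

# FlowNetS input: both frames concatenated along the channel
# dimension, giving 6 = 2 * 3 input channels.
x = np.concatenate([img1, img2], axis=0)
assert x.shape == (6, 64, 64)
```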
FlowNetC
Explicitly expresses correlation between feature maps.
The correlation layer was introduced in the first version of FlowNet and compares feature-map patches from the two images (this layer therefore has no trainable weights)
It does so by summing the scalar products of feature map values (each scalar product spans the channel dimension at a single spatial position) across the matching locations in two square patches (one in each feature map). In practice the correlations are computed only up to a limited maximum displacement d, so only patches that are close in the two images are compared. For each displacement a correlation value is output; arranging these values along the channel dimension gives (2d+1)^2 channels in the output, which are concatenated with convolved features of one of the original images.
Two images are separately processed by convolutional layers before merging to one feature map using correlation layer
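A rough NumPy sketch of this parameter-free layer, using single-pixel patches for brevity (the function name and the normalization by C are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np

def correlation(f1, f2, d):
    """Correlation layer sketch (no trainable weights).

    f1, f2: feature maps of shape (C, H, W) from the two images.
    d: maximum displacement; output has (2*d + 1)**2 channels.
    """
    C, H, W = f1.shape
    D = 2 * d + 1
    out = np.zeros((D * D, H, W))
    # Zero-pad f2 so displaced positions stay in bounds.
    f2p = np.pad(f2, ((0, 0), (d, d), (d, d)))
    k = 0
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            shifted = f2p[:, d + dy : d + dy + H, d + dx : d + dx + W]
            # Scalar product over the channel dimension at each
            # spatial position, for this displacement (dy, dx).
            out[k] = (f1 * shifted).sum(axis=0) / C
            k += 1
    return out
```

Each output channel holds the correlation for one displacement, so stacking them gives the (2d+1)^2-channel map described above.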
Stacking networks
Stacking of two networks
The idea is to refine the first prediction. Optionally, the following inputs are used for the second network:
- the second input image, warped toward the first using the flow estimate produced by the first network
- the error (brightness difference) between the first image and the warped second image
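The warping step can be sketched as bilinear backward sampling (a hypothetical NumPy helper; FlowNet 2.0 implements this as a differentiable layer):

```python
import numpy as np

def warp(img2, flow):
    """Backward-warp img2 toward img1 using the estimated flow.

    img2: (H, W) image (single channel for brevity).
    flow: (2, H, W) flow from image 1 to image 2, as (dx, dy) per pixel.
    Bilinear sampling; out-of-bounds coordinates are clamped.
    """
    H, W = img2.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Where each pixel of image 1 lands in image 2.
    x = np.clip(xs + flow[0], 0, W - 1)
    y = np.clip(ys + flow[1], 0, H - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    # Bilinear interpolation of the four neighbouring samples.
    return ((1 - wx) * (1 - wy) * img2[y0, x0]
            + wx * (1 - wy) * img2[y0, x1]
            + (1 - wx) * wy * img2[y1, x0]
            + wx * wy * img2[y1, x1])

# Brightness error fed to the second network:
# err = np.abs(img1 - warp(img2, flow))
```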
Experiments
FlowNetC and FlowNetS comparison
FlowNetC outperforms FlowNetS on KITTI dataset
2 Stacked networks
Best results are obtained by fixing the weights of first network and training only the second one
Warping improves the results in all scenarios
Stacking more networks with identical weights did not improve the results, neither did finetuning such a stack.
However, stacking (potentially different) networks with different weights and training them one at the time (freeze early networks and train only the one on top) does improve the final results.
Multiple small networks (obtained by reducing the size of the channel dimension) outperform a single larger network (FlowNet2-ss > FlowNet2-C or FlowNet2-S)