Deformable Convolutional Networks - Deepest-Project/Greedy-Survey GitHub Wiki

Resources

Abstract

Limitations of existing CNN architectures:

Convolutional neural networks (CNNs) are inherently limited to model geometric transformations due to the fixed geometric structures in their building modules.

Solution proposed in the paper:

In this work, we introduce two new modules ..., deformable convolution and deformable RoI pooling.

Introduction

Two drawbacks of existing models:

First, the geometric transformations are assumed fixed and known. Such prior knowledge is used to augment data, and design the features and algorithms.

Second, handcrafted design of invariant features and algorithms could be difficult or infeasible for overly complex transformations, even when they are known.

Cause of these drawbacks:

... a convolution unit samples the input feature map at fixed locations; a pooling layer reduces the spatial resolution at a fixed ratio; a RoI pooling layer separates a RoI into fixed spatial bins, etc.

... all approaches still rely on the primitive bounding box based feature extraction.

Deformable Convolution:

It adds 2D offsets to the regular grid sampling locations in the standard convolution.

The offsets are learned from the preceding feature maps, via additional convolutional layers.

Deformable RoI pooling:

It adds an offset to each bin position in the regular bin partition of the previous RoI pooling. Similarly, the offsets are learned from the preceding feature maps and RoIs, ...

Detail

Deformable Convolutions

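A vanilla convolution with a 3 by 3 kernel computes, at each output location p_0:

$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n)$$

where R = {(-1, -1), (-1, 0), ..., (0, 1), (1, 1)} defines the 3 by 3 grid, p_n enumerates the locations in R, w is the filter weight matrix, and x is the input feature map.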

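To turn this into a deformable convolution, we add a learned 2D offset \Delta p_n to each sampling location p_n of the grid R around the output location p_0:

$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)$$

Because \Delta p_n is typically fractional, x is evaluated at non-integer positions by bilinear interpolation. The offsets are predicted by a separate convolutional layer applied to the same input feature map. In the paper's illustration of this layer, the offset field has 2N channels (one 2D offset per sampling location), where N = 3 * 3 = 9 for the 3 by 3 deformable convolution case.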
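To make the sampling rule concrete, here is a minimal pure-Python sketch of a deformable convolution evaluated at a single output location. It is not the paper's implementation: the helper names `bilinear` and `deform_conv_at` are made up for illustration, the offsets are set by hand instead of being predicted by a convolutional layer, and positions outside the feature map are assumed to contribute zero.

```python
import math

# Regular 3x3 grid R: (dy, dx) displacements relative to the output location p0.
R = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def bilinear(x, py, px):
    """Sample the 2D list x at fractional (py, px); outside points count as 0."""
    h, w = len(x), len(x[0])
    y0, x0 = math.floor(py), math.floor(px)
    value = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < h and 0 <= qx < w:
                # Separable kernel: (1 - |py - qy|) * (1 - |px - qx|)
                weight = (1 - abs(py - qy)) * (1 - abs(px - qx))
                value += weight * x[qy][qx]
    return value

def deform_conv_at(x, w, p0, offsets):
    """Sum over the grid of w(p_n) * x(p0 + p_n + dp_n) for one location p0."""
    total = 0.0
    for n, (dy, dx) in enumerate(R):
        ody, odx = offsets[n]  # hand-set here; learned in a real layer
        total += w[n] * bilinear(x, p0[0] + dy + ody, p0[1] + dx + odx)
    return total

# 5x5 input whose value equals its column index, and an averaging kernel.
x = [[float(c) for c in range(5)] for _ in range(5)]
w = [1.0 / 9] * 9

zero = [(0.0, 0.0)] * 9    # no deformation: plain 3x3 convolution
shift = [(0.0, 0.5)] * 9   # move every sampling point half a pixel to the right

y_plain = deform_conv_at(x, w, (2, 2), zero)    # averages columns 1, 2, 3
y_shift = deform_conv_at(x, w, (2, 2), shift)   # averages columns 1.5, 2.5, 3.5
print(y_plain, y_shift)
```

With all offsets at zero the function reduces to the vanilla convolution; shifting every sampling point by half a pixel moves the averaged window by the same amount, which is the mechanism that lets the layer adapt where it looks.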

Deformable RoI Pooling

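As in the case of deformable convolutions, we start from vanilla RoI pooling, which splits a RoI with top-left corner p_0 into k by k bins and averages the n_{ij} pixels in each bin (i, j):

$$y(i, j) = \sum_{p \in bin(i, j)} x(p_0 + p) / n_{ij}$$

We then add an offset \Delta p_{ij} to each bin to obtain deformable RoI pooling:

$$y(i, j) = \sum_{p \in bin(i, j)} x(p_0 + p + \Delta p_{ij}) / n_{ij}$$

The offsets \Delta p_{ij} are predicted by a fully connected layer.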
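The per-bin offset can be illustrated with a small pure-Python sketch of average RoI pooling in which each bin is shifted before sampling. This is an illustration under assumptions, not the paper's code: `deform_roi_pool` and `bilinear` are hypothetical names, the RoI is given as `(top, left, height, width)` in pixels, `i` indexes bin rows and `j` bin columns, and the offsets are set by hand in pixels instead of being predicted by a fully connected layer.

```python
import math

def bilinear(x, py, px):
    """Sample the 2D list x at fractional (py, px); outside points count as 0."""
    h, w = len(x), len(x[0])
    y0, x0 = math.floor(py), math.floor(px)
    value = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < h and 0 <= qx < w:
                value += (1 - abs(py - qy)) * (1 - abs(px - qx)) * x[qy][qx]
    return value

def deform_roi_pool(x, roi, k, offsets):
    """Average-pool the RoI into k x k bins; bin (i, j) is moved by offsets[i][j]."""
    top, left, h, w = roi
    out = [[0.0] * k for _ in range(k)]
    for i in range(k):          # bin row
        for j in range(k):      # bin column
            y_lo, y_hi = math.floor(i * h / k), math.ceil((i + 1) * h / k)
            x_lo, x_hi = math.floor(j * w / k), math.ceil((j + 1) * w / k)
            ody, odx = offsets[i][j]  # one 2D offset shared by the whole bin
            samples = [bilinear(x, top + py + ody, left + px + odx)
                       for py in range(y_lo, y_hi) for px in range(x_lo, x_hi)]
            out[i][j] = sum(samples) / len(samples)
    return out

# 6x6 feature map whose value equals its column index; 4x4 RoI starting at (1, 1).
x = [[float(c) for c in range(6)] for _ in range(6)]
roi, k = (1, 1, 4, 4), 2

zero = [[(0.0, 0.0)] * k for _ in range(k)]
plain = deform_roi_pool(x, roi, k, zero)

offsets = [[(0.0, 0.0)] * k for _ in range(k)]
offsets[0][0] = (0.0, 1.0)   # slide only the top-left bin one pixel to the right
moved = deform_roi_pool(x, roi, k, offsets)
print(plain, moved)
```

Only the bin whose offset is non-zero changes its output, which shows that each bin can follow a different part of the object independently of the others.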

Deformable Position-Sensitive RoI Pooling

We obtain PS-RoI pooling from the RoI pooling equation by replacing the shared feature map x with a bin-specific score map x_{i,j}.
In this case, the offset parameters are predicted by a separate convolutional layer, following the fully convolutional spirit of R-FCN.
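The substitution of x by x_{i,j} can be sketched in a few lines of pure Python. The sketch below shows plain (non-deformable) PS-RoI pooling only; `ps_roi_pool` is a hypothetical name, the RoI is assumed to be `(top, left, height, width)` in pixels, and `score_maps[i][j]` stands for the score map dedicated to bin (i, j).

```python
import math

def ps_roi_pool(score_maps, roi, k):
    """Bin (i, j) averages only score_maps[i][j] over its own part of the RoI."""
    top, left, h, w = roi
    out = [[0.0] * k for _ in range(k)]
    for i in range(k):          # bin row
        for j in range(k):      # bin column
            rows = range(top + math.floor(i * h / k),
                         top + math.ceil((i + 1) * h / k))
            cols = range(left + math.floor(j * w / k),
                         left + math.ceil((j + 1) * w / k))
            x_ij = score_maps[i][j]   # the map dedicated to this bin
            vals = [x_ij[r][c] for r in rows for c in cols]
            out[i][j] = sum(vals) / len(vals)
    return out

# Four 4x4 score maps for k = 2; map (i, j) is filled with the constant 10*i + j.
k = 2
score_maps = [[[[float(10 * i + j)] * 4 for _ in range(4)]
               for j in range(k)] for i in range(k)]
pooled = ps_roi_pool(score_maps, (0, 0, 4, 4), k)
print(pooled)
```

Each output bin reflects only its own score map, so the pooled grid reproduces the per-map constants. The deformable variant would additionally shift each bin by its offset before sampling.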

Deformable Convolutional Networks

Both deformable convolutional layers and deformable RoI pooling layers have the same input and output dimensions as their vanilla versions. Thus, they can readily replace their vanilla counterparts. The resulting CNNs are called deformable ConvNets. The main effect of adding these deformable layers is that the receptive field is adaptively adjusted according to the objects' scale and shape.

Experiments and Results

For this section, please refer to the paper.

Discussion

The number of parameters does not grow much, so the model itself stays lightweight, but the implementation is rather tedious and the performance gain does not seem that large in comparison. Rather than splitting the operation into a default convolution grid plus offsets and learning each with its own convolution layer, I would like to try merging the two into a single layer.