
Residual Networks (Resnets)

ResNets address the degradation problem, where training error gets worse as plain networks are made deeper, by adding shortcut (skip) connections. In a plain network, a block of layers computes y = f(x); in a ResNet the same block computes y = f(x) + x, so the layers only need to learn the residual on top of the identity mapping.
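
As a concrete illustration, here is a minimal PyTorch sketch of a basic residual block; the class name, channel count, and layer choices (3x3 convolutions with batch norm) are illustrative assumptions, not taken from these notes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicResidualBlock(nn.Module):
    """Minimal residual block: y = f(x) + x (illustrative sketch)."""
    def __init__(self, channels):
        super().__init__()
        # f(x): two 3x3 convolutions with batch norm
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # shortcut connection: add the input back before the final activation
        return F.relu(out + x)

# example: a 32x32 feature map with 64 channels keeps its shape
block = BasicResidualBlock(64)
y = block(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 64, 32, 32])
```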

Wide Resnets

Wide ResNets use more channels in each layer to counter diminishing feature reuse: in very deep, thin residual networks many blocks contribute little, and with too few features flowing down the network it cannot learn useful representations well. Increasing the width too much, however, risks learning noise. A sketch of how widening shows up in a block is given below.
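
The sketch below shows one way a wide residual block can be written in PyTorch; the widening factor k, base width, and pre-activation (BN-ReLU-conv) ordering are assumptions for illustration, the key point being that the channel count of each block is multiplied by k.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WideResidualBlock(nn.Module):
    """Residual block whose channel count is widened by a factor k (sketch)."""
    def __init__(self, in_channels, base_width, k=4):
        super().__init__()
        width = base_width * k  # widening factor k multiplies the channel count
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.conv1 = nn.Conv2d(in_channels, width, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(width)
        self.conv2 = nn.Conv2d(width, width, kernel_size=3, padding=1, bias=False)
        # 1x1 projection so the shortcut matches the widened channel count
        self.shortcut = nn.Conv2d(in_channels, width, kernel_size=1, bias=False)

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))
        return out + self.shortcut(x)

# widening a 16-channel input by k=4 gives 64 output channels
block = WideResidualBlock(in_channels=16, base_width=16, k=4)
y = block(torch.randn(1, 16, 32, 32))
print(y.shape)  # torch.Size([1, 64, 32, 32])
```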

GPUs are more efficient on wider layers, so wide networks take less wall-clock time to train even though the number of parameters and floating-point operations increases with width.

ResNeXt

For deep residual networks to train properly, a balance has to be struck between depth and width.

Deeper networks have been shown to be much more effective than shallow ones when they can be trained properly.
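
The notes above do not describe the ResNeXt block itself; for reference, ResNeXt adds cardinality (the number of parallel transformation paths in a block) as a third dimension alongside depth and width, typically implemented with grouped convolutions. Below is a rough PyTorch sketch; the specific channel counts and cardinality value are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResNeXtBlock(nn.Module):
    """ResNeXt-style bottleneck using a grouped 3x3 convolution (sketch)."""
    def __init__(self, channels, bottleneck_width=128, cardinality=32):
        super().__init__()
        # 1x1 reduce -> grouped 3x3 (cardinality groups) -> 1x1 expand
        self.conv1 = nn.Conv2d(channels, bottleneck_width, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(bottleneck_width)
        self.conv2 = nn.Conv2d(bottleneck_width, bottleneck_width, kernel_size=3,
                               padding=1, groups=cardinality, bias=False)
        self.bn2 = nn.BatchNorm2d(bottleneck_width)
        self.conv3 = nn.Conv2d(bottleneck_width, channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))  # parallel paths via groups
        out = self.bn3(self.conv3(out))
        return F.relu(out + x)  # identity shortcut, as in a standard ResNet

# cardinality 32 with bottleneck width 128 gives 4 channels per path
block = ResNeXtBlock(channels=256)
y = block(torch.randn(1, 256, 56, 56))
print(y.shape)  # torch.Size([1, 256, 56, 56])
```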