Deep Convolutional Neural Net (CNN) Layer Types

Convolutional Neural Net Common Layer Settings
Source: Lecture

Dropout

  • A type of regularization used to avoid overfitting
  • Each time we present a training example, we randomly omit each hidden unit with probability 0.5, so we are effectively sampling from 2^H different architectures (where H is the number of hidden units). All architectures share weights.
  • Can be considered an extreme form of bagging.
  • Sharing weights means that every model is very strongly regularized.
  • A way of combining many neural network models without having to separately train a large number of models
  • Introduced by Nitish Srivastava and Geoffrey Hinton in 2014
  • dropout rate: 0 <= p <= 1
  • Dropout can be confined to one or more dedicated dropout layers
  • At test time, we use all of the hidden units but halve their outgoing weights. Assuming a softmax output layer, this exactly computes the geometric mean of the predictions of all 2^H models.
  • Can use dropout on input layer too, but with a higher probability of keeping an input unit. (Used by "denoising autoencoders" developed by Pascal Vincent, Hugo Larochelle and Yoshua Bengio.)
  • The record-breaking object recognition net developed by Alex Krizhevsky uses dropout.
  • Any net that uses "early stopping" can do better by using dropout at the cost of taking quite a lot longer to train.
  • If your deep neural net is not overfitting, you should be using a bigger one and using dropout!
  • Another way to think about dropout (related to mixtures of experts): if a hidden unit knows which other hidden units are present, it can co-adapt to them on the training data. But if a hidden unit has to work well with combinatorially many sets of co-workers, it is more likely to do something that is individually useful.
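The train/test scheme described above can be sketched in plain NumPy. This is an illustrative sketch of the lecture's formulation (drop units with probability 0.5 at training time, scale activations by the keep probability at test time); modern frameworks typically use the equivalent "inverted" dropout, which rescales during training instead:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(h, p=0.5, train=True):
    """Dropout as described above: at train time each hidden unit is
    kept with probability p; at test time all units are kept and the
    activations are scaled by p (equivalent to halving the outgoing
    weights when p = 0.5)."""
    if train:
        mask = rng.random(h.shape) < p  # keep each unit with prob p
        return h * mask
    return h * p  # test time: use every unit, scaled by keep prob

h = np.ones(8)                                      # toy hidden activations
train_out = dropout_forward(h, p=0.5, train=True)   # randomly zeroed units
test_out = dropout_forward(h, p=0.5, train=False)   # all units at half strength
```

Each training pass applies a fresh random mask, which is what makes every pass a sample from a different one of the 2^H weight-sharing architectures.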


Batch Normalization

  • standardizes values to a mean of 0 and variance of 1, then applies a learned scale and shift
  • tool to help with unstable or exploding gradients problem
  • normalize outputs of individual layers
  • insert a batch normalization layer between hidden layers
  • can also help network train more quickly
  • epochs take longer due to increased number of computations but convergence will be faster
  • no need to have a standardization layer after the input layer
  • apply before the activation function (e.g., ReLU)
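The normalization step itself is simple to sketch in NumPy. This is a minimal forward pass only (no running statistics for inference), with the learned scale (gamma) and shift (beta) passed in as plain arrays:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch to zero mean and unit
    variance, then apply the learned scale (gamma) and shift (beta)."""
    mu = x.mean(axis=0)                 # per-feature batch mean
    var = x.var(axis=0)                 # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.array([[1.0, 10.0],
              [3.0, 30.0],
              [5.0, 50.0]])             # (batch, features)
out = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
```

With gamma = 1 and beta = 0 the output columns have (approximately) zero mean and unit variance; during training the network learns gamma and beta so it can undo the normalization where that helps.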


Flatten Layer

  • used to convert an n×n feature map (where n > 1) into a 1-D vector for input into the next (dense) layer
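In NumPy terms, flattening is just a reshape. A minimal sketch with two hypothetical 3×3 feature maps:

```python
import numpy as np

# Two toy 3x3 feature maps, as a conv layer might output
feature_maps = np.arange(2 * 3 * 3).reshape(2, 3, 3)

# Flatten everything into a single 1-D vector for the next dense layer
flat = feature_maps.reshape(-1)   # shape (18,)
```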

Dense Layer

  • a fully-connected layer
  • takes a vector as input and outputs a vector (one value per unit in the layer)
  • add after flatten layer
  • can add 1 or more dense layers to create a shallow neural network at the end of a deep network to perform classification
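A dense (fully-connected) layer is a matrix multiply plus a bias. A minimal sketch with made-up weights, mapping a 4-dimensional flattened input to 3 outputs:

```python
import numpy as np

def dense(x, W, b):
    """Fully-connected layer: every input connects to every output."""
    return x @ W + b

x = np.ones(4)                    # flattened input vector
W = np.full((4, 3), 0.5)          # weight matrix: 4 inputs -> 3 outputs
b = np.zeros(3)                   # one bias per output unit
out = dense(x, W, b)              # -> array([2., 2., 2.])
```

For classification, the final dense layer typically has one output per class, followed by a softmax.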

CNN: Flatten and Dense Layers - Implemented in TensorFlow

Max Pooling

  • provides a degree of translation (location) invariance
  • reduce data dimensions by combining outputs of neuron clusters in one layer into a single neuron in the next layer
  • max pooling takes the max value, average pooling takes the average value
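Both variants can be sketched in NumPy for the common 2×2 window with stride 2 (an illustrative sketch; frameworks expose this as a pooling layer with configurable window and stride):

```python
import numpy as np

def pool2x2(x, mode="max"):
    """2x2 pooling with stride 2: each non-overlapping 2x2 block of the
    input becomes a single value (max or average) in the output."""
    h, w = x.shape
    # Trim odd edges, then group into (row_blocks, 2, col_blocks, 2)
    blocks = x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [0., 1., 2., 3.],
              [1., 0., 4., 9.]])
max_out = pool2x2(x, "max")   # [[4., 8.], [1., 9.]]
avg_out = pool2x2(x, "avg")   # [[2.5, 6.5], [0.5, 4.5]]
```

A 4×4 input becomes 2×2, quartering the spatial dimensions while keeping (for max pooling) the strongest activation in each region, which is what buys the small amount of translation invariance.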

Useful examples

Visualizing what convnets learn