Deep Convolutional Neural Net (CNN) Layer Types

Source: Lecture
Dropout
- A type of regularization used to avoid overfitting
- Each time we present a training example, we randomly omit each hidden unit with probability 0.5, so we are effectively sampling from 2^H different architectures (where H is the number of hidden units). All of these architectures share weights.
- Can be considered an extreme form of bagging.
- Sharing weights means that every model is very strongly regularized.
- A way of combining many neural network models without having to separately train a large number of models
- Introduced by Nitish Srivastava, Geoffrey Hinton, and colleagues in 2014
- Dropout ratio p, with 0 <= p <= 1 (the probability of dropping a unit)
- Can confine dropped-out neurons to a dedicated dropout layer (or layers)
- At test time, we use all of the hidden units but halve their outgoing weights. Assuming a single softmax output layer, this exactly computes the geometric mean of the predictions of all 2^H models.
- Can use dropout on input layer too, but with a higher probability of keeping an input unit. (Used by "denoising autoencoders" developed by Pascal Vincent, Hugo Larochelle and Yoshua Bengio.)
- The record-breaking object recognition net developed by Alex Krizhevsky uses dropout.
- Any net that uses "early stopping" can do better by using dropout at the cost of taking quite a lot longer to train.
- If your deep neural net is not overfitting, you should be using a bigger one and using dropout!
- Another way to think about dropout (related to mixtures of experts): if a hidden unit knows which other hidden units are present, it can co-adapt to them on the training data. But if a hidden unit has to work well with combinatorially many sets of co-workers, it is more likely to do something that is individually useful.
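A minimal NumPy sketch of the train/test behavior described above (a generic illustration, not the lecture's code; scaling outgoing weights by (1 - p) reduces to halving them when p = 0.5):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(h, p_drop=0.5):
    # Training time: zero each hidden unit independently with probability p_drop.
    mask = rng.random(h.shape) >= p_drop
    return h * mask

def dropout_test(w, p_drop=0.5):
    # Test time: keep all units but scale outgoing weights by (1 - p_drop),
    # i.e., halve them when p_drop = 0.5.
    return w * (1.0 - p_drop)

h = np.ones(10)
masked = dropout_train(h)          # roughly half the units are zeroed
scaled = dropout_test(np.ones(4))  # [0.5, 0.5, 0.5, 0.5]
```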
Resources:
- Dropout: an efficient way to combine neural nets - Video lecture by Geoffrey Hinton
Batch Normalization
- standardizes values to mean of 0 and variance of 1
- tool to help with unstable or exploding gradients problem
- normalize outputs of individual layers
- insert a batch normalization layer between hidden layers
- can also help network train more quickly
- epochs take longer due to increased number of computations but convergence will be faster
- no need to have a standardization layer after the input layer
- apply before activation function (e.g., relu)
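A minimal NumPy sketch of the normalization step described above (training-time batch statistics only; a real layer, e.g. Keras's BatchNormalization, also learns gamma/beta per feature and tracks running averages for inference):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Standardize each feature over the batch to mean 0, variance 1,
    # then apply the learnable scale (gamma) and shift (beta).
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.array([[1.0, 50.0],
              [3.0, 60.0],
              [5.0, 70.0]])
y = batch_norm(x)  # each column now has mean ~0 and variance ~1
```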
Resources:
- Batch normalization | What it is and how to implement it - Implemented in Keras
Flatten Layer
- used to convert a multi-dimensional feature map (e.g., an n×n or n×n×c convolutional output, where n > 1) into a 1-D vector for input into the next (dense) layer
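In NumPy terms, flattening just reshapes each example's feature maps into one long vector (a sketch of the idea, not a specific library's API):

```python
import numpy as np

# A batch of 2 examples, each a 3x3 feature map.
feature_maps = np.arange(2 * 3 * 3).reshape(2, 3, 3)

# Flatten: keep the batch dimension, collapse everything else.
flat = feature_maps.reshape(feature_maps.shape[0], -1)  # shape (2, 9)
```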
Dense Layer
- a fully-connected layer
- takes a vector as input and outputs a vector (one value per unit)
- add after flatten layer
- can add 1 or more dense layers to create a shallow neural network at the end of a deep network to perform classification
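A dense layer is just a matrix multiply plus a bias, as in this minimal NumPy sketch (the weights here are random placeholders, not trained values):

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, W, b):
    # Fully connected: every output unit is a weighted sum of every input unit.
    return x @ W + b

x = rng.random((2, 9))   # batch of 2 flattened vectors (e.g., from a flatten layer)
W = rng.random((9, 4))   # 9 inputs -> 4 output units
b = np.zeros(4)
out = dense(x, W, b)     # shape (2, 4)
```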
Resources:
- CNN: Flatten and Dense Layers - Implemented in TensorFlow
Max Pooling
- provides a degree of location (translation) invariance
- reduce data dimensions by combining outputs of neuron clusters in one layer into a single neuron in the next layer
- max pooling takes the max value, average pooling takes the average value
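A minimal NumPy sketch of 2x2 max pooling with stride 2 on a single channel (a generic illustration, not a specific library's API):

```python
import numpy as np

def max_pool_2x2(x):
    # Take the max over each non-overlapping 2x2 block (stride 2).
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]  # trim odd edges if any
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1., 2., 0., 1.],
              [3., 4., 1., 0.],
              [5., 1., 2., 2.],
              [0., 1., 3., 4.]])
pooled = max_pool_2x2(x)  # [[4., 1.], [5., 4.]]
```

Average pooling would be the same computation with `.mean(axis=(1, 3))` in place of `.max(axis=(1, 3))`.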