Deep Convolutional Neural Net (CNN) Layer Types

Source: Lecture
Dropout
- A type of regularization used to avoid overfitting
- Each time we present a training example, we randomly omit each hidden unit with probability 0.5, so we are effectively sampling from 2^H different architectures (where H is the number of hidden units). All of these architectures share weights.
- Can be considered an extreme form of bagging.
- Sharing weights means that every model is very strongly regularized.
- A way of combining many neural network models without having to separately train a large number of models
- Introduced by Nitish Srivastava, Geoffrey Hinton, and colleagues in 2014
- Dropout ratio p, with 0 <= p <= 1 (the probability of dropping a unit)
- Can confine dropped-out neurons to a dedicated dropout layer (or layers)
- At test time, we use all of the hidden units but halve their outgoing weights. Assuming a single softmax output layer, this exactly computes the geometric mean of the predictions of all 2^H models.
- Can use dropout on input layer too, but with a higher probability of keeping an input unit. (Used by "denoising autoencoders" developed by Pascal Vincent, Hugo Larochelle and Yoshua Bengio.)
- The record-breaking object recognition net developed by Alex Krizhevsky uses dropout.
- Any net that uses "early stopping" can do better by using dropout at the cost of taking quite a lot longer to train.
- If your deep neural net is not overfitting, you should be using a bigger one and using dropout!
- Another way to think about dropout (related to mixtures of experts): if a hidden unit knows which other hidden units are present, it can co-adapt to them on the training data. But if a hidden unit has to work well with combinatorially many sets of co-workers, it is more likely to do something that is individually useful.
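A minimal NumPy sketch of the train/test behavior described above (a generic illustration, not the lecture's code; scaling outgoing weights by (1 - p) reduces to halving them when p = 0.5):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(h, p_drop=0.5):
    # Training time: zero each hidden unit independently with probability p_drop.
    mask = rng.random(h.shape) >= p_drop
    return h * mask

def dropout_test(w, p_drop=0.5):
    # Test time: keep all units but scale outgoing weights by (1 - p_drop),
    # i.e., halve them when p_drop = 0.5.
    return w * (1.0 - p_drop)

h = np.ones(10)
masked = dropout_train(h)          # roughly half the units are zeroed
scaled = dropout_test(np.ones(4))  # [0.5, 0.5, 0.5, 0.5]
```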
Resources:
- Dropout: an efficient way to combine neural nets - Video lecture by Geoffrey Hinton
Batch Normalization
- standardizes values to mean of 0 and variance of 1
- tool to help with unstable or exploding gradients problem
- normalize outputs of individual layers
- insert a batch normalization layer between hidden layers
- can also help network train more quickly
- epochs take longer due to increased number of computations but convergence will be faster
- no need to have a standardization layer after the input layer
- apply before activation function (e.g., relu)
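A minimal NumPy sketch of the normalization step described above (training-time batch statistics only; a real layer, e.g. Keras's BatchNormalization, also learns gamma/beta per feature and tracks running averages for inference):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Standardize each feature over the batch to mean 0, variance 1,
    # then apply the learnable scale (gamma) and shift (beta).
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.array([[1.0, 50.0],
              [3.0, 60.0],
              [5.0, 70.0]])
y = batch_norm(x)  # each column now has mean ~0 and variance ~1
```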
Resources:
- Batch normalization | What it is and how to implement it - Implemented in Keras
Flatten Layer
- used to convert a multi-dimensional feature map (e.g., an n×n or n×n×c convolutional output, where n > 1) into a 1-D vector for input into the next (dense) layer
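In NumPy terms, flattening just reshapes each example's feature maps into one long vector (a sketch of the idea, not a specific library's API):

```python
import numpy as np

# A batch of 2 examples, each a 3x3 feature map.
feature_maps = np.arange(2 * 3 * 3).reshape(2, 3, 3)

# Flatten: keep the batch dimension, collapse everything else.
flat = feature_maps.reshape(feature_maps.shape[0], -1)  # shape (2, 9)
```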
Dense Layer
- a fully-connected layer
- takes a vector as input and outputs a vector (one value per unit)
- add after flatten layer
- can add 1 or more dense layers to create a shallow neural network at the end of a deep network to perform classification
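A dense layer is just a matrix multiply plus a bias, as in this minimal NumPy sketch (the weights here are random placeholders, not trained values):

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, W, b):
    # Fully connected: every output unit is a weighted sum of every input unit.
    return x @ W + b

x = rng.random((2, 9))   # batch of 2 flattened vectors (e.g., from a flatten layer)
W = rng.random((9, 4))   # 9 inputs -> 4 output units
b = np.zeros(4)
out = dense(x, W, b)     # shape (2, 4)
```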
Resources:
- CNN: Flatten and Dense Layers - Implemented in TensorFlow
Max Pooling
- provides a degree of location (translation) invariance
- reduce data dimensions by combining outputs of neuron clusters in one layer into a single neuron in the next layer
- max pooling takes the max value, average pooling takes the average value
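A minimal NumPy sketch of 2x2 max pooling with stride 2 on a single channel (a generic illustration, not a specific library's API):

```python
import numpy as np

def max_pool_2x2(x):
    # Take the max over each non-overlapping 2x2 block (stride 2).
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]  # trim odd edges if any
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1., 2., 0., 1.],
              [3., 4., 1., 0.],
              [5., 1., 2., 2.],
              [0., 1., 3., 4.]])
pooled = max_pool_2x2(x)  # [[4., 1.], [5., 4.]]
```

Average pooling would be the same computation with `.mean(axis=(1, 3))` in place of `.max(axis=(1, 3))`.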