Deep_Learning - RicoJia/notes GitHub Wiki

========================================================================

Keys to a successful deep learning project

========================================================================

  1. Data mgmt: data is separated into training, validation (dev), and test sets. You need to train different models and compare them to see which one performs best.
    • Make sure your dev and training sets come from the same distribution.
    • Overfitting = high variance, underfitting = high bias.
      • Variance refers to the gap of "high performance on training data, low performance on test data"; that gap is "overfitting".
      • Bias means the error relative to human-level error, caused by an overly simplistic model. Poor performance on the training set is "underfitting" (high bias).
        • So if human error is 15%, then a 15% ML model error rate is not considered high bias.
        • The same model could show high bias on some data landscapes and high variance on others.

========================================================================

Deep Learning Math

========================================================================

Why Deep Learning?

Any bounded continuous function can be approximated by an arbitrarily wide single layer. Why? The idea is roughly that linear combinations of activation functions can compose pulses (e.g., from sigmoids, or from 4 ReLUs). Then, the pulses can compose an arbitrary function.

Pulses Can Approximate An Arbitrary Function

A Neuron, Batch Gradient Descent

A neuron has multiple inputs and a single output. First it computes the weighted sum of all inputs, then feeds it into an "activation function". Below, the activation function $\sigma(z)$ is the sigmoid function: $$ z = \sum_{i}^{n} w_ix_i = w_0x_0 + w_1 x_1 ... \ y = \sigma(z) = \frac{1}{1 + e^{-z}} $$

Image Source: Stackoverflow

So for a single input x (an n x 1 vector) and its corresponding ground-truth value y, we can get its prediction $\hat{y}$ and cost $J$. In our example, J is cross-entropy, which reaches its minimum when $y = \hat{y}$: $$ \hat{y} = \sigma(\sum_{i}^{n} w_ix_i) \ J = -(ylog(\hat{y}) + (1-y)log(1-\hat{y})) $$ Then, we can apply gradient descent to update the parameters $w$, which is the goal of deep learning. The gradient is written $\nabla{J}$, the vector of partial derivatives w.r.t. all parameters in $w$. Since the gradient gives the steepest ascent, we update $w$ with its negative: $$ \nabla{J} = [\frac{\partial{J}}{\partial{w_0}}, \frac{\partial{J}}{\partial{w_1}} ...] \ w = w - \lambda\nabla{J} $$ For an individual partial derivative, $$ \frac{\partial{J}}{\partial{w_i}} = \frac{\partial{J}}{\partial{\hat{y}}} \frac{\partial{\hat{y}}}{\partial{z}} \frac{\partial{z}}{\partial{w_i}} \ \frac{\partial{J}}{\partial{\hat{y}}} = \frac{\hat{y} - y}{\hat{y}(1 - \hat{y})} \ \frac{\partial{\hat{y}}}{\partial{z}} = \frac{e^{-z}}{(1 + e^{-z})^2} = \frac{1}{1 + e^{-z}} (1-\frac{1}{1 + e^{-z}}) = \hat{y}(1-\hat{y}) \ \frac{\partial{z}}{\partial{w_i}} = x_i \ => \ \frac{\partial{J}}{\partial{w_i}} = (\hat{y} - y)x_i, \quad \frac{\partial{J}}{\partial{b}} = (\hat{y} - y) $$ So from the single input $x$, we update $w$ and $b$ with $$ w = w - \lambda\nabla_w{J} = w - \lambda(\hat{y} - y)x \ b = b - \lambda\nabla_b{J} = b - \lambda(\hat{y} - y) $$

Now if we have a batch of inputs $x^{(1)} ... x^{(m)}$: $$ J = -\frac{1}{m} \sum_{i=1}^{m} (y^{(i)} log(\hat{y}^{(i)}) + (1-y^{(i)})log(1-\hat{y}^{(i)})) \ w = w - \frac{\lambda}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})x^{(i)} \ b = b - \frac{\lambda}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)}) $$
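
To make this concrete, here is a minimal NumPy sketch of one batch gradient-descent step for a single sigmoid neuron. The shapes, toy data, and learning rate lr are illustrative assumptions, not from the original notes.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy batch: m = 4 samples, n = 3 features (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # (m, n)
y = np.array([0., 1., 1., 0.])       # (m,)

w = np.zeros(3)                      # (n,)
b = 0.0
lr = 0.1                             # learning rate (lambda in the text)

# Forward pass
y_hat = sigmoid(X @ w + b)           # (m,)

# Cross-entropy cost, averaged over the batch
J = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Gradients: dJ/dw = (1/m) X^T (y_hat - y), dJ/db = mean(y_hat - y)
dw = X.T @ (y_hat - y) / len(y)
db = np.mean(y_hat - y)

# Parameter update
w -= lr * dw
b -= lr * db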

Neural Network and Back Propagation

Now, to be consistent with mainstream notations, we represent the output of each node as $a$, instead of $\hat{y}$.

Imagine we have two layers: an input layer and an output layer. To update all params $w$ to yield a better final output, we first pass inputs $x^{(1)} ... x^{(m)}$ through the network and get outputs $\hat{y}^{(1)} ... \hat{y}^{(m)}$. The output of node $j$ in layer $L$ is $a^{L}_{j}$, where $L$ is the layer number.

When L is An Output Layer

$$ J = -\frac{1}{m} \sum_q^Q(y_q log(a^L_q) + (1-y_q)log(1-a^L_q)) \\ \frac{\partial{J}}{\partial{a^L_q}} = -\frac{1}{m} \frac{y_q-a^L_q}{a^L_q(1-a^L_q)} $$

When L is A Hidden Layer

For node $a^{L}_{j}$, we assume all nodes in layer $L-1$ are connected to $a^{L}_{j}$ (fully connected); the outputs from layer $L-1$ are written as the vector $a^{L-1}$. $w^{L}_{j}$ is the node's parameter vector and $z^{L}_{j}$ is its scalar intermediate output.

$$ w = w - \lambda\sum_{m}^{M}\nabla{J^{(m)}} \ \nabla{J^{(m)}} = \frac{\partial{J}}{\partial{w^L_j}} = \frac{\partial{J}}{\partial{a^L_j}} \cdot \frac{\partial{a^L_j}}{\partial{z^L_j}} \cdot \frac{\partial{z^L_j}}{\partial{w^L_j}} \ $$ For a specific node $j$ in layer $L$: $$ \frac{\partial{z^{L}_{j}}}{\partial{w^{L}_{j}}} = [a^{L-1}_0, a^{L-1}_1 ... ] = a^{L-1} \ \frac{\partial{a^{L}_j}}{\partial{z^{L}_j}} = \dot{\sigma}(z^{L}_j) $$

The node is connected to nodes 0...Q in layer L+1, so the partial derivative needs to consider all of these influenced paths. With scalar output $a^{L}_{j}$ and scalar derivative $\frac{\partial{J}}{\partial{a^{L}_{j}}}$:

$$ \frac{\partial{J}}{\partial{a^{L}_{j}}} = \sum_{q}^{Q} \frac{\partial{J}}{\partial{a^{L+1}_q}} \frac{\partial{a^{L+1}_q}}{\partial{z^{L+1}_q}} \frac{\partial{z^{L+1}_q}}{\partial{a^{L}_j}} $$

To further reduce $\frac{\partial{J}}{\partial{a^{L}_j}}$, with scalar derivative $\dot{\sigma}(z^{L+1}_q)$, node $(L+1,q)$'s parameter vector $w^{L+1}_{q}$, and its $j$-th parameter $w^{L+1}_{q, j}$: $$ z^{L+1}_q = w^{L+1}_q\cdot a^{L} => \frac{\partial{z^{L+1}_q}}{\partial{a^{L}_j}} = w^{L+1}_{q, j} \ a^{L+1}_q = \sigma(z^{L+1}_q) => \frac{\partial{a^{L+1}_q}}{\partial{z^{L+1}_q}} = \dot{\sigma}(z^{L+1}_q) \ => \frac{\partial{J}}{\partial{a^{L}_j}} = \sum_q^Q\frac{\partial{J}}{\partial{a^{L+1}_q}} \dot{\sigma}(z^{L+1}_q) w^{L+1}_{q,j} $$

Summary

That was a lot of chain-rule detail. In summary, during back propagation, assume we have a batch size of m, input size n, and output layer size p.

  1. Start from the output layer and back track

    • For context, compute $\frac{\partial{J}}{\partial{a^{L}_q}}$, given J being the binary cross entropy: $$ \frac{\partial{J}}{\partial{a^L_q}} = \frac{a^L_q-y_q}{a^L_q(1-a^L_q)} \text{, if L is the output layer} $$
    • Then, for all nodes in the layer, and for all batches: $$ \frac{\partial{J}}{\partial{a^L}} = \frac{a^L-y}{a^L(1-a^L)} \ \frac{\partial{a^L}}{\partial{z^L}} = \dot{\sigma}(z^{L}) \ \frac{\partial{J}}{\partial{z^{L}}} = \frac{\partial{J}}{\partial{a^L}} \frac{\partial{a^L}}{\partial{z^L}} $$
      • $y, a, z$ are mxp matrices.
  2. For non-output layers,

    • For context, each individual neuron has $$ \frac{\partial{J}}{\partial{a^{L}_j}} = \sum_q^Q\frac{\partial{J}}{\partial{z^{L+1}_q}} \frac{\partial{z^{L+1}_q}}{\partial{a^{L}_j}} = \sum_q^Q\frac{\partial{J}}{\partial{z^{L+1}_q}} w^{L+1}_{q,j} \text{, if L is not an output layer} \ \frac{\partial{J}}{\partial{z^{L}_j}} = \frac{\partial{J}}{\partial{a^{L}_j}} \frac{\partial{a^{L}_j}}{\partial{z^{L}_j}} = \frac{\partial{J}}{\partial{a^{L}_j}} \dot{\sigma}(z^{L}_j) $$

    • Then, for all nodes in the layer, and for all batches: $$ \frac{\partial{J}}{\partial{a^{L}}} = (W^{L+1})^T \frac{\partial{J}}{\partial{z^{L+1}}} \ \frac{\partial{J}}{\partial{z^{L}}} = \frac{\partial{J}}{\partial{a^{L}}} \frac{\partial{a^{L}}}{\partial{z^{L}}} = \frac{\partial{J}}{\partial{a^{L}}} \circ \dot{\sigma}(z^{L}) $$

  3. Finally, during the update step:

    • For context, each individual neuron has $$ \frac{\partial{J}}{\partial{w^L_j}} = \frac{\partial{J}}{\partial{a^L_j}} \cdot \frac{\partial{a^L_j}}{\partial{z^L_j}} \cdot \frac{\partial{z^L_j}}{\partial{w^L_j}} \ w^L_j = w^L_j - \frac{\lambda}{m}\sum_{m}^{M} \frac{\partial{J}}{\partial{w^L_j}} \ b^L_j = b^L_j - \frac{\lambda}{m}\sum_{m}^{M} \frac{\partial{J}}{\partial{z^L_j}} $$
    • Then, for all nodes in the layer, and for all batches: $$ \frac{\partial{J}}{\partial{w^L}} = \frac{\partial{J}}{\partial{z^{L}}} \frac{\partial{z^{L}}}{\partial{w^L}} = \frac{\partial{J}}{\partial{z^{L}}} (a^{L-1})^T \ w^L = w^L - \frac{\lambda}{m}\sum_{m}^{M} \frac{\partial{J}}{\partial{w^L}} \ b^L = b^L - \frac{\lambda}{m}\sum_{m}^{M} \frac{\partial{J}}{\partial{z^L}} $$
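
As a sanity check on the summary above, here is a minimal NumPy sketch of one forward/backward pass for a 2-layer sigmoid network with binary cross-entropy. The layer sizes, toy data, and learning rate are illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
m, n, h, p = 4, 3, 5, 1                  # batch size, input size, hidden size, output size (assumed)
X = rng.normal(size=(m, n))
Y = rng.integers(0, 2, size=(m, p)).astype(float)

W1, b1 = rng.normal(scale=0.1, size=(n, h)), np.zeros(h)
W2, b2 = rng.normal(scale=0.1, size=(h, p)), np.zeros(p)
lr = 0.1

# Forward pass
Z1 = X @ W1 + b1;  A1 = sigmoid(Z1)      # hidden layer
Z2 = A1 @ W2 + b2; A2 = sigmoid(Z2)      # output layer

# Backward pass (batch-first convention: activations are (m, layer_size))
dZ2 = (A2 - Y) / m                       # dJ/dZ2 for binary cross-entropy + sigmoid
dW2 = A1.T @ dZ2                         # dJ/dW2
db2 = dZ2.sum(axis=0)
dA1 = dZ2 @ W2.T                         # propagate back through W2
dZ1 = dA1 * A1 * (1 - A1)                # elementwise (Hadamard) product with sigmoid'(Z1)
dW1 = X.T @ dZ1
db1 = dZ1.sum(axis=0)

# Gradient descent update (the 1/m averaging is already folded into dZ2)
W2 -= lr * dW2; b2 -= lr * db2
W1 -= lr * dW1; b1 -= lr * db1

Note that this sketch uses a batch-first (row-major) layout, so the transposes sit on the other side compared with the column-vector formulas above; the math is the same.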

Supplementary Math

  1. Hadamard (Schur) Product is the elementwise product: $$ A \circ B = [A_1B_1, A_2B_2 ...] $$

========================================================================

Important Methods

========================================================================

Minibatch vs Batches

A huge batch can take a long time to evaluate for a single gradient-descent step. Instead, if the data is independent enough, we can break the training data into smaller batches, so that each update from a mini-batch helps the parameters converge faster (mini-batch gradient descent).

Another alternative is a batch size of one (stochastic gradient descent, SGD). SGD is noisier because a single sample's gradient can be far from the true gradient, but each update is faster. Mini-batches are commonly used as a middle ground for more stability; typical mini-batch sizes are between 64 and 512.
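
A minimal sketch of how one epoch of mini-batch updates might be organized. The shuffle step and the batch size of 64 are conventional choices, not prescribed by the notes above.

import numpy as np

def minibatch_indices(num_samples, batch_size=64, rng=None):
    """Yield index arrays that partition a shuffled dataset into mini-batches."""
    rng = rng if rng is not None else np.random.default_rng()
    order = rng.permutation(num_samples)          # shuffle once per epoch
    for start in range(0, num_samples, batch_size):
        yield order[start:start + batch_size]     # the last batch may be smaller

# Usage: one epoch over toy data
X = np.random.randn(1000, 10)
y = np.random.randint(0, 2, size=1000)
for idx in minibatch_indices(len(X), batch_size=64):
    X_batch, y_batch = X[idx], y[idx]
    # ... compute gradients on (X_batch, y_batch) and update parameters ...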

Cost Functions:

  1. Mean Squared Error $\frac{1}{n} \sum(y_i-\hat{y}_i)^2$

    • Disadvantages:
      • Sensitive to outliers with larger errors due to the squaring (especially compared to MAE)
      • If errors are non-gaussian, this is probably not robust either.
  2. Mean Absolute Error $\frac{1}{n} \sum|y_i-\hat{y}_i|$

    • Advantages:
      • Less sensitive to outliers
    • Disadvantages:
      • Not differentiable at 0, which can be problematic near the minimum, where errors (and hence the useful gradient signal) are close to zero
  3. Cross Entropy Loss (Log Loss): $-\frac{1}{n} \sum (y_ilog(\hat{y}_i) + (1-y_i)log(1-\hat{y}_i))$

    • Advantages:
      • Good for classification problems.
      • Suitable for probabilistic interpretation, and penalizes confidently wrong predictions heavily
    • Disadvantages:
      • Requires $\hat{y}_i \in (0, 1)$; the closer a wrong prediction is to the bounds 0 or 1, the larger the loss becomes, which can cause numerical issues
  4. Hinge Loss: $\frac{1}{n} \sum max(0, 1-y_i\hat{y}_i)$

    • Mostly used in SVM training; does not work well with probabilistic estimation
  5. Huber Loss $$ \frac{1}{2} (y_i - \hat{y}_i)^2 \text{ for } |y_i - \hat{y}_i| < \delta, \quad \delta |y_i - \hat{y}_i| - \frac{1}{2} \delta^2 \text{ otherwise} $$

    • Advantages:
      • It's continuous and differentiable
      • Combines MAE (for large errors) and MSE (for small errors)
    • Disadvantages:
      • Requires tuning $\delta$
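
A minimal NumPy sketch of the losses above, vectorized over a batch. The epsilon clipping in the cross-entropy is an assumption added for numerical safety.

import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def binary_cross_entropy(y, y_hat, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1 - eps)          # keep log() finite near the bounds
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def hinge(y, y_hat):
    # y is expected to be in {-1, +1} for hinge loss
    return np.mean(np.maximum(0.0, 1.0 - y * y_hat))

def huber(y, y_hat, delta=1.0):
    err = np.abs(y - y_hat)
    quadratic = 0.5 * err ** 2
    linear = delta * err - 0.5 * delta ** 2
    return np.mean(np.where(err < delta, quadratic, linear))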

Activation Functions

Early papers found that ReLU usually trains faster than sigmoid, because of its larger derivatives and its non-zero derivative in the positive region.

However, modern tricks like batch normalization have narrowed this gap. Common activation functions:

  1. ReLu: y = max(0, x). At x = 0 it's technically non-differentiable, but we can simply pick a subgradient: y' = 0 or 1.

    ReLu
    • Advantages:
      • Helps avoid diminishing derivatives, because positive values have a derivative of 1.
      • Computational efficiency: easy to compute.
      • It outputs zero on negative values, which reduces computation in both the forward and backward passes.
      • Because fewer neurons have non-zero outputs, the system is less "sensitive" to small changes in inputs, making it a form of regularization. (See the regularization section.)
    • Disadvantages:
      • Exploding gradients: because the output is unbounded, activations (and hence gradients) can grow large.
      • Dead neurons: neurons may never be activated if their weights keep the pre-activation negative.
      • Unbounded outputs
  2. tanh: $\frac{2}{1+e^{-2x}}-1$

    tanh
    • Advantages:
      • tanh outputs values in (-1, 1), so if we want negative values in the output, go for tanh
      • compared to sigmoid, its derivative in [-1, 1] has a larger range, which could help with the learning process.
    • Disadvantages:
      • For large and small values, gradients are zero (vanishing gradient problem)
      • Could be moderately expensive to train (with exponentials)
  3. sigmoid

    sigmoid
    • Advantages:
      • Outputs lie in (0, 1), so they can be read as probabilities (useful for output layers)
    • Disadvantages:
      • For large and small values, gradients are near zero (vanishing gradient problem)
      • The maximum of its derivative is 0.25. This is a big disadvantage in today's deeper networks, where the gradient diminishes quickly across layers
      • Could be moderately expensive to compute (exponentials)
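
A minimal NumPy sketch of the three activations above and their derivatives, as used during backpropagation. The function names are my own.

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    # Subgradient: 0 for z < 0, 1 for z > 0 (we arbitrarily use 0 at z == 0)
    return (z > 0).astype(float)

def tanh(z):
    return np.tanh(z)

def tanh_grad(z):
    return 1.0 - np.tanh(z) ** 2          # derivative range (0, 1]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)                  # maximum value is 0.25, at z = 0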

Exploding Derivatives && Vanishing Gradients

  • When you have VERY DEEP NETWORKS, gradients can grow exponentially large or small, especially when weights are initialized poorly or learning rates are too high. Mitigations:

    • Initialization Alternatives: Xavier and He.
      • Xavier initialization: for sigmoid or tanh
      • He initialization: this preserves a variance of $2/n$ in its weights, which is empirically good for ReLu
    • Gradient Clipping
    • Normalization Layers: use batch normalization or layer normalization
      • layer normalization:
      • Batch Normalization: TODO
  • A really helpful way to debug a network is gradient checking. Use it ONLY TO CHECK, not during training!

    1. Do a forward and backward pass. Get the analytic gradient $d = \frac{\partial J}{\partial w}$ and the total cost $J$.
    2. Perturb one parameter $W^L_i$ by $\epsilon$, do a forward pass, and get the new cost $J'$.
    3. Calculate the numerical gradient $d' = (J'-J)/ \epsilon$ and the relative error $$ \frac{|d-d'|}{|d| + |d'|} $$
      • If the relative error is $ < 10^{-7}$, great! If it is larger than $10^{-3}$, something is wrong.
    4. Note:
      • Run again after some training, because initialization might randomly yield good results there
      • Need to add regularization terms here
      • Doesn't work with dropout
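
A minimal sketch of the gradient check described above. The two-sided difference and the 1e-7 epsilon are common conventions; adapt the cost function to your own network.

import numpy as np

def gradient_check(cost_fn, w, analytic_grad, eps=1e-7):
    """Compare an analytic gradient against a numerical one, parameter by parameter."""
    w = w.astype(float)
    numeric_grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus.flat[i] += eps                     # perturb one parameter at a time
        w_minus.flat[i] -= eps
        numeric_grad.flat[i] = (cost_fn(w_plus) - cost_fn(w_minus)) / (2 * eps)
    # Relative error, as in the formula above
    rel_err = (np.linalg.norm(analytic_grad - numeric_grad)
               / (np.linalg.norm(analytic_grad) + np.linalg.norm(numeric_grad) + 1e-12))
    return rel_err   # < 1e-7 is great, > 1e-3 indicates a bug

# Usage on a toy quadratic cost J(w) = sum(w^2), whose gradient is 2w
w = np.array([1.0, -2.0, 3.0])
print(gradient_check(lambda v: np.sum(v ** 2), w, analytic_grad=2 * w))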

Overfitting - Regularization

  • Capacity: the ability to fit a wide variety of functions. Models with larger capacity can fit more complex patterns, but are also more prone to overfitting.

  • Technique 1: Regularization reduces overfitting by penalizing the "complexity" of the model. Common methods include:

    • L1 and L2 regularization:
      • L1 encourages sparsity: NOT SUPER COMMON
      • L2 penalizes large weights: TODO
    • Dropout: force a fraction of neurons to zero during each training iteration, which encourages redundant representations (see the sketch after this list)
      • At each training iteration, randomly select which neurons to keep. Say you want 80% to be kept: the model then cannot rely on any particular feature; that "focus" gets shuffled around, which spreads out the weights
      • VERY IMPORTANT: computer vision uses this a lot, because with so many pixels, relying on any single pixel would be overfitting
    • As a result, there are effectively fewer neurons in the model, which makes the decision boundary simpler and more linear
  • Note: when the activation is tanh and w is small, the intermediate output z of the neuron is small. tanh is roughly linear near zero, so the model behaves more like a perceptron network, which learns linear decision boundaries. (Hence, the model is less likely to overfit.)

  • Technique 2 is data augmentation, e.g., adding a reflection of a cat image, random distortions, and other additions.

  • Technique 3: early stopping - stop training based on validation on the dev set. If you notice the dev error coming back up, keep the best model found so far.
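
A minimal sketch of inverted dropout during training, assuming a keep probability of 0.8 as in the example above. The scaling by keep_prob is what lets you skip rescaling at test time.

import numpy as np

def dropout_forward(a, keep_prob=0.8, training=True, rng=None):
    """Apply inverted dropout to an activation matrix a of shape (batch, units)."""
    if not training:
        return a                                   # no dropout at test time
    rng = rng if rng is not None else np.random.default_rng()
    mask = (rng.random(a.shape) < keep_prob)       # keep ~80% of the neurons
    return a * mask / keep_prob                    # rescale so the expected value is unchanged

# Usage
a = np.ones((2, 5))
print(dropout_forward(a, keep_prob=0.8))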

Training Set Up

  • Normalize data: shift the mean to zero, and usually also scale to unit variance. If your input features come from very different sources or ranges, min-max scaling can also be helpful.
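
A minimal sketch of zero-mean/unit-variance standardization and min-max scaling. As an assumed convention, the statistics are computed on the training set only and reused for dev/test data.

import numpy as np

def standardize(X, mean=None, std=None):
    """Zero-mean, unit-variance scaling; pass train-set statistics for dev/test data."""
    mean = X.mean(axis=0) if mean is None else mean
    std = X.std(axis=0) if std is None else std
    return (X - mean) / (std + 1e-8), mean, std

def min_max_scale(X, lo=None, hi=None):
    """Scale each feature into [0, 1]."""
    lo = X.min(axis=0) if lo is None else lo
    hi = X.max(axis=0) if hi is None else hi
    return (X - lo) / (hi - lo + 1e-8), lo, hi

X_train = np.random.randn(100, 3) * [1, 10, 100]   # features with very different scales
X_std, mu, sigma = standardize(X_train)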

========================================================================

Convolutional Neural Network

========================================================================

  1. Filters (aka kernels): "pattern detectors". Each filter is a small matrix that you slide across an image, multiplying and summing pixel values (convolution). Filters can detect edges, corners, and, in deeper layers, parts of dogs.

    Filter
  2. Difference with convolution in signal processing: in signal processing, we need to reverse the filter horizontally and vertically. Why? $$ \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} => \begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix} $$

  • In 1D signal processing, a causal system might have impulse response [1, 2, 3]. If we feed in an impulse [1, 0, 0] at times [t1, t2, t3], we expect the output [1, 2, 3]. To compute that with a sliding window, the window must be the time-reversed impulse response, so that at t1 we get 1, at t2 we get 2, and at t3 we get 3. The flip preserves causality, i.e., "the correctness of the sequence of outputs".

  • For feature detection, we don't need to flip the kernel. Suppose we want to detect a downward 1D slope with a kernel like [1, -1], and the signal is the slope [3, 2, 1, 0]. Sliding the kernel as-is gives positive responses; if we flipped it first (true convolution), we would get negative responses instead. So CNNs don't flip the kernel; the operation is really "cross-correlation".

  • Padding:

    • Why padding?
      • Keeps information at the borders, which would otherwise be washed away too quickly
      • Allows us to design deeper neural networks; otherwise the feature maps keep shrinking
    • The image above shows zero padding. There are three common padding modes:
      • 'valid' : no padding at all.
        [1, 2, 3]
        [1, 1, 1]
        => [6]
        
      • 'same': output is the same as the input
        [0, 1, 2, 3, 0]
        [1, 1, 1]
            [1, 1, 1]
                [1, 1, 1]
        => [3, 6, 5]
        
      • 'full': pads even more zeros, so that each input pixel touches the kernel the same number of times
        [0, 0, 1, 2, 3, 0, 0]
        [1, 1, 1]
           [1, 1, 1]
              [1, 1, 1]
                 [1, 1, 1]
                    [1, 1, 1]
        => [1, 3, 6, 5, 3]
        
    • Output size: $\lfloor \frac{n + 2p - k}{s} \rfloor + 1$, where n is the input size, k is the kernel size, p is the padding, and s is the stride.
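
The three padding modes above map directly onto NumPy's 1D convolution modes. Note that np.convolve flips the kernel (true convolution); with the symmetric kernel [1, 1, 1] used here the result matches cross-correlation.

import numpy as np

x = np.array([1, 2, 3])
k = np.array([1, 1, 1])

print(np.convolve(x, k, mode="valid"))   # [6]             -> no padding
print(np.convolve(x, k, mode="same"))    # [3, 6, 5]       -> output size equals input size
print(np.convolve(x, k, mode="full"))    # [1, 3, 6, 5, 3] -> maximal zero padding
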
  1. Max pooling: take the max within a window to retain the most salient feature, then move the window by the stride after each operation. Given
    1  3  2  4
    5  6  7  8
    9  2  3  1
    4  0  1  5
    

For a 2x2 window, with stride = 2, we get 6, 8, 9, 5
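
A minimal NumPy sketch of 2x2 max pooling with stride 2, reproducing the example above.

import numpy as np

def max_pool2d(x, window=2, stride=2):
    """Naive 2D max pooling over a single-channel feature map."""
    out_h = (x.shape[0] - window) // stride + 1
    out_w = (x.shape[1] - window) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + window, j * stride:j * stride + window]
            out[i, j] = patch.max()
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [9, 2, 3, 1],
              [4, 0, 1, 5]])
print(max_pool2d(x))   # [[6, 8], [9, 5]]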

  1. Convolutional neural nets were invented by Yann LeCun around 1990. Each convolution layer has multiple filters. For an RGB input there are 3 input channels; if the layer produces 4 output channels, each output channel is a feature map. Since each output channel needs one kernel slice per input channel, there are 3 x 4 = 12 kernel slices in total.

    • Example: say we have a 32x32x1 grayscale image:
      1. Convolutional layer with 2 3x3 filters, stride = 1, no padding -> 30x30x2 output, where each output channel is a feature map -> activation function -> 2x2 pooling layer with stride = 2 -> output of 15x15x2.
      2. Convolutional layer with 4 3x3x2 filters, stride = 1, padding = 1 (so input and output sizes are the same). Each filter convolves with both input channels and sums the results, so you get 15x15x4 -> activation function.
      3. Flattening layer: 15x15x4 -> 900x1 vector. Note that all channels are flattened into one.
      4. Fully connected layer.
  2. How does back propagation work? I got an example from the YouTube channel far1din; code: https://github.com/TheIndependentCode/Neural-Network. (A SciPy sketch of the resulting gradient formulas appears after this list.)

    • If we expand: $$ y_{11} = x_{11}k_{11} + x_{12}k_{12} + x_{13}k_{13} + ... \ y_{12} = x_{12}k_{11} + x_{13}k_{12} + x_{14}k_{13} + ... \ y_{21} = x_{21}k_{11} + x_{22}k_{12} + x_{23}k_{13} + ... $$

    • We need to get $\frac{\partial J}{\partial K}$, $\frac{\partial J}{\partial b}$, and $\frac{\partial J}{\partial X}$. For example: $$ \frac{\partial J}{\partial x_{11}} = \frac{\partial J}{\partial y_{11}}k_{11} \ \frac{\partial J}{\partial x_{12}} = \frac{\partial J}{\partial y_{11}}k_{12} + \frac{\partial J}{\partial y_{12}}k_{11} $$

    • This is equivalent to summing copies of the kernel $K$, shifted to each output position and weighted by the corresponding element of $\frac{\partial J}{\partial y}$.

    • For the kernel gradient $\frac{\partial J}{\partial K}$, it's actually cross-correlation: $x \ast \frac{\partial J}{\partial y}$ $$ \frac{\partial J}{\partial k_{11}} = \frac{\partial J}{\partial y_{11}}x_{11} + \frac{\partial J}{\partial y_{12}}x_{12} + \frac{\partial J}{\partial y_{21}}x_{21} + \frac{\partial J}{\partial y_{22}}x_{22} \ \frac{\partial J}{\partial k_{12}} = \frac{\partial J}{\partial y_{11}}x_{12} + \frac{\partial J}{\partial y_{12}}x_{13} + \frac{\partial J}{\partial y_{21}}x_{22} + \frac{\partial J}{\partial y_{22}}x_{23} $$

    • For the bias gradient: there is only 1 bias value per output channel, applied to every element of that channel. Its gradient is therefore the sum of that channel's output gradient over all its elements: $\frac{\partial J}{\partial b} = \sum_{i,j} \frac{\partial J}{\partial y_{ij}}$

    • For the input gradient $\frac{\partial J}{\partial X}$, it's actually a full convolution: $k \circledast \frac{\partial J}{\partial y}$ $$ \frac{\partial J}{\partial x_{11}} = \frac{\partial J}{\partial y_{11}}k_{11} \ \frac{\partial J}{\partial x_{12}} = \frac{\partial J}{\partial y_{11}}k_{12} + \frac{\partial J}{\partial y_{12}}k_{11} \ \frac{\partial J}{\partial x_{13}} = \frac{\partial J}{\partial y_{11}}k_{13} + \frac{\partial J}{\partial y_{12}}k_{12} + \frac{\partial J}{\partial y_{13}}k_{11} \ ... $$

      • See? This is convolution!
  3. How do we handle batches? The output is a 4D array indexed as (batch, output channel, x, y).
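
A minimal sketch of these gradients for a single-channel 2D layer, using SciPy's correlate2d/convolve2d. The shapes and random data are illustrative; this mirrors the formulas above rather than the referenced repository's code.

import numpy as np
from scipy.signal import correlate2d, convolve2d

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 4))                    # input feature map
K = rng.normal(size=(2, 2))                    # kernel
b = 0.5

# Forward pass: 'valid' cross-correlation, as in a CNN layer
Y = correlate2d(X, K, mode="valid") + b        # shape (3, 3)

dY = rng.normal(size=Y.shape)                  # pretend this is dJ/dY from the next layer

# Kernel gradient: cross-correlation of the input with the output gradient
dK = correlate2d(X, dY, mode="valid")          # shape (2, 2), same as K

# Bias gradient: sum of the output gradient (one bias per output channel)
db = dY.sum()

# Input gradient: FULL convolution of the output gradient with the kernel
dX = convolve2d(dY, K, mode="full")            # shape (4, 4), same as X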

Implementation

A small input to a CNN has 4 dimensions: [Batch dimension, input channel, Image Height, Image Width]

import numpy as np
import torch

a = np.array(
    [
        # Batch dimension
        [
            # Input channel
            [
                [-1, 2, 3],
                [-4, 5, 6],
                [-7, 8, 9]
            ]
        ]
    ], dtype=np.float32
)
# We need to convert this into a torch tensor.
# Then, we enable grad so that the tensor will start accumulating its gradients.
# Quirk 1: from_numpy itself does NOT create a copy of a (it shares memory).
x = torch.from_numpy(a).requires_grad_(True)
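
Continuing this example, a minimal sketch of running such a tensor through a convolution layer and inspecting the gradients that autograd accumulates. The 2x2 kernel size and the sum-as-loss are arbitrary choices for illustration.

import numpy as np
import torch

a = np.arange(9, dtype=np.float32).reshape(1, 1, 3, 3)   # (batch, channel, H, W)
x = torch.from_numpy(a).requires_grad_(True)

conv = torch.nn.Conv2d(in_channels=1, out_channels=2, kernel_size=2)  # output: (1, 2, 2, 2)
y = conv(x)

# Backward pass from a scalar loss; gradients land in .grad of every leaf tensor
y.sum().backward()
print(x.grad.shape)            # torch.Size([1, 1, 3, 3]), i.e. dJ/dX
print(conv.weight.grad.shape)  # torch.Size([2, 1, 2, 2]), i.e. dJ/dK
print(conv.bias.grad)          # dJ/db: one value per output channel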

========================================================================

Alex Net

========================================================================

Impl

Architecture

  1. Convolution Layer
  2. Relu:
    • Introduces non-linearity into the learned features
    • Feature scaling: negative values do not pass through, which zeroes out many activations and reduces multiplications in both the forward and backward passes