Deep_Learning - RicoJia/notes GitHub Wiki

========================================================================

Important Methods

========================================================================

Exploding && Vanishing Gradients

  • When you have VERY DEEP NETWORKS, your gradients can grow or shrink exponentially, especially when weights are initialized poorly or learning rates are too high
    • Initialization Alternatives: Xavier and He (a combined sketch of these remedies follows this list)
      • Xavier initialization: preserves a variance of $1/n$ in the weights; good for sigmoid or tanh activations
      • He initialization: preserves a variance of $2/n$ in the weights, which is empirically good for ReLU
    • Gradient Clipping: cap the gradient norm before each update step
    • Normalization Layers: use batch normalization or layer normalization
      • Layer normalization: normalizes each sample across its features, independent of batch size
      • Batch Normalization: normalizes each channel across the batch (and spatial) dimensions
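A minimal PyTorch sketch of the three remedies above (He/Xavier initialization, gradient clipping, normalization layers). The model, layer sizes, and `max_norm` value are illustrative placeholders, not values from these notes.

```python
import torch
import torch.nn as nn

# toy model: Linear -> ReLU -> LayerNorm -> Linear (sizes are arbitrary)
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.LayerNorm(64),       # layer norm: per-sample, across features
    # nn.BatchNorm1d(64)    # batch norm alternative: per-feature, across the batch
    nn.Linear(64, 10),
)

# He (Kaiming) init for ReLU layers; use nn.init.xavier_uniform_ instead for tanh/sigmoid
for m in model.modules():
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")  # variance ~ 2/n
        nn.init.zeros_(m.bias)

# gradient clipping: cap the global gradient norm right before optimizer.step()
loss = model(torch.randn(32, 128)).sum()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```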

Source: Vanishing Gradient Is More Visible In Shallow Layers
  • A really helpful way to debug a network is to implement a gradient check (see the sketch after these steps). Use it ONLY to check, never during training!
    1. Do a forward and backward pass. Get the analytic gradient $d = \frac{\partial J}{\partial w}$ and the total cost $J$
    2. Perturb one parameter $W^L_i$ by $\epsilon$, do a forward pass, and get the new cost $J'$
    3. Calculate the numerical gradient $d' = (J'-J)/\epsilon$ and the relative difference $$ \frac{|d-d'|}{|d| + |d'|} $$
      • If the difference is $< 10^{-7}$, great! If it is larger than $10^{-3}$, something is likely wrong
    4. Notes:
      • Run the check again after some training, because a random initialization might happen to look fine
      • Include the regularization terms in the cost when checking
      • Doesn't work with dropout (turn dropout off during the check)
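A minimal NumPy sketch of the check above. It uses the slightly more accurate two-sided difference $(J(w+\epsilon) - J(w-\epsilon))/2\epsilon$; the cost function and toy example are placeholders, not from these notes.

```python
import numpy as np

def gradient_check(cost_fn, w, analytic_grad, eps=1e-7):
    """Compare an analytic gradient against a finite-difference estimate."""
    numeric_grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus.flat[i] += eps               # perturb one parameter at a time
        w_minus.flat[i] -= eps
        numeric_grad.flat[i] = (cost_fn(w_plus) - cost_fn(w_minus)) / (2 * eps)
    # relative difference |d - d'| / (|d| + |d'|)
    return np.linalg.norm(analytic_grad - numeric_grad) / (
        np.linalg.norm(analytic_grad) + np.linalg.norm(numeric_grad))

# toy example: J(w) = sum(w^2), so dJ/dw = 2w; expect a very small difference
w = np.random.randn(5)
print(gradient_check(lambda x: np.sum(x**2), w, 2 * w))
```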

Training Set Up

  • Normalize the input data: subtract the mean and scale to unit variance. If your input data comes from very different sources or scales, min-max scaling can also help (see the sketch below).
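An illustrative NumPy sketch of both options; the feature scales in `X` are made-up placeholders.

```python
import numpy as np

# fake training data: three features on very different scales
X = np.random.rand(100, 3) * np.array([1.0, 1000.0, 0.01])

# standardization: zero mean, unit variance
# (compute stats on training data only, then reuse them on validation/test data)
mu, sigma = X.mean(axis=0), X.std(axis=0) + 1e-8
X_std = (X - mu) / sigma

# min-max scaling to [0, 1]: useful when inputs come from very different sources
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-8)
```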

========================================================================

AlexNet

========================================================================

Impl

Architecture

  1. Convolution Layer (a sketch of the first conv + ReLU stage follows this list)
  2. ReLU:
    • Introduces non-linearity into the features
    • Sparsity: negative values are zeroed out and do not pass through, which reduces the multiplications needed downstream
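A minimal PyTorch sketch of the first convolution + ReLU stage. The 96 channels, 11x11 kernel, stride 4, and the overlapping max pool follow the original AlexNet paper; the 227x227 input size is the commonly used crop, assumed here for illustration.

```python
import torch
import torch.nn as nn

first_stage = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),  # convolution: extracts local features
    nn.ReLU(inplace=True),                       # zeroes negatives -> sparse, cheap activations
    nn.MaxPool2d(kernel_size=3, stride=2),       # AlexNet's overlapping max pooling
)

x = torch.randn(1, 3, 227, 227)                  # one ImageNet-sized RGB image
print(first_stage(x).shape)                      # torch.Size([1, 96, 27, 27])
```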