stochastic gradient descendant - taoualiw/My-Knowledge-Base GitHub Wiki

Stochastic Gradient Descent (SGD)

Instead of computing the loss over all the training data, which is computationally very expensive, we compute an estimate of it: the average loss over a small random sample (mini-batch) of the data at each step (1..1000).
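A minimal sketch of this idea, assuming a linear-regression problem with a squared-error loss (the data, shapes, learning rate, and batch size below are illustrative, not from the original notes):

```python
import numpy as np

# Hypothetical linear-regression setup: 1000 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

w = np.zeros(3)
lr, batch_size = 0.1, 32
for step in range(500):
    # Estimate the gradient from a small random mini-batch
    # instead of the full training set.
    idx = rng.integers(0, len(X), size=batch_size)
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch_size  # gradient of the mean squared error
    w -= lr * grad
```

Each step only touches `batch_size` rows of `X`, yet the noisy gradient estimates still drive `w` toward `true_w` on average.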

Inputs: mean = 0, equal variance (small)

Initial weights: random, mean = 0, equal variance (small)
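The initialization notes above can be sketched as follows; the layer sizes and standard deviation are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(42)
n_in, n_out = 784, 128  # hypothetical layer sizes
sigma = 0.01            # small, equal standard deviation for every weight

# Random weights: zero mean, small equal variance.
W = rng.normal(loc=0.0, scale=sigma, size=(n_in, n_out))
b = np.zeros(n_out)     # biases commonly start at zero
```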

Momentum technique: keep a running average of past gradients and step along it, instead of using the current gradient alone.
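A minimal sketch of the momentum update, shown on a toy 1-D quadratic; the learning rate and momentum coefficient are illustrative:

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.05, beta=0.9):
    # Running average of past gradients (the "velocity"),
    # then a step in that averaged direction.
    velocity = beta * velocity + grad
    w = w - lr * velocity
    return w, velocity

# Minimize f(w) = w^2 (gradient 2w) starting from w = 5.0.
w, v = 5.0, 0.0
for _ in range(200):
    w, v = sgd_momentum_step(w, 2 * w, v)
```

Because the velocity accumulates consistent gradient directions and damps oscillating ones, momentum typically smooths the noisy mini-batch updates of SGD.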

Stochastic Gradient Descent is sensitive to feature scaling, so it is highly recommended to scale your data. For example, scale each attribute on the input vector X to [0,1] or [-1,+1], or standardize it to have mean 0 and variance 1. Note that the same scaling must be applied to the test vector to obtain meaningful results.
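The scaling advice above can be sketched like this: standardize each feature using statistics computed on the training set only, then reuse those same statistics on the test set (the data here is synthetic, for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X_train = rng.normal(loc=5.0, scale=3.0, size=(200, 4))
X_test = rng.normal(loc=5.0, scale=3.0, size=(50, 4))

# Statistics from the TRAINING set only.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

X_train_scaled = (X_train - mu) / sigma
X_test_scaled = (X_test - mu) / sigma  # same scaling applied to the test set
```

Fitting the scaler on the test set (or on train and test together) would leak information and make the two sets inconsistent with each other.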
