Stochastic Gradient Descent - taoualiw/My-Knowledge-Base GitHub Wiki
Instead of computing the loss over all the training data, which is computationally very expensive, we compute an estimate of it: the average loss over a small random sample of the data at each step (e.g., 1 to 1000 examples).
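A minimal sketch of this idea, using mini-batch SGD on a toy linear-regression problem (the data, learning rate, and batch size below are all hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: 1000 examples, 3 features, known true weights plus noise.
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(3)      # weights to learn
lr = 0.1             # learning rate
batch_size = 32      # size of the small random sample per step

for step in range(200):
    # Draw a small random sample instead of using the full dataset.
    idx = rng.choice(len(X), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    # Gradient of the mean squared error on the mini-batch only.
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch_size
    w -= lr * grad
```

Each step touches only `batch_size` rows, so the per-step cost is independent of the dataset size; the noisy gradient estimates still drive `w` toward the true weights on average.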
Inputs: mean = 0, equal (small) variance
Initial weights: random, mean = 0, equal (small) variance
Momentum technique: running average of past gradients
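The momentum note above can be sketched as follows: the update keeps an exponentially decaying running average of past gradients (the "velocity") and steps along that average instead of the raw gradient. The function name, learning rate, and decay factor here are illustrative assumptions, not from the source:

```python
def sgd_momentum_step(w, grad, velocity, lr=0.05, beta=0.9):
    """One SGD-with-momentum update.

    velocity is an exponentially decaying running average of past gradients.
    """
    velocity = beta * velocity + grad  # running average of gradients
    w = w - lr * velocity              # step along the averaged direction
    return w, velocity

# Toy demo: minimize f(w) = w**2, whose gradient is 2*w.
w, v = 5.0, 0.0
for _ in range(100):
    w, v = sgd_momentum_step(w, 2 * w, v)
```

Because the velocity averages over many noisy mini-batch gradients, momentum damps oscillations and speeds progress along directions where gradients consistently agree.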
Stochastic Gradient Descent is sensitive to feature scaling, so it is highly recommended to scale your data. For example, scale each attribute on the input vector X to [0,1] or [-1,+1], or standardize it to have mean 0 and variance 1. Note that the same scaling must be applied to the test vector to obtain meaningful results.
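A small sketch of the standardization described above, with hypothetical data; the key point is that the mean and variance are computed on the training set and then reused on the test set:

```python
import numpy as np

# Hypothetical train/test data with features on very different scales.
X_train = np.array([[1.0, 200.0],
                    [2.0, 300.0],
                    [3.0, 400.0]])
X_test = np.array([[2.0, 250.0]])

# Statistics come from the training data only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

# Standardize to mean 0, variance 1.
X_train_scaled = (X_train - mean) / std
# The SAME scaling is applied to the test data.
X_test_scaled = (X_test - mean) / std
```

Computing fresh statistics on the test set would put train and test features on different scales, which is exactly the mismatch the note warns against.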