11. Gradient Descent
- Problem:
if the dataset is very large, every iteration of the ordinary gradient descent we have used so far must sum over all the training examples, which is computationally expensive, so several more scalable variants are introduced below
Various gradient descent algorithms
Batch gradient descent (the original one)
- uses all m examples to compute the gradient in every iteration (see the sketch below)
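A minimal sketch of this, assuming linear regression with the squared-error cost and NumPy (neither is spelled out on this page): every update sums the gradient over all m examples.

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, num_iters=100):
    """Batch GD for linear regression: each update uses all m examples."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        # gradient of the squared-error cost, summed over the whole dataset
        grad = X.T @ (X @ theta - y) / m
        theta -= alpha * grad
    return theta

# toy usage: fit y = 2*x with a bias column
X = np.c_[np.ones(5), np.arange(5)]
y = 2 * np.arange(5).astype(float)
print(batch_gradient_descent(X, y, alpha=0.1, num_iters=1000))  # close to [0., 2.]
```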
Stochastic gradient descent
- uses one randomly chosen example per update, so it wanders toward the optimum point/region along a circuitous path, but each step is much cheaper than a batch gradient descent step
Tip: the number of repetitions of the outer loop (passes over the shuffled dataset) depends on the size of the dataset and is usually about 1-10
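A sketch of the same setting for stochastic gradient descent (again assuming linear regression and NumPy; the names are illustrative): the examples are shuffled, theta is updated after every single example, and the whole pass is repeated a handful of times, which is the 1-10 outer loop mentioned above.

```python
import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, num_passes=5):
    """SGD for linear regression: update theta after every single example."""
    m, n = X.shape
    theta = np.zeros(n)
    rng = np.random.default_rng(0)
    for _ in range(num_passes):          # outer loop: roughly 1-10 passes
        for i in rng.permutation(m):     # shuffle, then visit examples one by one
            grad_i = (X[i] @ theta - y[i]) * X[i]   # gradient from a single example
            theta -= alpha * grad_i
    return theta

X = np.c_[np.ones(5), np.arange(5)]
y = 2 * np.arange(5).astype(float)
print(stochastic_gradient_descent(X, y, alpha=0.05, num_passes=10))  # wanders toward [0., 2.]
```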
Mini-Batch gradient descent
- uses b (the mini-batch size) examples in each iteration
- in contrast to stochastic gradient descent (see the sketch after this list):
1. Vectorization: mini-batch gradient descent is likely to outperform stochastic gradient descent only with a good vectorized implementation, because the gradient computation over the b examples can then be parallelized
2. extra time is needed to choose the additional parameter b
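A sketch of mini-batch gradient descent under the same assumptions (linear regression, NumPy): each update uses b examples, and the gradient over the mini-batch is computed with one vectorized expression, which is where the advantage over stochastic gradient descent comes from.

```python
import numpy as np

def mini_batch_gradient_descent(X, y, b=10, alpha=0.01, num_passes=5):
    """Mini-batch GD: each update uses b examples in one vectorized step."""
    m, n = X.shape
    theta = np.zeros(n)
    rng = np.random.default_rng(0)
    for _ in range(num_passes):
        order = rng.permutation(m)
        for start in range(0, m, b):
            batch = order[start:start + b]
            Xb, yb = X[batch], y[batch]
            # vectorized gradient over the b examples of the mini-batch
            grad = Xb.T @ (Xb @ theta - yb) / len(batch)
            theta -= alpha * grad
    return theta

X = np.c_[np.ones(100), np.linspace(0, 1, 100)]
y = 3 * np.linspace(0, 1, 100) + 1
print(mini_batch_gradient_descent(X, y, b=10, alpha=0.5, num_passes=200))  # approaches [1., 3.]
```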
Check for convergence
- during learning, compute the cost of each example just before updating the parameters with it, and every 1000 iterations (for example) plot the cost averaged over the last 1000 examples to see whether it keeps decreasing
Tip: the learning rate α can also be slowly decreased over time, e.g. α = const1 / (iterationNumber + const2), so that the parameters settle closer to the minimum
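A sketch of this convergence check (the window of 1000 examples and the use of matplotlib are assumptions): during stochastic gradient descent the per-example cost is recorded before each update, and its running average is plotted.

```python
import numpy as np
import matplotlib.pyplot as plt

def sgd_with_cost_monitoring(X, y, alpha=0.01, num_passes=5, window=1000):
    """SGD that records the per-example cost so convergence can be plotted."""
    m, n = X.shape
    theta = np.zeros(n)
    costs, averaged = [], []
    rng = np.random.default_rng(0)
    for _ in range(num_passes):
        for i in rng.permutation(m):
            err = X[i] @ theta - y[i]
            costs.append(0.5 * err ** 2)                   # cost BEFORE the update
            theta -= alpha * err * X[i]
            if len(costs) % window == 0:                   # every `window` examples,
                averaged.append(np.mean(costs[-window:]))  # average the last `window` costs
    plt.plot(averaged)
    plt.xlabel(f"number of updates (x{window})")
    plt.ylabel(f"cost averaged over last {window} examples")
    plt.show()
    return theta

# toy usage on noisy linear data
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 5000)
X = np.c_[np.ones_like(x), x]
y = 4 * x + rng.normal(0, 0.1, 5000)
sgd_with_cost_monitoring(X, y, alpha=0.05, num_passes=2)
```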
Map-reduce and data parallelism
- Example: split the m training examples across several machines (or CPU cores); each machine computes the partial sum of the gradients over its own subset, and a central server adds the partial sums together and performs the batch gradient descent update, so each iteration is sped up roughly by the number of machines
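A minimal sketch of the map-reduce idea on a single machine using Python's multiprocessing module as a stand-in for multiple machines (the 4-way split and the helper names are illustrative): each worker computes the partial gradient sum over its slice of the data, and combining the partial sums reproduces exactly one batch gradient descent update.

```python
import numpy as np
from multiprocessing import Pool

def partial_gradient(args):
    """'Map' step: sum of gradients over one worker's slice of the data."""
    X_part, y_part, theta = args
    return X_part.T @ (X_part @ theta - y_part)

def map_reduce_batch_gd(X, y, alpha=0.5, num_iters=500, num_workers=4):
    """Batch GD where each gradient is computed by num_workers in parallel."""
    m, n = X.shape
    theta = np.zeros(n)
    X_chunks = np.array_split(X, num_workers)
    y_chunks = np.array_split(y, num_workers)
    with Pool(num_workers) as pool:
        for _ in range(num_iters):
            # 'map': partial gradient sums from each worker's slice
            partial_sums = pool.map(partial_gradient,
                                    [(Xc, yc, theta) for Xc, yc in zip(X_chunks, y_chunks)])
            # 'reduce': add the partial sums -- identical to the single-machine batch gradient
            theta = theta - alpha * sum(partial_sums) / m
    return theta

if __name__ == "__main__":
    x = np.linspace(0, 1, 400)
    X = np.c_[np.ones(400), x]
    y = 2 * x + 1
    print(map_reduce_batch_gd(X, y))   # approaches [1., 2.]
```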