Chapter 4, Training Models

  • Linear regression: the closed-form solution via the Normal Equation becomes slow when the number of features is large (inverting X^T X costs roughly O(n^3) in the number of features n), and it needs the whole training set in memory. Gradient Descent scales better in those cases.
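
    A minimal NumPy sketch of both approaches on toy data (variable names such as `X_b`, `theta_normal`, and `eta` are just illustrative):

```python
import numpy as np

# Toy data: y = 4 + 3x + noise
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), X]          # add bias column x0 = 1

# Closed form (Normal Equation): theta = (X^T X)^-1 X^T y
theta_normal = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# Batch Gradient Descent: iterative, avoids the matrix inversion
eta, n_epochs, m = 0.1, 1000, len(X_b)
theta = np.random.randn(2, 1)
for epoch in range(n_epochs):
    gradients = 2 / m * X_b.T @ (X_b @ theta - y)
    theta -= eta * gradients

print(theta_normal.ravel(), theta.ravel())  # both close to [4, 3]
```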

  • Batch, mini-batch, and stochastic gradient descent differ in how many training instances are used to compute each gradient step: all of them, a small random subset, or a single instance (see the sketch below).
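
    A rough sketch of the three variants (the helpers `gradient_step` and `run_gd` are made up for illustration); they differ only in how many instances feed each update:

```python
import numpy as np

def gradient_step(theta, X_b, y, eta=0.1):
    """One gradient descent step on the MSE loss for linear regression."""
    m = len(X_b)
    gradients = 2 / m * X_b.T @ (X_b @ theta - y)
    return theta - eta * gradients

def run_gd(X_b, y, batch_size, n_epochs=50):
    """batch_size = len(X_b): batch GD; = 1: stochastic GD; in between: mini-batch."""
    rng = np.random.default_rng(42)
    theta = rng.standard_normal((X_b.shape[1], 1))
    m = len(X_b)
    for epoch in range(n_epochs):
        indices = rng.permutation(m)            # shuffle each epoch
        for start in range(0, m, batch_size):
            batch = indices[start:start + batch_size]
            theta = gradient_step(theta, X_b[batch], y[batch])
    return theta

# run_gd(X_b, y, batch_size=len(X_b))   # batch GD
# run_gd(X_b, y, batch_size=32)         # mini-batch GD
# run_gd(X_b, y, batch_size=1)          # stochastic GD
```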

  • Learning curves plot the training-set and validation-set error (e.g. RMSE) against the training set size. Curves that both reach a plateau, close together and fairly high, are typical of an underfitting model. If your model is underfitting the training data, adding more training examples will not help. You need to use a more complex model or come up with better features. (p.124)

    By contrast, a gap between the two curves means the model performs significantly better on the training data than on the validation data, which is the hallmark of an overfitting model. However, if you used a much larger training set, the two curves would continue to get closer. (p.126)
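
    One way to draw such learning curves (a sketch assuming scikit-learn and matplotlib; `plot_learning_curves` is a hand-rolled helper, not a library function):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def plot_learning_curves(model, X, y):
    """Plot train/validation RMSE as the training set grows."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=42)
    train_errors, val_errors = [], []
    for m in range(1, len(X_train) + 1):
        model.fit(X_train[:m], y_train[:m])
        train_errors.append(mean_squared_error(y_train[:m], model.predict(X_train[:m])))
        val_errors.append(mean_squared_error(y_val, model.predict(X_val)))
    plt.plot(np.sqrt(train_errors), "r-+", label="train")
    plt.plot(np.sqrt(val_errors), "b-", label="validation")
    plt.xlabel("training set size")
    plt.ylabel("RMSE")
    plt.legend()

# e.g. plot_learning_curves(LinearRegression(), X, y)
```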

  • A very different way to regularize iterative learning algorithms such as Gradient Descent is to stop training as soon as the validation error reaches a minimum. This is called early stopping. As the epochs go by, the algorithm learns and its prediction error (RMSE) on the training set naturally goes down, and so does its prediction error on the validation set. However, after a while the validation error stops decreasing and actually starts to go back up. This indicates that the model has started to overfit the training data. With early stopping you just stop training as soon as the validation error reaches the minimum. It is such a simple and efficient regularization technique that Geoffrey Hinton called it a “beautiful free lunch.”
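
    A minimal early-stopping loop in that spirit (a sketch; it assumes prepared `X_train`, `y_train`, `X_val`, `y_val` arrays and uses `SGDRegressor` with `warm_start=True` so each `fit()` call runs one more epoch):

```python
from copy import deepcopy
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

# warm_start=True: each fit() call continues from the current weights,
# so calling fit() once per epoch lets us track the validation error over time.
sgd_reg = SGDRegressor(max_iter=1, tol=None, warm_start=True, penalty=None,
                       learning_rate="constant", eta0=0.0005, random_state=42)

best_val_rmse = float("inf")
best_model = None
for epoch in range(1000):
    sgd_reg.fit(X_train, y_train.ravel())       # train for one more epoch
    val_rmse = np.sqrt(mean_squared_error(y_val, sgd_reg.predict(X_val)))
    if val_rmse < best_val_rmse:                 # keep the best model seen so far
        best_val_rmse = val_rmse
        best_model = deepcopy(sgd_reg)
```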