Concepts of Fitness and Regularization

| Underfit | Overfit |
| --- | --- |
| Predictions miss many of the training targets | Predictions catch almost all of the training targets |
| High Bias | High Variance |

Options to address Overfitting

  1. Collect more data; additional training examples help the model generalize
  2. Use fewer features, guided by intuition about which features are informative
  3. Regularization - shrink the values of selected parameters to reduce the effect of some features, instead of removing them entirely as in option 2

Regularization

  • The general philosophy of regularization is to reduce the impact of all the parameters just enough to smooth out overfitting
  • This is achieved by adding a penalty term to the cost function that is just large enough to smooth out the overfitting (sketched after this list)
  • It is the preferred option for addressing overfitting
  • It reduces the values of some parameters (w[j]) to reduce the effect of the corresponding features
  • Usual practice is to not regularize b
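
Under the L2 (squared-weight) penalty used in these notes (the same one applied in the Keras snippet at the end of this page), the term added to the cost function can be sketched as:

$$\frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2$$

Here $\lambda$ is the regularization strength, $m$ the number of training examples, and $n$ the number of features; $b$ is excluded from the sum, matching the practice above.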

Cost Function with Regularization - Linear Regression / Logistic Regression

  • The regularized cost function is the usual cost plus the penalty term shown above; the only difference between the two models is that the function f is the sigmoid function in Logistic Regression (see the sketch below)
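
As a reference, a sketch of both regularized cost functions in the standard notation ($m$ training examples, $n$ features, $\lambda$ the regularization strength). For Linear Regression:

$$J(\vec{w},b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2$$

For Logistic Regression, where $f_{\vec{w},b}(\vec{x}) = \dfrac{1}{1 + e^{-(\vec{w} \cdot \vec{x} + b)}}$:

$$J(\vec{w},b) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log f_{\vec{w},b}(\vec{x}^{(i)}) + \left(1-y^{(i)}\right)\log\left(1-f_{\vec{w},b}(\vec{x}^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2$$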

Gradient Descent with Regularization - Linear Regression / Logistic Regression

  • Repeat the following updates until convergence, updating all parameters simultaneously (sketched below)
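
A sketch of the regularized update rules, where $\alpha$ is the learning rate:

$$w_j := w_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m} w_j\right]$$

$$b := b - \alpha \cdot \frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)$$

The update for $b$ carries no regularization term, matching the practice above of not regularizing $b$.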

Partial derivatives required for Gradient Descent

  • The gradient calculations (partial derivatives) are almost identical for Linear and Logistic Regression; the only difference is how the function f(x) is computed (see the sketch below)
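
A minimal NumPy sketch of these gradients; the function name `regularized_gradients` and the `model` flag are illustrative, not from the original page. The point is that the same error term drives both models:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def regularized_gradients(X, y, w, b, lambda_, model="linear"):
    """Partial derivatives of the regularized cost w.r.t. w and b.

    X: (m, n) feature matrix, y: (m,) targets, w: (n,) weights, b: scalar.
    The only model-specific piece is f(x); the gradient formulas are
    otherwise identical for linear and logistic regression.
    """
    m = X.shape[0]
    z = X @ w + b
    f = z if model == "linear" else sigmoid(z)   # f(x) is the only difference
    err = f - y
    dj_dw = (X.T @ err) / m + (lambda_ / m) * w  # penalty applies to w only
    dj_db = np.sum(err) / m                      # b is not regularized
    return dj_dw, dj_db
```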

Regularization in Neural Networks

In Keras, regularization is applied per layer through the `kernel_regularizer` argument. Regularized layer (the original snippet, with the imports it needs to run):

```python
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import L2

# Dense layer whose weights are penalized with an L2 term, lambda = 0.01
L1 = Dense(units=20, activation='relu', kernel_regularizer=L2(0.01))
```
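
A sketch of how such regularized layers might be assembled into a full model; the layer sizes here are illustrative, not from the original page:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import L2

model = Sequential([
    Dense(units=25, activation='relu', kernel_regularizer=L2(0.01)),
    Dense(units=15, activation='relu', kernel_regularizer=L2(0.01)),
    Dense(units=1, activation='sigmoid'),  # output layer, typically left unregularized
])
```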