L1 and L2 Regularization

Regularization

Regularization adds a penalty as model complexity increases. The regularization parameter (lambda) penalizes all the parameters except the intercept, so that the model generalizes to the data and doesn't overfit.

J(theta) is the cost function:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 + \lambda \sum_{j=1}^{n} |\theta_j|^p$$

with p = 1 for L1 and p = 2 for L2; the sum over j starts at 1, so the intercept theta_0 is not penalized.

As the equation above shows, regularization adds a penalty on the higher-order terms as model complexity grows. This reduces the weight given to those terms and pulls the model toward a simpler equation.
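As a minimal sketch of this idea (the function name `regularized_cost` and the NumPy implementation are illustrative, not part of the original text), the penalized cost can be computed like so, with p = 1 for L1 and p = 2 for L2:

```python
import numpy as np

def regularized_cost(theta, X, y, lam, p):
    """MSE cost plus an L_p penalty: p=1 gives L1 (Lasso), p=2 gives L2 (Ridge).

    theta[0] is treated as the intercept and excluded from the penalty.
    """
    m = len(y)
    residuals = X @ theta - y
    mse = (residuals @ residuals) / (2 * m)
    penalty = lam * np.sum(np.abs(theta[1:]) ** p)
    return mse + penalty
```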

L1 regularization technique is called Lasso Regression

Lasso adds the "absolute value of magnitude" of the coefficients as a penalty term to the loss function.
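A quick illustration with scikit-learn's Lasso (the synthetic dataset and the alpha value are arbitrary choices for demonstration; in scikit-learn, alpha plays the role of lambda):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 20 features, but only 5 carry real signal.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Larger alpha means a stronger L1 penalty.
lasso = Lasso(alpha=1.0).fit(X, y)

# The L1 penalty drives many coefficients exactly to zero.
print("non-zero coefficients:", (lasso.coef_ != 0).sum(), "of", len(lasso.coef_))
```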

L2 regularization technique is called Ridge Regression

Ridge regression adds the "squared magnitude" of the coefficients as a penalty term to the loss function.
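Because the L2 penalty is quadratic, ridge regression even has a closed-form solution. The helper below is a hypothetical sketch (the function name and the intercept handling are illustrative), solving theta = (XᵀX + λI)⁻¹Xᵀy while leaving the intercept unpenalized, as noted above:

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve ridge regression in closed form: (X^T X + lam*I)^-1 X^T y."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend intercept column
    penalty = lam * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # do not penalize the intercept
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)
```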

Key difference

The key difference between these techniques is that Lasso can shrink the coefficients of less important features exactly to zero, removing those features from the model altogether. This makes it useful for feature selection when we have a huge number of features. Ridge, by contrast, only shrinks coefficients toward zero; it almost never makes them exactly zero, so every feature stays in the model, as the sketch below illustrates.
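To see the difference concretely, here is a side-by-side sketch on synthetic data (dataset shape and alpha are arbitrary demonstration choices): Lasso zeroes out the uninformative coefficients, while Ridge merely shrinks them.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 10 features, only 3 informative.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=42)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("Lasso:", np.round(lasso.coef_, 2))  # several exact zeros
print("Ridge:", np.round(ridge.coef_, 2))  # small but non-zero values
```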