L1 and L2 Regularization
Overview
A regression model that uses the L1 regularization technique is called Lasso Regression, and a model that uses L2 regularization is called Ridge Regression.
The key difference between the two is the penalty term.
L1 Regularization
Lasso Regression adds the “absolute value of magnitude” of the coefficients as a penalty term to the loss function, as shown below.
If lambda is zero we get back ordinary least squares (OLS), whereas a very large lambda shrinks the coefficients to zero and the model under-fits.
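A minimal sketch of the Lasso cost function, assuming a linear model with coefficients $w_j$, $n$ training examples, $p$ features, and regularization strength $\lambda$ (notation not from the original page):

$$
\text{Loss} = \sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{p} w_j x_{ij}\Big)^2 + \lambda \sum_{j=1}^{p} |w_j|
$$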
L2 Regularization
Ridge Regression adds the “squared magnitude” of the coefficients as a penalty term to the loss function.
Again, if lambda is zero we get back OLS. However, if lambda is very large, the penalty dominates and the model under-fits. That is why the choice of lambda matters: chosen well, this technique works very well to avoid over-fitting.
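Under the same assumptions as above, a sketch of the Ridge cost function, where the only change is that the penalty uses squared coefficients instead of absolute values:

$$
\text{Loss} = \sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{p} w_j x_{ij}\Big)^2 + \lambda \sum_{j=1}^{p} w_j^2
$$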
The key difference between these techniques is that Lasso shrinks the less important features' coefficients all the way to zero, removing some features altogether. This makes it useful for feature selection when we have a huge number of features, as the sketch below illustrates.
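A minimal sketch (not from the original page) comparing the two penalties with scikit-learn on a synthetic problem where only a few features are informative; the dataset and the alpha value are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 100 samples, 10 features, but only 3 of them actually drive the target.
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# In scikit-learn, alpha plays the role of lambda (the regularization strength).
lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("Lasso coefficients:", np.round(lasso.coef_, 2))  # several exact zeros
print("Ridge coefficients:", np.round(ridge.coef_, 2))  # small but non-zero
```

With Lasso, the coefficients of the uninformative features are driven exactly to zero, while Ridge only shrinks them toward zero, which is the feature-selection behaviour described above.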