l1 norm vs l2 norm - SoojungHong/MachineLearning GitHub Wiki

(Very good reference : http://www.chioka.in/differences-between-l1-and-l2-as-loss-function-and-regularization/)

[1] Root Mean Square Error (RMSE)

A typical performance measure for regression problems is the Root Mean Square Error (RMSE). It gives an idea of how much error the system typically makes in its predictions, with a higher weight for large errors.

[2] Mean Absolute Error (MAE) Even though the RMSE is generally the preferred performance measure for regression tasks, in some contexts you may prefer to use another function. For example, suppose that there are many outlier districts. In that case, you may consider using the Mean Absolute Error

[3] norm (distance from prediction and target) Both the RMSE and the MAE are ways to measure the distance between two vectors: the vector of predictions and the vector of target values. Various distance measures, or norms, are possible:

Computing the root of a sum of squares (RMSE) corresponds to the Euclidian norm: it is the notion of distance you are familiar with. It is also called the ℓ2 norm, noted ∥ · ∥2 (or just ∥ · ∥).
Computing the sum of absolutes (MAE) corresponds to the ℓ1 norm, noted ∥ · ∥1. It is sometimes called the Manhattan norm because it measures the distance between two points in a city if you can only travel along orthogonal city blocks.
More generally, the ℓk norm of a vector v containing n elements is defined as . ℓ0 just gives the number of non-zero elements in the vector, and ℓ∞ gives the maximum absolute value in the vector.