loss l2 blurring - hassony2/inria-research-wiki GitHub Wiki

L2 loss and blurring effect

Using an L2 loss (also known as mean squared error loss) is attractive because it is easy to compute and differentiable.

Nevertheless, the L2 loss fails to capture multimodal distributions.

If, for a given input, the target value in the dataset is either 0 or 1 with equal probability (0 and 1 are then two possible modes that match the input), the predicted value that minimizes the L2 loss over the dataset is 0.5, the average of the two possible values, even though 0.5 is not a valid mode for that input.
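The 0.5 result can be checked with a short derivation. The symbols are introduced here for illustration only: $c$ is the constant predicted for that input and $\mathcal{L}(c)$ is the expected L2 loss over the two equally likely targets.

$$
\mathcal{L}(c) = \tfrac{1}{2}(c - 0)^2 + \tfrac{1}{2}(c - 1)^2,
\qquad
\frac{d\mathcal{L}}{dc} = 2c - 1 = 0
\;\Longrightarrow\;
c = \tfrac{1}{2}.
$$

At the minimizer the loss is $\mathcal{L}(\tfrac{1}{2}) = \tfrac{1}{4}$, whereas predicting either valid mode gives $\mathcal{L}(0) = \mathcal{L}(1) = \tfrac{1}{2}$. The L2 loss therefore prefers the invalid in-between value over both valid ones.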

If the 0 and 1 modes are not equally probable, the minimizer is a weighted average of 0 and 1, with weights equal to the frequencies of 0 and 1 in the dataset.
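The behaviour of the minimizer can be checked numerically. The following is only a minimal sketch: the target lists, the 80/20 split, and the helper names `l2_loss` and `best_constant` are made up for illustration, and a brute-force grid search stands in for whatever optimizer a real model would use.

```python
def l2_loss(prediction, targets):
    """Mean squared error between one constant prediction and all targets."""
    return sum((prediction - t) ** 2 for t in targets) / len(targets)


def best_constant(targets, steps=1000):
    """Grid-search the constant in [0, 1] that minimizes the L2 loss."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda c: l2_loss(c, targets))


# Hypothetical targets observed for one and the same input.
balanced = [0, 1] * 50          # modes 0 and 1 equally frequent
skewed = [0] * 80 + [1] * 20    # mode 0 four times as frequent as mode 1

best_balanced = best_constant(balanced)
best_skewed = best_constant(skewed)

# The minimizer lands on the mean of the targets, not on either mode.
print(best_balanced, best_skewed)
```

In the skewed case the minimizer moves towards the more frequent mode, but it still does not coincide with either 0 or 1.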

For instance, if the task is to predict the next frame for a moving car, several output frames are valid candidates, depending on the car's speed. Minimizing the L2 loss yields an output image that is the mean of all those possible output frames, and therefore a blurry image.
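A toy version of the car example makes the blurring visible. This is a sketch under strong simplifying assumptions: the "image" is a 1-D row of 8 pixels, the car is a single bright pixel, and the three candidate positions (2, 4 and 6) are hypothetical stand-ins for three possible speeds.

```python
def make_frame(car_position, width=8):
    """1-D 'image': 1.0 at the car's pixel, 0.0 everywhere else."""
    return [1.0 if x == car_position else 0.0 for x in range(width)]


def l2_loss(prediction, frames):
    """Squared pixel error of one predicted frame, averaged over all frames."""
    total = 0.0
    for frame in frames:
        total += sum((p - t) ** 2 for p, t in zip(prediction, frame))
    return total / len(frames)


# Hypothetical valid next frames: the car ends at pixel 2, 4 or 6.
candidate_frames = [make_frame(p) for p in (2, 4, 6)]

# Pixel-wise mean of the candidates, which is the L2-optimal prediction.
mean_frame = [sum(pixels) / len(candidate_frames)
              for pixels in zip(*candidate_frames)]

print(mean_frame)  # the car is smeared over three faint pixels
print(l2_loss(mean_frame, candidate_frames))           # loss of the blurry mean
print(l2_loss(candidate_frames[0], candidate_frames))  # loss of one sharp frame
```

The mean frame contains no pixel at full intensity, so it shows no sharp car at all, yet it scores a lower L2 loss than any of the sharp candidate frames.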