Loss functions
Softmax
Differentiable approximation of argmax: it maps a vector of scores to a probability distribution over indices.
Formula: softmax(x)_i = e^{x_i} / \sum_j e^{x_j}
If one of the x_j is much higher than the others, the output is close to 0 for every index except that j, for which the value is close to 1.
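A minimal NumPy sketch of the formula above (the `softmax` name and the max-subtraction trick are just an illustration, not from this wiki):

```python
import numpy as np

def softmax(x):
    # Subtracting the max avoids overflow in exp; it cancels out in the ratio.
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 10.0])))  # ~[0.0001, 0.0003, 0.9995]
```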
Cross Entropy
See this friendly post, which starts from information entropy to explain cross-entropy as a choice of loss, and shows that minimizing the cross-entropy is equivalent to minimizing the negative log likelihood.
Entropy
H(y) = \sum_i y_i log(1/y_i) = -\sum_i y_i log(y_i)
Interpretation: expected number of bits needed to encode samples drawn from y, using an optimal code for y.
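A quick sketch of H(y) in NumPy (the `entropy` helper is hypothetical; note that `np.log` gives nats, use `np.log2` for bits):

```python
import numpy as np

def entropy(y, eps=1e-12):
    # H(y) = -sum_i y_i log(y_i); eps guards against log(0).
    return -np.sum(y * np.log(y + eps))

print(entropy(np.array([0.5, 0.5])))  # ~0.693 nats (= 1 bit)
print(entropy(np.array([1.0, 0.0])))  # ~0, deterministic distribution
```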
Cross-Entropy
Ground-truth distribution: y; estimated distribution: y_hat
C(y, y_hat) = - \sum_i y_i log(y_hat_i)
Interpretation: expected number of bits needed to encode samples drawn from y, using a code optimized for the wrong distribution y_hat.
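A matching sketch for C(y, y_hat) (the helper name and the example vectors are illustrative):

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    # C(y, y_hat) = -sum_i y_i log(y_hat_i)
    return -np.sum(y * np.log(y_hat + eps))

y = np.array([0.0, 1.0, 0.0])      # one-hot ground truth
y_hat = np.array([0.1, 0.7, 0.2])  # predicted distribution (e.g. a softmax output)
print(cross_entropy(y, y_hat))     # = -log(0.7) ~ 0.357
```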
KL divergence
KL(y || y_hat) = C(y, y_hat) - H(y)
Minimizing KL with respect to y_hat is the same as minimizing the cross-entropy, since H(y) does not depend on y_hat.
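A self-contained numerical check of this identity, assuming y and y_hat are NumPy probability vectors (the `kl_divergence` helper is illustrative):

```python
import numpy as np

def kl_divergence(y, y_hat, eps=1e-12):
    # KL(y || y_hat) = sum_i y_i * (log y_i - log y_hat_i)
    return np.sum(y * (np.log(y + eps) - np.log(y_hat + eps)))

y = np.array([0.2, 0.8])
y_hat = np.array([0.5, 0.5])
cross_entropy = -np.sum(y * np.log(y_hat))  # C(y, y_hat)
entropy = -np.sum(y * np.log(y))            # H(y)
print(kl_divergence(y, y_hat))              # ~0.193
print(cross_entropy - entropy)              # same value
```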
Minimizing the cross-entropy is also the same as maximizing the likelihood (or minimizing the negative log likelihood) of the training data:
\prod_{(x, y)} P(x, y | \theta)
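A small sketch of this equivalence for one-hot labels (all names and numbers are illustrative): the negative log likelihood of the labels under the model equals the summed cross-entropy between the one-hot targets and the predicted distributions.

```python
import numpy as np

preds = np.array([[0.7, 0.2, 0.1],   # P(class | x, theta) for 3 samples
                  [0.1, 0.8, 0.1],
                  [0.2, 0.3, 0.5]])
labels = np.array([0, 1, 2])         # ground-truth class indices

nll = -np.sum(np.log(preds[np.arange(3), labels]))
one_hot = np.eye(3)[labels]
ce = -np.sum(one_hot * np.log(preds))
print(nll, ce)                       # identical values (~1.273)
```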