Hinge Loss
(Reference: https://en.wikipedia.org/wiki/Hinge_loss)
In machine learning, the hinge loss is a loss function used for training classifiers. The hinge loss is used for "maximum-margin" classification, most notably for support vector machines (SVMs).[1] For an intended output t = ±1 and a classifier score y, the hinge loss of the prediction y is defined as
ℓ(y) = max(0, 1 − t ⋅ y)
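As a quick sanity check, here is a minimal sketch of this definition in Python (NumPy assumed; `hinge_loss` is an illustrative name, not a standard library function):

```python
import numpy as np

def hinge_loss(y, t):
    """Hinge loss max(0, 1 - t*y) for label t in {-1, +1} and raw score y."""
    return np.maximum(0.0, 1.0 - t * y)
```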
Note that y should be the "raw" output of the classifier's decision function, not the predicted class label. For instance, in linear SVMs, y = w ⋅ x + b, where (w, b) are the parameters of the hyperplane and x is the point to classify.
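To make the distinction between the raw score and the predicted label concrete, here is a small sketch with hypothetical values for `w`, `b`, and `x`:

```python
import numpy as np

# Hypothetical hyperplane parameters and input point.
w = np.array([0.5, -1.2])   # weight vector
b = 0.3                     # bias
x = np.array([1.0, 0.2])    # point to classify

# Raw (unthresholded) classifier score -- this is the y the loss uses.
y = np.dot(w, x) + b        # y = w . x + b
label = np.sign(y)          # the predicted class label would be sign(y)
```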
It can be seen that when t and y have the same sign (meaning y predicts the right class) and |y| ≥ 1, the hinge loss is ℓ(y) = 0, but when they have opposite signs, ℓ(y) increases linearly with y (a one-sided error). Note that even a correct prediction incurs a loss when it falls inside the margin, i.e. when 0 < t ⋅ y < 1.
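These three cases can be checked against the `hinge_loss` sketch above (the scores here are chosen purely for illustration):

```python
# Correct and outside the margin: t*y >= 1, so the loss is 0.
print(hinge_loss(y=2.0, t=+1))   # 0.0

# Correct but inside the margin: 0 < t*y < 1, small positive loss.
print(hinge_loss(y=0.4, t=+1))   # 0.6

# Wrong sign: the loss grows linearly as the score moves the wrong way.
print(hinge_loss(y=-2.0, t=+1))  # 3.0
```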