Logarithmic Loss - aragorn/home GitHub Wiki

  • https://www.kaggle.com/wiki/LogarithmicLoss

    • The logarithm of the likelihood function for a Bernoulli random distribution.
    • In plain English, this error metric is used where contestants have to predict whether something is true or false, with a probability (likelihood) ranging from definitely true (1) through equally likely true or false (0.5) to definitely false (0).
    • 였λ₯˜μ— λŒ€ν•΄ 둜그λ₯Ό μ μš©ν•˜λŠ” 것은 ν™•μ‹ ν•˜κ±°λ‚˜ ν‹€λ¦° 경우 λͺ¨λ‘μ— λŒ€ν•΄ κ°•ν•˜κ²Œ μ²˜λ²Œν•˜λŠ” νš¨κ³Όκ°€ μžˆλ‹€. μ΅œμ•…μ˜ 경우, 무엇인가 참인 κ²ƒμœΌλ‘œ ν™•μ‹ ν•˜λŠ” μ˜ˆμΈ‘μ„ ν–ˆμœΌλ‚˜ μ‹€μ œλ‘œλŠ” 그것이 거짓인 경우, 였λ₯˜ μ μˆ˜μ— λ¬΄ν•œλŒ€μ˜ 값을 μΆ”κ°€ν•˜κ²Œ 되며, λ‹€λ₯Έ μ˜ˆμΈ‘κ°’μ„ λͺ¨λ‘ λ¬΄μ˜λ―Έν•˜κ²Œ λ§Œλ“ λ‹€.
  • General log loss

    $$\mathrm{logloss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\,\log(p_{ij})$$

    where N is the number of examples, M is the number of classes, y_ij is a binary variable indicating whether class j was correct for example i, and p_ij is the predicted probability that example i belongs to class j.

  • When the number of classes is 2 (M = 2)

    $$\mathrm{logloss} = -\frac{1}{N}\sum_{i=1}^{N}\left[\,y_i\log(p_i) + (1-y_i)\log(1-p_i)\,\right]$$
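The general formula and its M = 2 special case can be checked against each other in a few lines. A sketch in plain Python with made-up probabilities (the `eps` clipping is an implementation convention, not part of the definition):

```python
import math

def log_loss(Y, P, eps=1e-15):
    """General (multiclass) log loss.
    Y[i][j] = 1 if class j is correct for example i, else 0.
    P[i][j] = predicted probability of class j for example i."""
    N = len(Y)
    total = 0.0
    for y_row, p_row in zip(Y, P):
        for y, p in zip(y_row, p_row):
            p = min(max(p, eps), 1 - eps)
            total += y * math.log(p)
    return -total / N

# With M = 2 the general formula reduces to the binary one:
# encode y = 1 as [0, 1] and probability p as [1 - p, p].
Y = [[0, 1], [1, 0], [0, 1]]
P = [[0.2, 0.8], [0.7, 0.3], [0.4, 0.6]]
general = log_loss(Y, P)

# Binary form: -(1/N) * sum of log-probabilities of the true classes.
binary = -(math.log(0.8) + math.log(0.7) + math.log(0.6)) / 3
print(general, binary)  # the two agree
```

Because y_ij is zero for every wrong class, only the log-probability assigned to the correct class contributes to the sum.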

  • https://www.quora.com/What-is-an-intuitive-explanation-for-the-log-loss-function

The log loss function is simply the objective function to minimize in order to fit a log-linear probability model to a set of binary-labeled examples. Recall that a log-linear model assumes that the log-odds of the conditional probability of the target given the features is a weighted linear combination of the features. These weights are the parameters of the model which we'd like to learn.
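The log-odds identity above can be checked directly: applying the sigmoid to the linear score gives a probability, and taking the log-odds of that probability recovers the score. The weights and features below are hypothetical, chosen only to illustrate the identity:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights, features, and bias for illustration only.
w = [0.5, -1.2, 2.0]
x = [1.0, 0.3, 0.7]
b = -0.1

z = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted linear combination
p = sigmoid(z)                                # P(y = 1 | x) under the model
log_odds = math.log(p / (1 - p))              # recovers z (up to float error)

print(z, log_odds)
```

The sigmoid is exactly the inverse of the log-odds (logit) function, which is why "log-linear in the odds" and "sigmoid of a linear score" describe the same model.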


Convexity

MSE (Mean Squared Error) is not convex with respect to the weights of the logistic function.

  • Cost function for logistic regression
    • If we use the squared-error cost for logistic regression, the result is a non-convex function of the parameters, so optimization is not guaranteed to find the global minimum.
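The non-convexity can be demonstrated numerically with a single training example (x = 1, y = 1), so the prediction is just sigmoid(w). A convex function must satisfy f(m) ≤ (f(a) + f(b)) / 2 at the midpoint m of any interval [a, b]; the squared-error cost violates this, while the log loss satisfies it (one midpoint check only disproves convexity for MSE — that log loss is convex is a separate, standard result). The interval endpoints below are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One example with x = 1, y = 1, so the prediction is sigmoid(w).
def mse(w):
    return (sigmoid(w) - 1.0) ** 2

def logloss(w):
    return -math.log(sigmoid(w))

a, b = -10.0, 0.0
m = (a + b) / 2

# Convexity requires f(m) <= (f(a) + f(b)) / 2 at the midpoint.
print("MSE:     ", mse(m), (mse(a) + mse(b)) / 2)              # violated
print("log loss:", logloss(m), (logloss(a) + logloss(b)) / 2)  # holds
```

Intuitively, the sigmoid flattens out for large negative scores, so the squared error plateaus there and creates a non-convex shape; the log in the log loss undoes that flattening.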

...

  • Why do we choose this function when other cost functions exist?
    • This cost function can be derived from statistics using the principle of maximum likelihood estimation.
      • Note that this derivation assumes the labels follow a Bernoulli distribution given the features.
    • Also has the nice property that it's convex.
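The maximum likelihood derivation mentioned above can be written out in a few lines. Assuming each label y_i is Bernoulli with probability p_i = σ(wᵀx_i):

```latex
% Likelihood of one example: P(y \mid x; w) = p^{y} (1-p)^{1-y}
L(w) = \prod_{i=1}^{N} p_i^{y_i} (1 - p_i)^{1 - y_i}

\log L(w) = \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]
```

Maximizing log L(w) is the same as minimizing −(1/N) log L(w), which is exactly the binary (M = 2) log loss above.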