CROSS ENTROPY - beyondnlp/nlp GitHub Wiki

  • Why torch.exp() is used
    • In focal loss you sometimes see torch.exp(). The exponential function maps every input to a positive output (y > 0), so torch.exp() is applied when values that may be either negative or positive all need to be converted to positive values.
    • https://velog.io/@heaseo/Focalloss-%EC%84%A4%EB%AA%85
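As a minimal sketch of that point (standalone, separate from the snippet below), torch.exp() maps any real input, negative or positive, to a strictly positive output:

```python
import torch

# exp() maps every real number to a positive value:
# negative inputs land in (0, 1), positive inputs in (1, inf), and exp(0) = 1
x = torch.tensor([-2.0, 0.0, 3.0])
y = torch.exp(x)
print(y)
print((y > 0).all())  # every output is positive
```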
import random
import torch
import torch.nn as nn
import numpy as np


answer = [1, 5]

# random scores in [0, 1); note these are raw logits, not normalized probabilities
prob = []
for i in range(2):
    plist = [random.random() for j in range(10)]
    prob.append(plist)


def cross_entropy1( prob, answer ):
    output = torch.Tensor( prob )
    target = torch.LongTensor(answer)
    criterion = nn.CrossEntropyLoss()
    loss = criterion(output, target)
    return loss



def cross_entropy2( prob, answer ):
    # manual cross entropy on logits:
    # log-sum-exp of the logits minus the target logit
    # equals -log(softmax(output)[target])
    loss = 0
    for i in range(len(answer)):
        target = answer[i]
        output = prob[i]
        loss += np.log(sum(np.exp(output))) - output[target]
    loss = loss / len(answer)
    return loss

def cross_entropy3(y, t):
    # y must already be probabilities (e.g. softmax output); t is class indices
    y = np.array(y)
    t = np.array(t)
    if y.ndim == 1:
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)

    # if the labels are one-hot vectors, convert them to class indices
    if t.size == y.size:
        t = t.argmax(axis=1)

    batch_size = y.shape[0]

    # 1e-7 guards against log(0)
    return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size

loss1 = cross_entropy1( prob, answer )
loss2 = cross_entropy2( prob, answer )
loss3 = cross_entropy3( prob, answer )
print(f"loss1 : {loss1}")
print(f"loss2 : {loss2}")
print(f"loss3 : {loss3}")
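Note that loss3 will not match loss1 and loss2 here, because cross_entropy1 and cross_entropy2 interpret prob as raw logits while cross_entropy3 expects normalized probabilities. A quick consistency check (a sketch using only NumPy; the names softmax and logits are illustrative): passing the scores through softmax first makes the logit-based and probability-based formulas agree.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax along the last axis
    z = np.asarray(z, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[0.1, 0.9, 0.2],
                   [0.3, 0.4, 0.8]])
answer = [1, 2]

# logit-based CE (what cross_entropy2 computes):
# log-sum-exp minus the target logit, averaged over the batch
loss_logits = np.mean([np.log(np.exp(logits[i]).sum()) - logits[i, t]
                       for i, t in enumerate(answer)])

# probability-based CE (what cross_entropy3 computes) after softmax
probs = softmax(logits)
loss_probs = -np.mean([np.log(probs[i, t]) for i, t in enumerate(answer)])

assert np.isclose(loss_logits, loss_probs)
```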

If the labels (Y) are
[ 1 0 0 ], [ 0 0 1 ]
>>> Y=[[1,0,0],[0,0,1]]

and the model output is
[ 0.7 0.2 0.1 ], [ 0.2 0.3 0.5 ]
>>> model=[[0.7,0.2,0.1],[0.2,0.3,0.5]]

these two values are enough to compute the cross entropy.

정닡에 λͺ¨λΈμ˜ 결과에 tf.log(model)μ·¨ν•œ 값을 κ³±ν•œλ‹€.
>>> log_model=tf.log(model)
>>> sess=tf.Session()
>>> sess.run(output)
array([[-0.35667497, -1.609438  , -2.3025851 ],
       [-1.609438  , -1.2039728 , -0.6931472 ]], dtype=float32)


Y * tf.log(model)
Now sum each row ( reduce_sum(axis=1) ):
>>> output=Y*log_model
>>> sess.run(output)
array([[-0.35667497, -0.        , -0.        ],
       [-0.        , -0.        , -0.6931472 ]], dtype=float32)

>>> a=tf.reduce_sum(output,1)
>>> sess.run(a)

array([-0.35667497, -0.6931472 ], dtype=float32)

Negating these row sums gives the cross entropy per example: [0.35667497, 0.6931472].
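The same computation in NumPy (a sketch, independent of the TF session above): cross entropy per example is -sum(Y * log(model), axis=1), and since Y is one-hot, only the true-class column survives the multiplication.

```python
import numpy as np

Y = np.array([[1, 0, 0],
              [0, 0, 1]], dtype=float)
model = np.array([[0.7, 0.2, 0.1],
                  [0.2, 0.3, 0.5]])

# element-wise Y * log(model); the one-hot mask keeps only the true-class term
per_class = Y * np.log(model)

# cross entropy per example: negate the row sums
ce = -per_class.sum(axis=1)
print(ce)  # equals [-ln(0.7), -ln(0.5)]
```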