[Word2Vec] Neural Language Model and Word2Vec - dsindex/blog GitHub Wiki

  1. Word2Vec
# Formulas commonly used for softmax and log-likelihood

y = exp(x) / sum( exp(x') )

log( y ) = log( exp(x) ) - log( sum( exp(x') ) )
         = x - log( sum( exp(x') ) ) 
         = -E     # defining the error E = -log( y )
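As a quick numerical check of the log-softmax identity above (a sketch, not from the original page), compare log(y) computed directly with x - log(sum(exp(x'))):

```python
import math

# log(y_i) = x_i - log( sum_j exp(x_j) )
x = [1.0, 2.0, 3.0]
denom = sum(math.exp(v) for v in x)
y = [math.exp(v) / denom for v in x]                 # softmax
log_y_direct = [math.log(v) for v in y]
log_y_identity = [v - math.log(denom) for v in x]
for a, b in zip(log_y_direct, log_y_identity):
    assert abs(a - b) < 1e-12
```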

# Formulas commonly used in negative sampling

sig(x) = 1 /  (1+exp(-x) ) 
       = 1 / ( 1 + ( 1/exp(x) ) ) 
       = 1 / ( ( exp(x) + 1 ) / exp(x) )
       = exp(x) / ( 1+exp(x) )

1 - sig(x) = 1 - ( 1 / ( 1+exp(-x) ) ) 
           = ( ( 1 + exp(-x) ) - 1 ) / ( 1+exp(-x) )
           = exp(-x) / ( 1+exp(-x) ) 
           = ( 1 / exp(x) ) / ( 1 + ( 1/exp(x) ) )
           = ( 1 / exp(x) ) / ( ( exp(x) + 1 ) / exp(x) )
           = 1 / ( 1 + exp(x) )

sig(x)' = exp(x) / ( exp(x) + 1 )^2
        = ( exp(x) / ( 1+exp(x) ) ) * ( 1 / ( 1+exp(x) ) )
        = sig(x) * ( 1 - sig(x) )
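The sigmoid identities and the derivative formula above can be verified numerically; here is a minimal Python sketch (the helper `sig` simply mirrors the definition above):

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    # sig(x) = exp(x) / ( 1 + exp(x) )
    assert abs(sig(x) - math.exp(x) / (1.0 + math.exp(x))) < 1e-12
    # 1 - sig(x) = 1 / ( 1 + exp(x) )
    assert abs((1.0 - sig(x)) - 1.0 / (1.0 + math.exp(x))) < 1e-12
    # sig'(x) = sig(x) * ( 1 - sig(x) ), checked with a central difference
    h = 1e-6
    numeric = (sig(x + h) - sig(x - h)) / (2.0 * h)
    assert abs(numeric - sig(x) * (1.0 - sig(x))) < 1e-8
```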


log( sig(x) ) = log( 1 / ( 1+exp(-x) ) )

Differentiating this:

( 1 / sig(x) ) * sig'(x) = ( 1 / sig(x) ) * sig(x) * ( 1 - sig(x) ) 
                         = 1 - sig(x)

That is, log( sig(x) )' = 1 - sig(x)


log( sig(-x) ) 

Differentiating this:

( 1 / sig(-x) ) * sig'(-x) * (-1) 
   = ( 1 / sig(-x) ) * sig(-x) * ( 1 - sig(-x) ) * (-1) 
   = - 1 + sig(-x) 
   = -( 1 - ( 1 / ( 1+exp(x) ) ) ) 
   = -( ( ( 1+exp(x) ) - 1 ) / ( 1+exp(x) ) )
   = -( exp(x) / ( 1+exp(x) ) )
   = -sig(x)

That is, log( sig(-x) )' = -sig(x)
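Both log-sigmoid derivatives derived above can be confirmed with finite differences; a small Python sketch (again defining `sig` locally):

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

h = 1e-6
for x in [-1.5, 0.0, 0.7, 2.0]:
    # d/dx log( sig(x) ) = 1 - sig(x)
    num1 = (math.log(sig(x + h)) - math.log(sig(x - h))) / (2 * h)
    assert abs(num1 - (1.0 - sig(x))) < 1e-6
    # d/dx log( sig(-x) ) = -sig(x)
    num2 = (math.log(sig(-(x + h))) - math.log(sig(-(x - h)))) / (2 * h)
    assert abs(num2 - (-sig(x))) < 1e-6
```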


E = -log( sig(x) ) - log( sig(-x) )

Differentiating this:

E' = -( 1 - sig(x) ) + sig(x) = -1 + 2*sig(x)

Equivalently, E' can be written as:

( exp(x) - 1 ) / ( exp(x) + 1 ) 
    = exp(x) / ( exp(x) + 1 ) - ( 1 / ( exp(x) + 1 ) )
    = sig(x) - ( 1 - sig(x) ) = -1 + 2*sig(x)
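A numerical sketch of this last result (the single-term objective `E` here is the toy form used above, with the same x in both terms, and the closed form ( exp(x) - 1 ) / ( exp(x) + 1 ) is also tanh(x/2)):

```python
import math

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

def E(x):
    # toy objective from above: one positive and one negative term at the same x
    return -math.log(sig(x)) - math.log(sig(-x))

h = 1e-6
for x in [-2.0, -0.3, 0.0, 1.2]:
    numeric = (E(x + h) - E(x - h)) / (2 * h)          # numerical E'
    closed = 2.0 * sig(x) - 1.0                        # -1 + 2*sig(x)
    ratio_form = (math.exp(x) - 1.0) / (math.exp(x) + 1.0)
    assert abs(numeric - closed) < 1e-6
    assert abs(closed - ratio_form) < 1e-12
```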

  1. Gensim
  1. Related Topics