SELU (scaled exponential linear units) - beyondnlp/nlp GitHub Wiki

written in C

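#include <math.h>   /* exp */

/* SELU constants: alpha and the scale factor lambda. */
const double selu_alpha = 1.6732632423543772848170429916717;
const double selu_lamb = 1.0507009873554804934193349852946;

/* Returns lambda * x for x >= 0 and lambda * alpha * (exp(x) - 1) for x < 0. */
double selu( const double x )
{
    double v = x;
    if( v < 0 )
    {
        v = selu_alpha * ( exp( v ) - 1 );
    }
    return v * selu_lamb;
}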

TensorFlow

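import tensorflow as tf

alpha = 1.6732632423543772848170429916717  # SELU constants, same values as the C version
lamb = 1.0507009873554804934193349852946

def applyselu( tensor ):
    relu = tf.nn.relu( tensor ) # x > 0: x, x <= 0: 0
    neg_relu = tf.subtract( tensor, relu )  # x > 0: 0, x <= 0: x
    selu_neg = tf.subtract( tf.multiply( alpha, tf.exp( neg_relu ) ), alpha )   # 0 goes to 0
    return tf.multiply( lamb, tf.add( relu, selu_neg ) )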

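brief explanation
- relu = tf.nn.relu( tensor )
  - The input tensor is passed through ReLU. Entries where x is greater than 0 keep their value, and entries where x is 0 or less become 0.
  - In other words, only the positive part of x is kept and the rest is discarded.
- neg_relu = tf.subtract( tensor, relu )
  - relu is subtracted from tensor. Where x is greater than 0, tensor and relu are identical, so those entries cancel to 0,
  - and only the entries of tensor that are 0 or less remain (relu is 0 there, so subtracting it changes nothing).
- selu_neg = tf.subtract( tf.multiply( alpha, tf.exp( neg_relu ) ), alpha )
  - tf.exp( neg_relu )
    - exp is applied to neg_relu; call the result A. Only entries where x is below 0 carry a nonzero input here; for all other entries A is exp(0) = 1.
  - A is multiplied by alpha to give B.
  - alpha is subtracted from B to give selu_neg.
  - Unlike ReLU, SELU returns a nonzero value when x is below 0.
  - That is why neg_relu is passed through exp, multiplied by alpha, and then reduced by alpha: the result is alpha * (exp(x) - 1) where x is below 0 and 0 everywhere else.
- tf.multiply( lamb, tf.add( relu, selu_neg ) )
  - Finally, selu_neg is added to relu and the sum is multiplied by lamb (relu holds the values for x above 0, selu_neg holds the values for x below 0).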
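
The steps listed above can be checked by hand on single numbers. The following is a minimal pure-Python sketch, not the TensorFlow code itself: `selu_by_steps` and `selu_direct` are hypothetical names introduced only for this illustration, and `max(x, 0.0)` stands in for `tf.nn.relu`.

```python
import math

# Constants copied from the snippets on this page.
ALPHA = 1.6732632423543772848170429916717
LAMB = 1.0507009873554804934193349852946

def selu_by_steps(x):
    """Scalar version of applyselu, one step per line."""
    relu = max(x, 0.0)                              # x > 0: x, x <= 0: 0
    neg_relu = x - relu                             # x > 0: 0, x <= 0: x
    selu_neg = ALPHA * math.exp(neg_relu) - ALPHA   # 0 goes to 0
    return LAMB * (relu + selu_neg)

def selu_direct(x):
    """Piecewise form of SELU, used only as a cross-check."""
    return LAMB * (x if x >= 0 else ALPHA * (math.exp(x) - 1.0))

# Both columns should agree for negative, zero, and positive inputs.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(x, selu_by_steps(x), selu_direct(x))
```

For positive inputs `selu_neg` is exactly 0, so only the `relu` term contributes; for negative inputs `relu` is 0 and only `selu_neg` contributes.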

another implementation

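import tensorflow as tf

def selu(x):
    # tf.name_scope stands in for ops.name_scope, whose import was not shown.
    with tf.name_scope('elu'):
        alpha = 1.6732632423543772848170429916717
        scale = 1.0507009873554804934193349852946
        # x >= 0: scale * x, x < 0: scale * alpha * (exp(x) - 1)
        return scale * tf.where(x >= 0.0, x, alpha * tf.nn.elu(x))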

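- If x is 0 or greater, x is used as is.
- If x is below 0, alpha*tf.nn.elu(x) is used.
- In both cases the selected value is then multiplied by scale.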
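
The elementwise selection can be illustrated without TensorFlow. This is a rough sketch on plain Python lists: `where` and `elu` are hypothetical stand-ins written here to mimic the behaviour of `tf.where` and `tf.nn.elu` on 1-D inputs, not the library functions themselves.

```python
import math

ALPHA = 1.6732632423543772848170429916717
SCALE = 1.0507009873554804934193349852946

def elu(x):
    """Stand-in for tf.nn.elu on one number: x if x > 0, else exp(x) - 1."""
    return x if x > 0 else math.exp(x) - 1.0

def where(cond, a, b):
    """Stand-in for tf.where on lists: take a[i] where cond[i] holds, else b[i]."""
    return [ai if c else bi for c, ai, bi in zip(cond, a, b)]

def selu(xs):
    # Both branches are computed for every element; where() only selects.
    negative_branch = [ALPHA * elu(x) for x in xs]
    picked = where([x >= 0.0 for x in xs], xs, negative_branch)
    return [SCALE * p for p in picked]

print(selu([-1.0, 0.0, 1.0]))
```

As with `tf.where`, both branches are evaluated for every element and the condition only decides which result is kept.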

TensorFlow selu + dropout

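import numpy as np
import tensorflow as tf

# Assumption: alpha_dropout_val is not defined on this page. The value used here is
# -alpha * lamb, the value SELU saturates to for very negative inputs.
alpha_dropout_val = -1.6732632423543772848170429916717 * 1.0507009873554804934193349852946

def applyseludropout( tensor, keep_prob ):
    # Create mask. Uniform samples in [1 - keep_prob, 2 - keep_prob) truncate to 1
    # with probability 1 - keep_prob and to 0 otherwise.
    # tf.random.uniform replaces the TF 1.x name tf.random_uniform.
    mask = tf.cast( tf.cast( tf.random.uniform( tf.shape( tensor ), 1 - keep_prob, 2 - keep_prob ), tf.int32 ), tf.float32 ) # 0: keep, 1: replace value.

    # Apply mask: kept entries pass through, dropped entries become alpha_dropout_val.
    remain_val = tf.multiply( tf.subtract( 1.0, mask ), tensor )
    update_val = tf.multiply( alpha_dropout_val, mask )
    mask_applied  = tf.add( remain_val, update_val )

    # Affine transformation parameters.
    prod = keep_prob + np.power( alpha_dropout_val, 2.0 ) * keep_prob  * ( 1.0 - keep_prob )
    a = np.power( prod, -0.5 )
    b = -a * alpha_dropout_val * ( 1.0 - keep_prob )

    # Apply affine transformation.
    return tf.add( tf.multiply( a, mask_applied ), b )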
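
The affine parameters `a` and `b` are meant to bring the output back to mean 0 and variance 1 when the input already has those statistics. A small pure-Python simulation can make this plausible. It is only a sketch under assumptions: `ALPHA_DROPOUT_VAL` is taken to be `-alpha * lamb` (the page does not define `alpha_dropout_val`), and `affine_params` and `selu_dropout` are hypothetical names that mirror the function above on a plain list.

```python
import random

ALPHA = 1.6732632423543772848170429916717
LAMB = 1.0507009873554804934193349852946
ALPHA_DROPOUT_VAL = -ALPHA * LAMB   # assumed value of alpha_dropout_val

def affine_params(keep_prob):
    """Same formulas as the 'Affine transformation parameters' step above."""
    prod = keep_prob + ALPHA_DROPOUT_VAL ** 2 * keep_prob * (1.0 - keep_prob)
    a = prod ** -0.5
    b = -a * ALPHA_DROPOUT_VAL * (1.0 - keep_prob)
    return a, b

def selu_dropout(values, keep_prob, rng):
    """Replace each value by ALPHA_DROPOUT_VAL with probability 1 - keep_prob, then rescale."""
    a, b = affine_params(keep_prob)
    out = []
    for v in values:
        dropped = rng.random() >= keep_prob   # True with probability 1 - keep_prob
        masked = ALPHA_DROPOUT_VAL if dropped else v
        out.append(a * masked + b)
    return out

rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(200_000)]   # input with mean 0, variance 1
ys = selu_dropout(xs, keep_prob=0.9, rng=rng)
mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / len(ys)
print(mean, var)   # expected to land close to 0 and 1
```

With `keep_prob = 1.0` nothing is dropped and the formulas give `a = 1`, `b = 0`, so the input passes through unchanged.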

TensorFlow elu

import tensorflow as tf

def elu(x, alpha=1.):
    """Exponential linear unit.

    # Arguments
        x: A tensor or variable to compute the activation function for.
        alpha: A scalar, slope of negative section.

    # Returns
        A tensor.
    """
    res = tf.nn.elu(x)
    if alpha == 1:
        return res
    else:
        return tf.where(x > 0, res, alpha * res)