09 Training LSTM RNNs the Hybrid Learning Approach - PAI-yoonsung/lstm-paper GitHub Wiki
9 Training LSTM-RNNs - the Hybrid Learning Approach
In order to preserve the CEC in LSTM memory block cells, the original formulation of LSTM used a combination of two learning algorithms: BPTT to train network components located after cells, and RTRL to train network components located before and including cells.
The latter components are trained with RTRL because some partial derivatives (related to the state of the cell) must be computed at every step, whether or not a target value is given at that step.
For now, we only allow the gradient of the cell to be propagated through time, truncating the gradients of the other recurrent connections.
We define discrete time steps in the form Ο = 1, 2, 3, .... Each step has a forward pass and a backward pass; in the forward pass the outputs/activations of all units are calculated, whereas in the backward pass the error signals for all weights are calculated.
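The alternation of forward and backward passes per step can be sketched as follows; `forward` and `backward` here are placeholders for the per-step computations described in this section, and all names are illustrative rather than from the original:

```python
# Sketch of the per-step training loop: each discrete step tau runs a
# forward pass (compute all unit activations) and, when a target is
# available, a backward pass (compute error signals for the weights).
def train_sequence(inputs, targets, forward, backward, state):
    losses = []
    for tau, (x, t) in enumerate(zip(inputs, targets), start=1):
        state, y = forward(state, x)   # forward pass at step tau
        if t is not None:              # a target may or may not be given
            losses.append(backward(state, y, t))  # backward pass
    return losses
```

Note that the backward pass only runs at steps where a target exists, matching the remark above that some quantities must nevertheless be tracked at every step.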
9.1 The Forward Pass
Let M be the set of memory blocks. Let m_c be the c-th memory cell in the memory block m, and W[u,v] be a weight connecting unit u to unit v.
In the original formulation of LSTM, each memory block m is associated with one input gate in_m and one output gate out_m.
The internal state of a memory cell m_c at time Ο + 1 is updated from its previous state s_m_c(Ο) and from the weighted input z_m_c(Ο + 1) multiplied by the activation of the input gate y_in_m(Ο + 1).
Then, we use the activation of the output gate y_out_m(Ο + 1) to calculate the activation of the cell y_m_c(Ο + 1).
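The cell update just described can be sketched as follows; `g` is the input squashing function, and the gate activations `y_in` and `y_out` are assumed to have been computed already (the function and argument names are illustrative):

```python
import numpy as np

def cell_forward(s_prev, z_c, y_in, y_out, g=np.tanh):
    """One forward step of a memory cell.

    s_prev : s_m_c(tau), previous internal state
    z_c    : z_m_c(tau + 1), weighted input to the cell
    y_in   : input-gate activation y_in_m(tau + 1)
    y_out  : output-gate activation y_out_m(tau + 1)
    """
    # CEC: the state carries over with weight 1, plus the gated,
    # squashed input.
    s = s_prev + y_in * g(z_c)
    # Cell output: the state multiplied by the output-gate activation
    # (as in Figure 10; some formulations squash s again first).
    y_c = y_out * s
    return s, y_c
```

With the input gate near 0 the state is simply carried forward unchanged, which is exactly the constant-error-carousel behaviour the CEC is meant to provide.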
The activation y_in_m of the input gate in_m is computed as

y_in_m(Ο + 1) = f_in_m(z_in_m(Ο + 1)),  where  z_in_m(Ο + 1) = Ξ£_u W[in_m, u] y_u(Ο),

f_in_m is the gate's squashing function (typically the logistic sigmoid), and u ranges over all units feeding the gate.
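In code, the gate activation is a squashed weighted sum of the incoming unit activations from the previous step; a minimal sketch, assuming the logistic sigmoid as the gate's squashing function:

```python
import numpy as np

def gate_activation(W_gate, y_prev):
    """y_gate(tau+1) = f(sum_u W[gate, u] * y_u(tau)), f = logistic sigmoid."""
    z = W_gate @ y_prev              # weighted input z_gate(tau + 1)
    return 1.0 / (1.0 + np.exp(-z))  # squashed into (0, 1)
```

The sigmoid keeps the gate's output in (0, 1), so it acts as a soft switch scaling how much of the squashed input is written into the cell state.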
Figure 10: A standard LSTM memory block. The block contains (at least) one cell with a recurrent self-connection (CEC) and weight of '1'. The state of the cell is denoted as s_c. Read and write access is regulated by the input gate, y_in, and the output gate, y_out. The internal cell state is calculated by multiplying the result of the squashed input, g, by the result of the input gate, y_in, and then adding the state of the last time step, s_c(t - 1). Finally, the cell output is calculated by multiplying the cell state, s_c, by the activation of the output gate, y_out.
Figure 11: A standard LSTM memory block. The block contains (at least) one cell with a recurrent self-connection (CEC) and weight of '1'. The state of the cell is denoted as s_c. Read and write access is regulated by the input gate, y_in, and the output gate, y_out. The internal cell state is calculated by multiplying the squashed input, g(x), by the result of the input gate, and adding the state of the current time step, s_m_c(t), into the next state, s_m_c(t + 1). Finally, the cell output is calculated by multiplying the cell state by the activation of the output gate.