08 Long Short Term Neural Networks - PAI-yoonsung/lstm-paper GitHub Wiki
8 Long Short-Term Neural Networks
One solution that addresses the vanishing error problem is a gradient-based method called long short-term memory (LSTM), published in [41], [42], [22] and [23].
LSTM can learn how to bridge minimal time lags of more than 1,000 discrete time steps.
The solution uses constant error carousels (CECs), which enforce a constant error flow within special cells.
Access to the cells is handled by multiplicative gate units, which learn when to grant access.
8.1 Constant Error Carousel
Suppose that we have only one unit u with a single connection to itself.
The local error backflow of u at a single time step τ follows from Equation 20 and is given by
From Equations 22 and 23 we see that, in order to ensure a constant error flow through u, we need to have
and by integration we have
From this, we learn that f_u must be linear, and that u's activation must remain constant over time; i.e.,
This is ensured by using the identity function f_u = id, and by setting W[u,u] = 1.0.
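Since Equations 20–23 are not reproduced on this page, the derivation can be sketched under assumed notation (ϑ_u for the backflowed error, z_u for u's net input, w_{uu} for the self-connection weight); this follows the standard CEC argument:

```latex
% Error backflow through u's self-connection (cf. Equation 20):
\vartheta_u(\tau) \;=\; f_u'\!\bigl(z_u(\tau)\bigr)\, w_{uu}\, \vartheta_u(\tau+1)

% Constant error flow through u therefore requires (cf. Equations 22 and 23):
f_u'\!\bigl(z_u(\tau)\bigr)\, w_{uu} \;=\; 1.0

% Integrating with respect to z_u(\tau):
f_u\bigl(z_u(\tau)\bigr) \;=\; \frac{z_u(\tau)}{w_{uu}}

% With f_u = \mathrm{id} and w_{uu} = 1.0, u's activation is preserved:
y_u(\tau+1) \;=\; f_u\bigl(w_{uu}\, y_u(\tau)\bigr) \;=\; y_u(\tau)
```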
This preservation of error is called the constant error carousel (CEC), and it is the central feature of LSTM, where short-term memory storage is achieved for extended periods of time.
Clearly, we still need to handle the connections from other units to the unit u, and this is where the different components of LSTM networks come into the picture.
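The effect of the CEC condition can be illustrated with a toy simulation (my own sketch, not code from the paper): the error flowing back through u's self-connection is multiplied by f'(z)·w once per time step.

```python
# Toy illustration (not from the paper): with f = id and w = 1.0 (the CEC
# condition) the error is preserved over many steps; with a squashing
# derivative such as 0.25 (the maximum of the logistic sigmoid's
# derivative) the error vanishes.

def backflow(error, f_prime, w, steps):
    """Scale an error signal by the factor f'(z) * w once per time step."""
    for _ in range(steps):
        error *= f_prime * w
    return error

cec = backflow(1.0, f_prime=1.0, w=1.0, steps=1000)        # CEC condition
vanished = backflow(1.0, f_prime=0.25, w=1.0, steps=1000)  # sigmoid-like unit

print(cec)       # 1.0 -- constant error flow over 1,000 steps
print(vanished)  # 0.0 -- the error has underflowed to zero
```

This mirrors the claim above that LSTM can bridge time lags of more than 1,000 discrete time steps: the backflowed error neither vanishes nor explodes.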
8.2 Memory blocks
In the absence of new inputs to the cell, we now know that the CECβs backflow remains constant.
However, as part of a neural network, the CEC is not only connected to itself, but also to other units in the neural network.
We need to take these additional weighted inputs and outputs into account.
Incoming connections to neuron u can have conflicting weight update signals, because the same weight is used for storing and ignoring inputs.
For weighted output connections from neuron u, the same weights can be used to both retrieve u's contents and prevent u's output flow to other neurons in the network.
To address the problem of conflicting weight updates, LSTM extends the CEC with input and output gates connected to the network input layer and to other memory cells.
This results in a more complex LSTM unit, called a memory block; its standard architecture is shown in Figure 11.
The input gates, which are simple sigmoid threshold units with an activation function range of [0, 1], control the signals from the network to the memory cell by scaling them appropriately; when the gate is closed, activation is close to zero.
Additionally, these can learn to protect the contents stored in u from disturbance by irrelevant signals.
The activation of a CEC by the input gate is defined as the cell state.
The output gates can learn how to control access to the memory cell contents, which protects other memory cells from disturbances originating from u.
So we can see that the basic function of multiplicative gate units is to either allow or deny access to constant error flow through the CEC.
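A minimal sketch of one memory block step, assuming tanh squashing functions for the cell input and output (the paper's exact choices of squashing functions may differ) and logistic sigmoid gates, as in the original forget-gate-free LSTM:

```python
import math

def sigmoid(x):
    """Logistic sigmoid, range (0, 1) -- the gate activation function."""
    return 1.0 / (1.0 + math.exp(-x))

def memory_block_step(s, net_in, net_in_gate, net_out_gate):
    """One step of an original-style LSTM memory block (no forget gate).

    s            -- current cell state (the CEC activation)
    net_in       -- net input to the memory cell from the network
    net_in_gate  -- net input to the input gate
    net_out_gate -- net input to the output gate
    """
    g = math.tanh(net_in)        # squashed cell input (tanh is an assumption)
    i = sigmoid(net_in_gate)     # input gate: scale or ignore incoming signals
    o = sigmoid(net_out_gate)    # output gate: grant or deny access to the cell
    s_new = s + i * g            # CEC: the state accumulates additively
    y = o * math.tanh(s_new)     # gated output to the rest of the network
    return s_new, y

# Closed gates (large negative net inputs) protect the stored contents:
s, y = memory_block_step(0.5, net_in=2.0, net_in_gate=-50.0, net_out_gate=-50.0)
print(round(s, 6), round(y, 6))  # 0.5 0.0 -- state undisturbed, output blocked
```

With the input gate closed, an irrelevant signal on `net_in` cannot disturb the stored state; with the output gate closed, u cannot disturb other memory cells, exactly the two protections described above.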