05 Recurrent Neural Networks - PAI-yoonsung/lstm-paper GitHub Wiki

Recurrent neural networks (RNNs) [74, 75] are dynamic systems; they have an internal state at each time step of the classification.

This is due to circular connections between higher- and lower-layer neurons and optional self-feedback connections.

์ด๊ฒƒ์€ ๋” ๋†’๊ณ , ๋” ๋‚ฎ์€ ๋ ˆ์ด์–ด์˜ ๋‰ด๋Ÿฐ๋“ค ์‚ฌ์ด์˜ ์ˆœํ™˜ ์—ฐ๊ฒฐ๊ณผ ์„ ํƒ์  ์ž๊ธฐ์‘๋‹ต ์—ฐ๊ฒฐ ๋•Œ๋ฌธ์ด๋‹ค
์ˆœํ™˜ ์—ฐ๊ฒฐ
์ž๊ธฐ์‘๋‹ต ์—ฐ๊ฒฐ

These feedback connections enable RNNs to propagate data from earlier events to current processing steps.

์ด๋Ÿฌํ•œ ์‘๋‹ต ์—ฐ๊ฒฐ๋“ค์€ RNN ์ด ์ด์ „์˜ ์ด๋ฒคํŠธ๋กœ๋ถ€ํ„ฐ ํ˜„์žฌ์˜ ์ง„ํ–‰ ๋‹จ๊ณ„๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ์ „๋‹ฌํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•ด์ค€๋‹ค.

Thus, RNNs build a memory of time series events.

์ฆ‰, RNN์€ ์‹œ๊ณ„์—ด ์ด๋ฒคํŠธ์˜ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ๋งŒ๋“ ๋‹ค.

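This memory can be sketched with a minimal recurrence (all weights and dimensions below are illustrative, not from the paper): the hidden state `h` persists across time steps, so each update mixes the current input with a function of every earlier input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 3 input features, 4 hidden units.
W_xh = rng.normal(scale=0.1, size=(4, 3))  # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(4, 4))  # recurrent hidden-to-hidden weights

h = np.zeros(4)  # internal state, carried across time steps
for x_t in np.eye(3):  # a toy sequence of three one-hot inputs
    # The previous state h feeds back into the update, so h at any
    # step depends on every earlier input in the sequence.
    h = np.tanh(W_xh @ x_t + W_hh @ h)
```

After the loop, `h` is the network's memory of the whole toy sequence.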
5.1 Basic Architecture

RNNs range from partly to fully connected; two simple RNN architectures have been proposed in [46] and [16].

RNN ์˜ ์ผ๋ถ€ ์—ฐ๊ฒฐ์—์„œ fully connected ๊นŒ์ง€์˜ ๋ฒ”์œ„๋ฅผ ๊ฐ–๊ณ , ๋‘ ๊ฐ€์ง€ ๊ฐ„๋‹จํ•œ RNN์ด [46], [16] ์—์„œ ์ œ์•ˆ๋œ๋‹ค.

The Elman network is similar to a three-layer neural network, but additionally, the outputs of the hidden layer are saved in so-called โ€˜context cellsโ€™.

Elman ๋„คํŠธ์›Œํฌ๋Š” 3 ๋ ˆ์ด์–ด ์‹ ๊ฒฝ๋ง๊ณผ ๋น„์Šทํ•˜์ง€๋งŒ, ์ถ”๊ฐ€์ ์œผ๋กœ ํžˆ๋“  ๋ ˆ์ด์–ด์˜ ์ถœ๋ ฅ๋“ค์€ ๋ฌธ๋งฅ ์…€(context cells) ์ด๋ผ๊ณ  ๋ถˆ๋ฆฌ๋Š” ๊ณณ์— ์ €์žฅ๋œ๋‹ค.

The output of a context cell is circularly fed back to the hidden neuron along with the originating signal.

๋ฌธ๋งฅ ์…€์˜ ์ถœ๋ ฅ์€ ์ˆœํ™˜์ ์œผ๋กœ ํžˆ๋“  ๋‰ด๋Ÿฐ์—๊ฒŒ ์›๋ณธ ์‹ ํ˜ธ์™€ ํ•จ๊ป˜ ์‘๋‹ต์„ ์ค€๋‹ค.

Every hidden neuron has its own context cell and receives input both from the input layer and the context cells.

๊ฐ ํžˆ๋“  ๋‰ด๋Ÿฐ๋“ค์€ ๊ฐ์ž ๊ณ ์œ ์˜ ๋ฌธ๋งฅ์…€์„ ๊ฐ–๊ณ , ์ž…๋ ฅ ๋ ˆ์ด์–ด์™€ ์ปจํ…์ŠคํŠธ์…€๋“ค๋กœ๋ถ€ํ„ฐ ์ž…๋ ฅ์„ ๋ฐ›๋Š”๋‹ค.

Elman networks can be trained with standard error backpropagation, with the output from the context cells simply regarded as an additional input.

Elman ๋„คํŠธ์›Œํฌ๋Š” ์ผ๋ฐ˜์ ์ธ ์—๋Ÿฌ ์—ญ์ „ํŒŒ๋กœ ํ›ˆ๋ จ๋  ์ˆ˜ ์žˆ๊ณ , ๋ฌธ๋งฅ์…€๋กœ๋ถ€ํ„ฐ์˜ ์ถœ๋ ฅ์€ ๋‹จ์ˆœํžˆ ์ถ”๊ฐ€์ ์ธ ์ž…๋ ฅ์œผ๋กœ์จ ์ธ์‹๋œ๋‹ค.

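A minimal sketch of one Elman update, assuming tanh hidden units (the `elman_step` helper and all dimensions are illustrative, not from the paper): the context cells simply store the previous hidden outputs and are concatenated with the current input, which is what lets standard backpropagation treat them as extra inputs.

```python
import numpy as np

def elman_step(x_t, context, W, b):
    # The context cells hold the hidden outputs from the previous step
    # and are treated simply as additional inputs alongside x_t.
    z = np.concatenate([x_t, context])   # input layer + context cells
    h = np.tanh(W @ z + b)               # hidden-layer output
    return h, h.copy()                   # h is also the next context

rng = np.random.default_rng(1)
n_in, n_hidden = 3, 4
W = rng.normal(scale=0.1, size=(n_hidden, n_in + n_hidden))
b = np.zeros(n_hidden)

context = np.zeros(n_hidden)             # context cells start empty
for x_t in np.eye(n_in):                 # toy sequence of one-hot inputs
    h, context = elman_step(x_t, context, W, b)
```

A Jordan network would look the same except that the context cells are fed by the output layer instead of the hidden layer.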
Figures 5 and 6 show a standard feed-forward network in comparison with such an Elman network.

๊ทธ๋ฆผ 5, 6์€ Elman ๋„คํŠธ์›Œํฌ์™€ ์ผ๋ฐ˜์ ์ธ ์ˆœ์ „ํŒŒ ๋„คํŠธ์›Œํฌ์˜ ๋น„๊ต๋ฅผ ๋ณด์—ฌ์ค€๋‹ค.

Figure 7: This figure shows a partially recurrent neural network with self-feedback in the hidden layer.

์ด ๊ทธ๋ฆผ์€ ํžˆ๋“  ๋ ˆ์ด์–ด ์•ˆ์— ๋ถ€๋ถ„ ์ˆœํ™˜์‹ ๊ฒฝ๋ง์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

Jordan networks have a similar structure to Elman networks, but the context cells are instead fed by the output layer.

Jordan ๋„คํŠธ์›Œํฌ๋Š” Elman ๋„คํŠธ์›Œํฌ์™€ ๋น„์Šทํ•œ ๊ตฌ์กฐ๋ฅผ ๊ฐ–์ง€๋งŒ, ๋ฌธ๋งฅ์…€์ด ์ถœ๋ ฅ ๋ ˆ์ด์–ด์— ์˜ํ•ด ์‘๋‹ต์„ ๋ฐ›์Šต๋‹ˆ๋‹ค.

A partial recurrent neural network with a fully connected recurrent hidden layer is shown in Figure 7.

Figure 8 shows a fully connected RNN.

Figure 8 ์€ ์™„์ „ ์—ฐ๊ฒฐ RNN ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

RNNs need to be trained differently to the feed-forward neural networks (FFNNs) described in Section 4.

RNN์€ Section 4์— ๋‚˜์™€์žˆ๋“ฏ์ด, ํ”ผ๋“œํฌ์›Œ๋“œ ์‹ ๊ฒฝ๋ง๊ณผ๋Š” ๋‹ค๋ฅด๊ฒŒ ํ›ˆ๋ จ๋˜์–ด์•ผํ•  ํ•„์š”๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.

This is because, for RNNs, we need to propagate information through the recurrent connections between time steps.

์™œ๋ƒํ•˜๋ฉด, RNN์€ ์ •๋ณด๋ฅผ ๋ฐ˜๋ณต ์Šคํƒญ ์ค‘์— ์ˆœํ™˜ ์—ฐ๊ฒฐ์„ ํ†ตํ•ด ์ „๋‹ฌํ•ด์•ผ๋˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.

The most common and well-documented learning algorithms for training RNNs in temporal, supervised learning tasks are backpropagation through time (BPTT) and real-time recurrent learning (RTRL).

ํ˜„ ์‹œ์ ์˜ ์ง€๋„ ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜์—์„œ RNN ์„ ํ•™์Šต์‹œํ‚ค๊ธฐ ์œ„ํ•œ ๊ฐ€์žฅ ์ผ๋ฐ˜์ ์ด๊ณ  ๋ฌธ์„œํ™”๊ฐ€ ์ž˜๋œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ backpropagation through time (BPTT) ๊ณผ real-time recurrent learning (RTRL) ์ž…๋‹ˆ๋‹ค.

In BPTT, the network is unfolded in time to construct an FFNN.

Then, the generalised delta rule is applied to update the weights.

This is an offline learning algorithm in the sense that we first collect the data and then build the model from the system.

์ด๊ฒƒ์€ ์˜คํ”„๋ผ์ธ ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜์œผ๋กœ, ๋ฐ์ดํ„ฐ๋ฅผ ๋จผ์ € ๋ชจ์€ ๋‹ค์Œ ์‹œ์Šคํ…œ์—์„œ ๋ชจ๋ธ์„ ๋งŒ๋“ค๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

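BPTT can be sketched as follows for a tiny tanh RNN with a squared-error loss on the final output (the `bptt_grad` name and all shapes are illustrative): the forward pass builds the unrolled feed-forward network, one hidden-state copy per time step, and the backward pass applies the generalised delta rule at each unrolled copy.

```python
import numpy as np

def bptt_grad(xs, target, W_xh, W_hh, W_hy):
    # Forward: unfold the network in time, keeping one hidden state per step.
    T = len(xs)
    hs = [np.zeros(W_hh.shape[0])]
    for t in range(T):
        hs.append(np.tanh(W_xh @ xs[t] + W_hh @ hs[-1]))
    y = W_hy @ hs[-1]
    # Backward: generalised delta rule on the unrolled network, with the
    # loss 0.5 * ||y - target||^2 attached to the final output.
    dW_xh = np.zeros_like(W_xh)
    dW_hh = np.zeros_like(W_hh)
    dW_hy = np.outer(y - target, hs[-1])
    dh = W_hy.T @ (y - target)             # delta entering the last hidden copy
    for t in reversed(range(T)):
        dz = dh * (1.0 - hs[t + 1] ** 2)   # tanh derivative
        dW_xh += np.outer(dz, xs[t])
        dW_hh += np.outer(dz, hs[t])
        dh = W_hh.T @ dz                   # delta one step further back in time
    return dW_xh, dW_hh, dW_hy

rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(4, 3))
W_hh = rng.normal(scale=0.1, size=(4, 4))
W_hy = rng.normal(scale=0.1, size=(2, 4))
xs = [rng.normal(size=3) for _ in range(5)]
target = np.array([1.0, -1.0])
dW_xh, dW_hh, dW_hy = bptt_grad(xs, target, W_xh, W_hh, W_hy)
```

Note that the whole sequence must be available before the gradients can be computed, which is what makes this an offline algorithm.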
In RTRL, the gradient information is forward propagated.

Here, the data is collected online from the system and the model is learned during collection.

Figure 8: This figure shows a fully recurrent neural network (RNN) with self-feedback connections.

Therefore, RTRL is an online learning algorithm.

๊ทธ๋Ÿฌ๋ฏ€๋กœ, RTRL ์€ ์˜จ๋ผ์ธ ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜์ž…๋‹ˆ๋‹ค.