09 Training LSTM RNNs the Hybrid Learning Approach - PAI-yoonsung/lstm-paper GitHub Wiki

9 Training LSTM-RNNs - the Hybrid Learning Approach

In order to preserve the CEC in LSTM memory block cells, the original formulation of LSTM used a combination of two learning algorithms: BPTT to train network components located after cells, and RTRL to train network components located before and including cells.


The latter units are trained with RTRL because some partial derivatives (related to the state of the cell) must be computed at every step, whether or not a target value is given at that step.

ν›„λ°© μœ λ‹›λ“€μ€ RTRL 둜 μž‘λ™ν•˜κ²Œ λ˜λŠ”λ°, κ·Έ μ΄μœ λŠ” (μ…€λ“€μ˜ μƒνƒœμ™€ κ΄€λ ¨λœ)νŽΈλ―ΈλΆ„μ΄ 있기 λ•Œλ¬Έμ΄λ‹€. 이듀은 ν˜„μž¬ μŠ€νƒ­μ—μ„œ λͺ©μ κ°’이 μ£Όμ–΄μ§€λ“  μ£Όμ–΄μ§€μ§€ μ•Šλ“  λ§€ μŠ€νƒ­λ§ˆλ‹€ κ³„μ‚°λ˜μ–΄μ•Όν•œλ‹€.

For now, we only allow the gradient of the cell to be propagated through time, truncating the rest of the gradients for the other recurrent connections.


We define discrete time steps of the form Ο„ = 1, 2, 3, …. Each step has a forward pass and a backward pass: in the forward pass, the outputs/activations of all units are calculated, whereas in the backward pass, the error signals for all weights are calculated.


9.1 The Forward Pass

Let M be the set of memory blocks. Let m_c be the c-th memory cell in the memory block m, and W[u,v] be a weight connecting unit u to unit v.


In the original formulation of LSTM, each memory block m is associated with one input gate in_m and one output gate out_m.


The internal state of a memory cell m_c at time Ο„ + 1 is updated according to its state s_m_c(Ο„) and according to the weighted input z_m_c(Ο„ + 1) multiplied by the activation of the input gate y_in_m(Ο„ + 1).


Then, we use the activation of the output gate, y_out_m(Ο„ + 1), to calculate the activation of the cell, y_m_c(Ο„ + 1).

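As a concrete sketch of this forward pass for a single memory cell, assuming a logistic sigmoid for the gates and tanh for the squashing functions (the text does not fix these choices):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cell_forward(s_prev, z_cell, z_in, z_out):
    """One forward step of a single memory cell in the original
    (forget-gate-free) LSTM formulation. `s_prev` is s_m_c(tau); the z_*
    arguments are the weighted input sums for the cell, the input gate,
    and the output gate at tau + 1."""
    y_in = sigmoid(z_in)            # input gate activation  y_in_m(tau + 1)
    y_out = sigmoid(z_out)          # output gate activation y_out_m(tau + 1)
    g = math.tanh(z_cell)           # squashed cell input    g(z_m_c(tau + 1))
    s = s_prev + y_in * g           # CEC: new state         s_m_c(tau + 1)
    y_cell = y_out * math.tanh(s)   # cell output            y_m_c(tau + 1)
    return s, y_cell

s, y = cell_forward(s_prev=0.0, z_cell=1.0, z_in=2.0, z_out=2.0)
```

Note that the state update is purely additive: the previous state enters with an implicit weight of 1, which is exactly the CEC that the hybrid learning scheme is designed to preserve.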

The activation y_in_m of the input gate in_m is computed as (Equation 24 in the source):

y_in_m(Ο„ + 1) = f_in_m( Ξ£_u W[u, in_m] Β· y_u(Ο„) )

where f_in_m is the gate's squashing function and the sum runs over all units u connected to the gate.

Figure 10: A standard LSTM memory block. The block contains (at least) one cell with a recurrent self-connection (CEC) and a weight of β€˜1’. The state of the cell is denoted as s_c. Read and write access is regulated by the input gate, y_in, and the output gate, y_out. The internal cell state is calculated by multiplying the result of the squashed input, g, by the result of the input gate, y_in, and then adding the state of the last time step, s_c(t βˆ’ 1). Finally, the cell output is calculated by multiplying the cell state, s_c, by the activation of the output gate, y_out.


Figure 11: A standard LSTM memory block. The block contains (at least) one cell with a recurrent self-connection (CEC) and a weight of β€˜1’. The state of the cell is denoted as s_c. Read and write access is regulated by the input gate, y_in, and the output gate, y_out. The internal cell state s_m_c(t + 1) is calculated by multiplying the squashed input, g(x), by the result of the input gate, y_in, and adding the state of the current time step, s_m_c(t). Finally, the cell output is calculated by multiplying the cell state by the activation of the output gate.
