07 Solving the Vanishing Error Problem - PAI-yoonsung/lstm-paper GitHub Wiki

A standard RNN cannot bridge more than 5–10 time steps ([22]).

ν†΅μƒμ˜ RNN 은 5-10 번의 νƒ€μž„ μŠ€νƒ­λ³΄λ‹€ 더 많이 연결될 수 μ—†λ‹€.

This is because back-propagated error signals tend to either grow or shrink with every time step.


Over many time steps, the error therefore typically blows up or vanishes ([5, 42]).

결과적으둜, λ§Žμ€ νƒ€μž„ μŠ€νƒ­μ„ κ±°λ“­ν•˜κ²Œ 되면, 일반적으둜 μ—λŸ¬κ°€ ν­λ°œν•˜κ±°λ‚˜ μ‚¬λΌμ§€κ²Œ λœλ‹€([5, 42]).

Blown-up error signals lead straight to oscillating weights, whereas with a vanishing error, learning either takes an unacceptable amount of time or does not work at all.

ν­λ°œν•œ μ—λŸ¬ μ‹ ν˜ΈλŠ” μ§„λ™ν•˜λŠ” κ°€μ€‘μΉ˜λ‘œ μ§κ²°λ˜λŠ” 반면, μ‚¬λΌμ§€λŠ” 였λ₯˜λŠ”, ν•™μŠ΅μ— κ³Όν•˜κ²Œ 였랜 μ‹œκ°„μ΄ 걸리게 λ˜κ±°λ‚˜, μž‘λ™ν•˜μ§€ μ•Šκ²Œ λœλ‹€.

ν†΅μƒμ˜ μ—­μ „νŒŒ μ•Œκ³ λ¦¬μ¦˜μ— μ˜ν•΄ μ–΄λ–»κ²Œ κΈ°μšΈκΈ°κ°€ κ³„μ‚°λ˜λŠ”μ§€μ— λŒ€ν•œ μ„€λͺ…κ³Ό 기본적인 λ² λ‹ˆμ‹± μ—λŸ¬ 뢄석은 λ‹€μŒκ³Ό κ°™λ‹€: μš°λ¦¬λŠ” λ‹€μŒ 곡식을 톡해 λ„€νŠΈμ›Œν¬κ°€ νƒ€μž„ t' μ—μ„œλΆ€ν„° νƒ€μž„ t κΉŒμ§€ ν›ˆλ ¨ν•œ μ΄ν›„μ˜ κ°€μ€‘μΉ˜λ₯Ό κ°±μ‹ ν•œλ‹€. (첫 번째, 두 번째 κ·Έλ¦Ό)
νƒ€μž„ Ο„ (with t` ≀ Ο„ < t) μ—μ„œ μœ λ‹› u의 μ—­μ „νŒŒλœ μ—λŸ¬ μ‹ ν˜ΈλŠ” 20번 곡식과 κ°™λ‹€.
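Equation 20 is not reproduced on this page. As a point of reference, the per-step recursion has the standard BPTT form, with Ο‘_u(Ο„) denoting the error signal of unit u at time Ο„; the notation below is a reconstruction following Hochreiter's analysis, not quoted from the paper:

```latex
\vartheta_u(\tau)
  = f'_u\big(\mathrm{net}_u(\tau)\big)
    \sum_{v \in U} w_{vu}\, \vartheta_v(\tau + 1)
```

In words: the error signal of unit u at time Ο„ is the error arriving from all units v at time Ο„ + 1, weighted by the connection weights and scaled by the derivative of u's activation function.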

Consequently, given a fully recurrent neural network with a set of non-input units U, the error signal that occurs at any chosen output-layer neuron o ∈ O at time-step Ο„ is propagated back through time for t βˆ’ t' time-steps, with t' < t, to an arbitrary neuron v.


This causes the error to be scaled by the following factor:

μ΄λŠ” μ—λŸ¬κ°€ λ‹€μŒμ˜ μš”μ†Œλ“€μ— μ˜ν•΄ μŠ€μΌ€μΌλ§λœλ‹€λŠ” 것을 λœ»ν•œλ‹€.

To solve the above equation, we unroll it over time. For t' ≀ Ο„ ≀ t, let u_Ο„ be a non-input-layer neuron in one of the replicas in the unrolled network at time Ο„. Now, by setting u_t = v and u_{t'} = o, we obtain Equation 21.

μœ„μ˜ 식을 ν•΄κ²°ν•˜κΈ° μœ„ν•΄μ„œλŠ”, μ‹œκ°„μ΄ 지남에 따라 이것을 νŽΌμ³μ•Ό ν•œλ‹€. t' ≀ Ο„ ≀ t 에 λŒ€ν•΄μ„œ, u_Ο„ λ₯Ό νƒ€μž„ Ο„ 에 λŒ€ν•œ νŽΌμ³μ§€λ‹ˆ μ•Šμ€ λ„€νŠΈμ›Œν¬ μ•ˆμ˜ λ ˆν”Œλ¦¬μΉ΄λ“€ 쀑 ν•˜λ‚˜ μ•ˆμ˜ non-μž…λ ₯ λ ˆμ΄μ–΄ λ‰΄λŸ°μœΌλ‘œ λ‘”λ‹€. 이제 u_t = v, u_t' = o 둜 μ„€μ •ν•˜λ©΄, 21번 곡식을 얻을 수 있게 λœλ‹€.

Observing Equation 21, it follows that if

(Equation 22)

for all Ο„ , then the product will grow exponentially, causing the error to blow-up; moreover, conflicting error signals arriving at neuron v can lead to oscillating weights and unstable learning. If now


λ§Œμ•½ λ‹€μŒκ³Ό 같은 κ²½μš°λŠ” (23번)

for all Ο„ , then the product decreases exponentially, causing the error to vanish, preventing the network from learning within an acceptable time period. Finally, the equation



shows that if the local error vanishes, then the global error also vanishes. A more detailed theoretical analysis of the problem with long-term dependencies is presented in [39]. The paper also briefly outlines several proposals on how to address this problem.

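The blow-up and vanish conditions can be illustrated numerically. A minimal sketch, assuming a constant per-step factor f'(net) w (the function name and the factor values are illustrative, not from the paper):

```python
# Each backward step multiplies the error signal by a local factor
# f'(net) * w.  A magnitude > 1 at every step makes the product grow
# exponentially (the error blows up); a magnitude < 1 makes it shrink
# exponentially (the error vanishes).
def error_scale(factor, steps):
    """Magnitude of the back-propagated error after `steps` time steps."""
    return abs(factor) ** steps

steps = 50
blow_up = error_scale(1.5, steps)   # |f'(net) w| > 1 at every step
vanish  = error_scale(0.5, steps)   # |f'(net) w| < 1 at every step

print(f"|factor| = 1.5 over {steps} steps: {blow_up:.3e}")
print(f"|factor| = 0.5 over {steps} steps: {vanish:.3e}")
```

After only 50 steps the two cases differ by more than twenty orders of magnitude, which is why neither oscillating weights nor vanishingly slow learning can be avoided by tuning the learning rate alone.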

dictionary

oscillating: μ§„λ™ν•˜λŠ” replicas: λ³΅μ œν’ˆλ“€ exponentially: κΈ°ν•˜κΈ‰μˆ˜μ μœΌλ‘œ outlines: μ„œμˆ ν•˜λ‹€ proposals: μ œμ•ˆ