RNN - leemik3/tensorflow-2.0 GitHub Wiki

RNN (Recurrent Neural Network)

  • An artificial neural network designed to process data with temporal continuity

  • It has a recurrent structure in which the previous hidden layer becomes an input to the current hidden layer, repeated over time

  • What distinguishes it from conventional networks is that it "has memory"

  • Applications

    • Natural language processing : speech recognition, word sense determination, dialogue handling, etc.
    • Time series data processing
  • In theory it can handle long sequence data, but in practice the vanishing/exploding gradient problem means it can only look back a few steps (the long term dependency problem) -> remedies : LSTM, GRU
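
As a rough illustration of the vanishing/exploding gradient problem, the sketch below backpropagates a gradient through a simplified linear recurrence h_t = W h_{t-1}, so the gradient is multiplied by W^T once per step. The function name, the choice of a scaled orthogonal W, and the scales 0.5 and 1.5 are assumptions made for the example; a real RNN also has a nonlinearity in the loop.

```python
import numpy as np

def gradient_norm_through_time(scale, steps=50, size=8, seed=0):
    """Norm of a unit gradient after `steps` steps of backpropagation
    through the linear recurrence h_t = W h_{t-1} (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Orthogonal matrix times `scale`: every singular value of W equals `scale`.
    q, _ = np.linalg.qr(rng.normal(size=(size, size)))
    W = scale * q
    grad = np.ones(size) / np.sqrt(size)  # unit-norm gradient at the last step
    for _ in range(steps):
        grad = W.T @ grad  # one step back in time
    return np.linalg.norm(grad)

vanishing = gradient_norm_through_time(scale=0.5)  # shrinks like 0.5 ** 50
exploding = gradient_norm_through_time(scale=1.5)  # grows like 1.5 ** 50
print(vanishing, exploding)
```

With a scale below 1 the gradient from distant steps is effectively zero, and with a scale above 1 it blows up, which is why only a few steps contribute usefully in practice.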


RNN Lecture Note

Classical Approach for Time Series Analysis

  1. Time domain analysis
  2. Frequency domain analysis
  3. Nearest neighbors analysis
  4. Probabilistic Model : Language modeling (probabilistic modeling of which sequence comes next, given a sequence)
  5. (S)AR(I)MA(X) models : model the autocorrelation of the time series
  6. Decomposition : split a time series as Time series = trend part + seasonal part + residuals
  7. Nonlinear Dynamics : (Ordinary / Partial / Stochastic) Differential Equation
  8. Machine Learning
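
The decomposition item (Time series = trend part + seasonal part + residuals) can be sketched as follows. This is a minimal additive version under stated assumptions: the period is known and odd, the trend is a centered moving average, and the function name `decompose_additive` and the synthetic series are made up for the example.

```python
import numpy as np

def decompose_additive(series, period):
    """Split a series into trend + seasonal + residual (additive model).
    Assumes an odd, known `period`."""
    series = np.asarray(series, dtype=float)
    # Trend: centered moving average over one full period (edges are padded).
    kernel = np.ones(period) / period
    padded = np.pad(series, period // 2, mode="edge")
    trend = np.convolve(padded, kernel, mode="valid")
    # Seasonal: average detrended value at each phase, centered to mean zero.
    detrended = series - trend
    phase = np.arange(len(series)) % period
    pattern = np.array([detrended[phase == k].mean() for k in range(period)])
    pattern -= pattern.mean()
    seasonal = pattern[phase]
    # Residual: whatever trend and seasonality do not explain.
    residual = series - trend - seasonal
    return trend, seasonal, residual

rng = np.random.default_rng(0)
t = np.arange(70)
series = 0.1 * t + np.sin(2 * np.pi * t / 7) + 0.1 * rng.normal(size=t.size)
trend, seasonal, residual = decompose_additive(series, period=7)
```

By construction the three parts add back up to the original series.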

Deep Learning Dealing with Sequential Data

MLP : stack of fully connected layers

  • The W matrix has a fixed size → it is hard to handle sequences of arbitrary length.
  • For fixed length sequences → many parameters are needed because of the wide variety of waveforms.
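
The fixed-W limitation can be seen in a small sketch: a dense layer built for length-10 inputs cannot accept a length-12 sequence. The sizes and the helper name `dense` are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 10))  # dense layer built for inputs of length 10

def dense(x):
    """One fully connected layer with a fixed weight matrix."""
    return np.tanh(W @ x)

ok = dense(np.zeros(10))  # works: the input length matches W

try:
    dense(np.zeros(12))   # a longer sequence does not fit the fixed W
    mismatch = False
except ValueError:
    mismatch = True

print(ok.shape, mismatch)
```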

CNN : stack of Conv, Pool, FC layers

  • Performs well on time series data

RNN : Recurrent Neural Network

Sequential data is data in which the order carries meaning. To take this order into account, the output of the current step is produced from the previous time step's output together with the new input. In the figure, x denotes the data at each time step.

p.s) How the previous time step's output (h) and the new input (x) are combined

A matrix operation (the blue box in the figure) is applied to h, the result is added element-wise to the current input, and a nonlinear activation function is then applied.

Q) Does the matrix applied to h use the same parameters at every step?
A) Yes. The parameters are shared.
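
The recurrence and the parameter sharing can be sketched in NumPy as below. The sketch follows the description in these notes, h_t = tanh(W_h h_{t-1} + x_t), and therefore assumes x_t already has the hidden size (a common formulation projects x_t with a separate input matrix first). The names `rnn_forward` and `W_h`, the tanh activation, and the sizes are assumptions for illustration.

```python
import numpy as np

def rnn_forward(xs, W_h, h0=None):
    """Run h_t = tanh(W_h @ h_{t-1} + x_t) over a sequence.

    xs:  array of shape (T, H), one row per time step
    W_h: array of shape (H, H), reused at every step (parameter sharing)
    """
    h = np.zeros(W_h.shape[0]) if h0 is None else h0
    hs = []
    for x_t in xs:
        h = np.tanh(W_h @ h + x_t)  # the same W_h at every time step
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(0)
W_h = rng.normal(scale=0.5, size=(3, 3))
xs_short = rng.normal(size=(4, 3))
xs_long = rng.normal(size=(9, 3))

short = rnn_forward(xs_short, W_h)
long = rnn_forward(xs_long, W_h)
print(short.shape, long.shape)  # (4, 3) (9, 3)
```

Because the one matrix W_h is reused at each step, the number of parameters does not depend on the sequence length.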

Q) What if the incoming data have different lengths?
A) That does not work well as is. The sequences are handled with methods such as interpolation.
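
One way to apply the interpolation idea is to resample every sequence onto the same number of points. The sketch below uses linear interpolation via `np.interp`; the helper name `resample_to_length` and the target length of 5 are assumptions for the example. Zero padding is another commonly used option.

```python
import numpy as np

def resample_to_length(seq, target_len):
    """Linearly interpolate a 1-D sequence onto `target_len` evenly spaced points."""
    seq = np.asarray(seq, dtype=float)
    old_pos = np.linspace(0.0, 1.0, num=len(seq))
    new_pos = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(new_pos, old_pos, seq)

a = resample_to_length([0, 2, 4], 5)              # stretched from 3 to 5 points
b = resample_to_length([1, 1, 1, 1, 1, 1, 1], 5)  # shrunk from 7 to 5 points
batch = np.stack([a, b])                          # now a regular (2, 5) array
print(a)  # [0. 1. 2. 3. 4.]
```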

Q) ๋ฐ์ดํ„ฐ๊ฐ€ ๋ฌธ์žฅ์ด๋ฉด ์–ด๋–ป๊ฒŒ ์ฒ˜๋ฆฌ? (x๊ฐ€ ๋ฌธ์žฅ์ธ ๊ฒฝ์šฐ์ž„)
A) NLP์—์„œ ๋‹ค๋ฃจ๋Š” ๋ฌธ์ œ์ธ๋ฐ, ์ฃผ๋กœ ํ•˜๋Š” ๋ฐฉ๋ฒ•์€, ๋ฌธ์žฅ์„ ๊ตฌ์„ฑํ•˜๋Š” ๋‹จ์–ด์˜ ์ง‘ํ•ฉ์„ ๋งŒ๋“ค๊ณ  ๊ฐ token์„ one-hot encoding
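
The vocabulary-plus-one-hot approach can be sketched as follows. Whitespace tokenization, the helper name `one_hot_encode`, and the two example sentences are assumptions made for illustration.

```python
import numpy as np

def one_hot_encode(sentences):
    """Build a vocabulary from the sentences and one-hot encode each token."""
    tokens = [s.split() for s in sentences]  # whitespace tokenization (assumption)
    vocab = sorted({tok for sent in tokens for tok in sent})
    index = {tok: i for i, tok in enumerate(vocab)}
    encoded = []
    for sent in tokens:
        ids = np.array([index[tok] for tok in sent], dtype=int)
        mat = np.zeros((len(sent), len(vocab)))
        mat[np.arange(len(sent)), ids] = 1.0  # one 1 per row, at the token's index
        encoded.append(mat)
    return vocab, encoded

vocab, encoded = one_hot_encode(["the cat sat", "the dog sat down"])
print(vocab)
print(encoded[0].shape, encoded[1].shape)
```

Each sentence becomes a (number of tokens, vocabulary size) matrix whose rows are fed to the RNN one time step at a time.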

How to obtain the output Y

[Figure: RNN output configurations]

  • many to one : simply attach one more model at the position where an output is wanted.
  • many to many : likewise, attach a model (the yellow box in the figure) at each step. Because the parameters are shared, all of the yellow models have the same parameters.
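
The two output configurations can be sketched by treating the attached model as a single linear layer. The names `output_head` and `W_y`, the random hidden states standing in for RNN outputs, and the choice of the last step for many to one are assumptions for the example.

```python
import numpy as np

def output_head(h, W_y):
    """The model attached on top of a hidden state: here one linear layer."""
    return W_y @ h

rng = np.random.default_rng(0)
hs = rng.normal(size=(6, 3))   # stand-in hidden states h_1..h_6, hidden size 3
W_y = rng.normal(size=(2, 3))  # a single set of head parameters

# many to one: attach the head only where an output is wanted (here the last step)
y_many_to_one = output_head(hs[-1], W_y)

# many to many: attach the head at every step, reusing the same W_y
y_many_to_many = np.stack([output_head(h, W_y) for h in hs])

print(y_many_to_one.shape, y_many_to_many.shape)  # (2,) (6, 2)
```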

Q) Does the figure show one epoch?
A) No. The figure shows a single sequence. If the batch contains 100 sequences, think of 100 copies of the figure being run in parallel.
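
The batch picture can be checked with a small sketch: running 100 sequences one by one gives the same final hidden states as running them together as one array. The recurrence h_t = tanh(W_h h_{t-1} + x_t), the function names, and the sizes are assumptions for illustration.

```python
import numpy as np

def rnn_single(xs, W_h):
    """Final hidden state for one sequence of shape (T, H)."""
    h = np.zeros(W_h.shape[0])
    for x_t in xs:
        h = np.tanh(W_h @ h + x_t)
    return h

def rnn_batched(batch, W_h):
    """Final hidden states for a batch of shape (B, T, H), all sequences at once."""
    h = np.zeros((batch.shape[0], W_h.shape[0]))
    for t in range(batch.shape[1]):
        # Row i of h @ W_h.T equals W_h @ h_i, so each row is an independent copy.
        h = np.tanh(h @ W_h.T + batch[:, t, :])
    return h

rng = np.random.default_rng(0)
W_h = rng.normal(scale=0.5, size=(3, 3))
batch = rng.normal(size=(100, 5, 3))  # 100 sequences, 5 steps, hidden size 3

final_batched = rnn_batched(batch, W_h)
final_looped = np.stack([rnn_single(seq, W_h) for seq in batch])
print(final_batched.shape)  # (100, 3)
```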

RNN with Math

[Figure: RNN equations]
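
A common way to write the recurrence in equations is given below as a hedged sketch, not as a reproduction of the figure. The symbol names W_{hh}, W_{xh}, W_{hy}, b_h, b_y and the tanh activation are assumptions. The description earlier in these notes corresponds to the case where the input is added directly, i.e. W_{xh} is the identity and b_h = 0.

```latex
\begin{aligned}
h_t &= \tanh\left(W_{hh}\, h_{t-1} + W_{xh}\, x_t + b_h\right) \\
y_t &= W_{hy}\, h_t + b_y
\end{aligned}
```

The same W_{hh}, W_{xh}, and W_{hy} are used at every time step t, which is the parameter sharing discussed above.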