RNN - BD-SEARCH/MLtutorial GitHub Wiki

Conventional artificial neural networks have difficulty processing data that carries sequential information.

RNN(Recurrent Neural Network)

01. What is an RNN?

  • Introduced to model sequence data
  • Unlike other NNs, an RNN has a hidden state (memory: a running summary of the data seen so far)
    • When a new input arrives, the network updates its hidden state
    • Once all inputs have been processed, the hidden state is a summary of the entire sequence

image

  • Red: input / Yellow: hidden state (memory) / Blue: output
    • When the first input arrives, the first hidden state is created
    • When the second input arrives, the second hidden state is created with reference to the first hidden state
  • Well suited to processing data that carries sequential information (sequence data)
  • Designed so that the hidden layer's result from the previous step is fed in as input to the hidden layer at the next step
  • Because it preserves the previous state while taking in the new input, it remembers the temporal order of the inputs
    • Context can be captured as the data is fed in sequentially
    • In theory it can handle long sequence data, but in practice it handles only relatively short sequences effectively: the long-term dependency problem
  • An RNN places no limit on the length of its input, so it is widely used for text and speech analysis, where length is unbounded.

Figure: an RNN viewed in its unfolded form
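The hidden-state update described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the wiki's own implementation: the names `rnn_forward`, `W_xh`, `W_hh`, and `b_h`, the `tanh` activation, the zero initial state, and the random weights are all assumptions chosen for the example.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence and return every hidden state."""
    h = np.zeros(W_hh.shape[0])  # initial hidden state (assumed to be zeros)
    hs = []
    for x in xs:  # one step per sequence element, in order
        # The new hidden state combines the current input with the previous state.
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        hs.append(h)
    return np.stack(hs)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
W_xh = rng.normal(scale=0.5, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

xs = rng.normal(size=(5, input_size))  # a sequence of 5 input vectors
hs = rnn_forward(xs, W_xh, W_hh, b_h)  # one hidden state per step
summary = hs[-1]  # the final hidden state summarizes the whole sequence
```

The same weights are reused at every step, which is why the loop accepts a sequence of any length; unfolding the loop step by step gives the unfolded view.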

02. Backpropagation Through Time (BPTT)

  • An RNN has to be trained slightly differently from ordinary backpropagation.
  • Because the current step is connected to the previous step in an RNN, backpropagation proceeds backward through time
  • Functions used
    • Softmax
      • Converts the output values into a probability for each class, according to their relative sizes
    • Cross-Entropy
      • A measure of how similar two probability distributions are
      • The closer it is to 0, the more similar the distributions (for one-hot targets the minimum is 0)
      • Since the prediction produced by softmax is a probability distribution, Cross-Entropy is used as the loss function
  • BPTT from a mathematical perspective
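The softmax and cross-entropy functions listed above can be sketched as follows. The function names, the example scores, and the small `eps` added inside the logarithm are assumptions made for this illustration, not something the original page specifies.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    # eps (an assumption of this sketch) avoids log(0) for zero probabilities.
    return -np.sum(target * np.log(predicted + eps))

scores = np.array([2.0, 1.0, 0.1])   # hypothetical network outputs
probs = softmax(scores)              # predicted distribution, sums to 1
target = np.array([1.0, 0.0, 0.0])   # one-hot label for class 0
loss = cross_entropy(target, probs)  # smaller means the prediction is closer
```

In BPTT, a loss of this kind would typically be computed at each time step that has a target, and the per-step losses summed before gradients are propagated backward through the steps.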

Vanishing gradient

  • In backpropagation through time, the partial derivatives converge to 0 as the number of steps grows.
  • Truncated BPTT
    • Storing the hidden-layer values for every time step: infeasible in practice
    • Values older than a reference length are not taken into account
  • Solutions
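The vanishing-gradient effect described in this section can be illustrated numerically. The sketch below is a toy setup under stated assumptions: a `tanh` recurrence with no input term, and a recurrent matrix `W_hh` deliberately built with spectral norm 0.5 so that the shrinkage is guaranteed. It tracks the norm of the derivative of the current hidden state with respect to the initial one.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, steps = 4, 20

# Assumption for the demo: recurrent weights with spectral norm 0.5
# (an orthogonal matrix from a QR decomposition, scaled by 0.5).
Q, _ = np.linalg.qr(rng.normal(size=(hidden_size, hidden_size)))
W_hh = 0.5 * Q

h = rng.normal(size=hidden_size)  # arbitrary initial hidden state
grad = np.eye(hidden_size)        # d h_0 / d h_0 is the identity
norms = []
for _ in range(steps):
    h = np.tanh(W_hh @ h)                    # input term omitted to isolate the recurrence
    jacobian = np.diag(1.0 - h ** 2) @ W_hh  # d h_t / d h_{t-1} for a tanh unit
    grad = jacobian @ grad                   # chain rule across one more time step
    norms.append(np.linalg.norm(grad, 2))    # spectral norm of d h_t / d h_0
```

Each step multiplies the gradient by another Jacobian whose norm is at most 0.5 here, so `norms` shrinks geometrically. Truncated BPTT sidesteps the long product by propagating back only a fixed number of steps.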

03. Examples of RNNs

  • one to one

    • Fixed input, fixed output
    • Perceptron: the basic structure of a neural network
    • Not an RNN, because it has no recurrent part.
  • one to many

    • Fixed input, sequence output
    • Image captioning: produces a sequence of words from an image
    • image -> "There is a cloudy sky above a field."
  • many to one

    • Sequence input, fixed output
    • Sentiment analysis
    • sequence of words -> sentiment
    • Analyzes the sentiment of a given sentence (afinn, vader)
  • many to many

    • Sequence input, sequence output
    • Machine Translation
    • sequence of words -> sequence of words
    • Translator: also called encoder-decoder
  • many to many

    • Synced sequence input, sequence output
    • Search term autocomplete
    • sequence of words -> sequence of words
    • Autocompleted search terms
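The input/output patterns above differ mainly in which inputs are fed and which outputs are kept. The sketch below shows this with one small recurrent cell. Everything in it is hypothetical: the `run` helper, the random weights, and the use of zero vectors as placeholder inputs (real captioning or translation models usually feed the previous output back in instead).

```python
import numpy as np

rng = np.random.default_rng(0)
in_size, hid_size, out_size = 3, 4, 2
# Hypothetical random weights, shared across all time steps.
W_xh = rng.normal(scale=0.5, size=(hid_size, in_size))
W_hh = rng.normal(scale=0.5, size=(hid_size, hid_size))
W_hy = rng.normal(scale=0.5, size=(out_size, hid_size))

def run(xs, h0=None):
    """Feed xs step by step; return one output per step and the last hidden state."""
    h = np.zeros(hid_size) if h0 is None else h0
    ys = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h)  # update the hidden state
        ys.append(W_hy @ h)               # read an output from it
    return np.stack(ys), h

xs = rng.normal(size=(6, in_size))  # a sequence of 6 inputs
ys, h_last = run(xs)

many_to_one = ys[-1]      # keep only the final output (e.g. a sentiment score)
many_to_many_synced = ys  # keep one output per input step

# one to many: one real input, then placeholder zeros for the remaining steps
one_to_many, _ = run(np.vstack([xs[:1], np.zeros((4, in_size))]))

# many to many (encoder-decoder): encode xs into h_last, then decode 3 steps from it
decoded, _ = run(np.zeros((3, in_size)), h0=h_last)
```

In the encoder-decoder case the output length (3 here) is independent of the input length (6), which is what separates it from the synced variant.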

An RNN has trouble retaining data that was input long ago. LSTM was developed to solve this problem.

Reference