Transformer - beyondnlp/nlp GitHub Wiki

Transformer 3๊ฐœ์˜ Multi-Head Attention์ด ์žˆ๋‹ค.

  • (1) Multi-Head Attention that takes the Input Embedding as its input
  • (2) Multi-Head Attention that takes the Output Embedding as its input
  • (3) Multi-Head Attention that takes both of the above as its input

Scaled Dot-Product Attention

  • Scale : if some of the embedded values are very large they disrupt training, so the scores are normalized (divided by the scaling factor).
  • Mask (opt) : (1) attends over the entire input, so no mask is needed; (2) generates the output sequentially, so each position may only attend to the valid range (i.e., the first step sees 1 0 0 0, the second 1 1 0 0, the third 1 1 1 0, the fourth 1 1 1 1); (3) attends over the full encoder output, so it needs no causal mask.
  • Softmax : ๊ณ„์‚ฐ๋œ ๊ฐ’์„ ๊ฐ€์ค‘์น˜์— ๋”ฐ๋ผ ํ™•๋ฅ ๋กœ ํ‘œํ˜„( ํ™•๋ฅ  : ํ•ฉ์ด 1 )

In Multi-Head Attention

  • Linear์€ Fully Connected Netword๋ฅผ ์˜๋ฏธ
  • MH์€ 3๊ฐœ์˜ input์ด ์กด์žฌ ( Q, K, V )

Transformer Formula

  • √d_k is the scaling factor (the attention scores are divided by it)
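For reference, the scaling factor appears in the Scaled Dot-Product Attention formula from "Attention Is All You Need":

```
Attention(Q, K, V) = softmax( Q Kแต€ / √d_k ) V
```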

  • (1) Query: input, Key: input vector, Value: input (encoder self-attention)
  • (2) Query: output, Key: output, Value: output hidden state (decoder self-attention)
  • (3) Query: output vector, Key: input vector, Value: input hidden state (encoder-decoder attention)
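The wiring above can be sketched as plain function calls. `mha` stands for any attention function taking Q, K, V (a hypothetical placeholder, not a real API).

```python
def transformer_step(encoder_input, decoder_input, mha):
    """Sketch of how the three Multi-Head Attention blocks are wired."""
    # (1) Encoder self-attention: Q, K, V all come from the input sequence
    enc_out = mha(Q=encoder_input, K=encoder_input, V=encoder_input)
    # (2) Decoder (masked) self-attention: Q, K, V all come from the output sequence
    dec_self = mha(Q=decoder_input, K=decoder_input, V=decoder_input)
    # (3) Encoder-decoder attention: Q from the decoder, K and V from the encoder
    dec_out = mha(Q=dec_self, K=enc_out, V=enc_out)
    return dec_out
```

Real implementations add residual connections, layer normalization, and feed-forward layers between these calls; only the Q/K/V routing is shown here.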

Transformer Dimension

https://jalammar.github.io/illustrated-transformer/