Week10 Day3 - ai-esg/our-history GitHub Wiki

Team NLP Group 11, Week10 Day3

Date

  • 2021.10.07 (Thu)

Team members

  • 문석암_T2075
  • 박마루찬_T2078
  • 박아멘_T2090
  • 우원진_T2137
  • 윤영훈_T2142
  • 장동건_T2185
  • 홍현승_T2250

Peer session

Final model selection

1st place (75.796)

  • 5 models
    • 72.710 (TAPT + added data + entity marker, tokenization modify)
    • 73.950 (TAPT + entity marker, tokenization modify)
    • 74.034 (added data + entity marker + tokenization modify k-fold 5)
    • 72.991 (entity marker + tokenization modify)
    • 70.724 (TAPT epoch 30 + added data + entity marker, tokenization modify)

2nd place (75.607)

  • 4 models
    • 72.710 (TAPT + added data + entity marker, tokenization modify)
    • 73.950 (TAPT + entity marker, tokenization modify)
    • 74.034 (added data + entity embed + tokenization modify k-fold 5)
    • 72.991 (entity embed + tokenization modify)

3rd place (75.590)

  • 4 models
    • 72.710 (TAPT + added data + entity marker, tokenization modify)
    • 73.950 (TAPT + entity marker, tokenization modify)
    • 73.124 (added data + entity embed + tokenization modify)
    • 72.991 (entity embed + tokenization modify)

New ideas (2 attempts left)

Already tried

  • Merge the 1st-, 2nd-, and 3rd-place data
    • Result: same as 1st place
  • Weight the k-fold model (within the 1st-place ensemble)
    • A weight of about 1.2 on the k-fold model
    • Result: 75.962, 82.359
  • Weight each model by its performance (relative to the 1st-place ensemble)
    • Give each model a weight according to its ability
    • Sum of squares of each probability? This also looks difficult.
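The weighting ideas above can be sketched as a weighted soft vote. This is a minimal illustration, not the team's ensemble code: the function names and the toy probabilities are made up, and only the 1.2 weight on the k-fold model comes from the notes.

```python
def weighted_soft_vote(prob_sets, weights):
    """Blend per-model class probabilities using one weight per model.

    prob_sets: list of models, each a list of samples, each a list of
               class probabilities.
    weights:   one non-negative weight per model.
    """
    if len(prob_sets) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    blended = []
    for i in range(n_samples):
        # Weighted average of every model's probability for class c.
        row = [
            sum(w * probs[i][c] for w, probs in zip(weights, prob_sets)) / total
            for c in range(n_classes)
        ]
        blended.append(row)
    return blended


def predict(blended):
    """Return the index of the highest-probability class for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in blended]


# Toy probabilities: 3 models, 2 samples, 3 classes (made-up numbers).
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.5, 0.4, 0.1], [0.3, 0.4, 0.3]]
kfold_model = [[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]

# The k-fold model gets weight 1.2, the others 1.0.
blended = weighted_soft_vote([model_a, model_b, kfold_model], [1.0, 1.0, 1.2])
labels = predict(blended)
```

Performance-proportional weighting fits the same function: pass each model's individual score (for example 72.710, 73.950, 74.034) as its weight. Because the scores are so close, the result would likely differ little from a plain average.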

  • Mentor's ideas
    • Hard voting (a fluke): probably difficult, because probability values have to be submitted as well. (Although each model could arbitrarily be assigned a probability of 1/n.)
    • Selection by comparing eval results
    • Soft voting over everything
      • Single models
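One reading of the "1/n probability" remark is that each of the n models contributes 1/n to the class it voted for, so a hard vote still yields a probability vector to submit. The sketch below assumes that reading; the function name and the toy predictions are hypothetical.

```python
from collections import Counter


def hard_vote_with_probs(model_labels, n_classes):
    """Majority vote per sample, reporting each class's vote share as its probability.

    model_labels: list of models, each a list of predicted class indices.
    """
    n_models = len(model_labels)
    n_samples = len(model_labels[0])
    labels, probs = [], []
    for i in range(n_samples):
        votes = Counter(model[i] for model in model_labels)
        # Each model adds 1/n to the class it voted for.
        row = [votes.get(c, 0) / n_models for c in range(n_classes)]
        probs.append(row)
        # Ties are broken in favour of the lowest class index.
        labels.append(max(range(n_classes), key=lambda c: (row[c], -c)))
    return labels, probs


# Toy predictions: 3 models x 3 samples (made-up numbers).
preds = [[0, 2, 1], [0, 1, 1], [1, 2, 1]]
labels, probs = hard_vote_with_probs(preds, n_classes=3)
```

The resulting probabilities are coarse (multiples of 1/n), which is worth keeping in mind if the submission is also scored on probability quality.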

How should we unify our profile icons?

  • Something like a cut line?

Mentoring

  • It is important to plan what to do on the last day
  • Keep the ensemble code in good shape; it will be reused often
  • Using a parameter tuning tool at the end is also a good idea
  • Do not give up your individual study for the sake of the score
    • Does it contribute to the score?
    • Does it contribute to growth?
  • In MRC, the teams that wrote their code from scratch reportedly scored well
    • Use the baseline only as a way to understand the task
  • Do not use the title "멘토님" ("Mentor") when addressing the mentor

Schedule coordination

  • Check whether that week's session can be moved to Friday the 22nd

Homework

  1. Show what stays the same, and what the advantage is, when the order of relu and max pooling is swapped
  2. Show that 2D Avg pooling can be expressed with 2D-Conv
  3. Show why sin/cos gives the same distance
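The three homework items can be checked numerically with a small sketch. It assumes that item 3 refers to sinusoidal positional encoding, where the distance between two positions depends only on their offset; all function names and inputs below are illustrative.

```python
import math


def relu(x):
    return max(0.0, x)


def max_pool_1d(xs, size):
    """Non-overlapping max pooling over a flat list."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


# 1. ReLU is monotonically non-decreasing, so it commutes with max.
#    Pooling first is cheaper: ReLU then runs on 3 values instead of 6.
xs = [-3.0, 1.5, -0.5, -2.0, 4.0, 0.25]
relu_then_pool = max_pool_1d([relu(x) for x in xs], 2)
pool_then_relu = [relu(x) for x in max_pool_1d(xs, 2)]


def conv2d(img, kernel, stride):
    """Valid 2D cross-correlation of a list-of-lists image with a square kernel."""
    k = len(kernel)
    return [
        [sum(img[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
         for c in range(0, len(img[0]) - k + 1, stride)]
        for r in range(0, len(img) - k + 1, stride)
    ]


def avg_pool_2d(img, k):
    """Non-overlapping k x k average pooling."""
    return [
        [sum(img[r + i][c + j] for i in range(k) for j in range(k)) / (k * k)
         for c in range(0, len(img[0]) - k + 1, k)]
        for r in range(0, len(img) - k + 1, k)
    ]


# 2. Average pooling equals a convolution with a uniform 1/k^2 kernel and stride k.
img = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0],
       [9.0, 10.0, 11.0, 12.0], [13.0, 14.0, 15.0, 16.0]]
uniform = [[0.25, 0.25], [0.25, 0.25]]
conv_result = conv2d(img, uniform, stride=2)
pool_result = avg_pool_2d(img, 2)


def positional_encoding(pos, d_model=8):
    """Sinusoidal encoding: sin/cos pairs at geometrically spaced frequencies."""
    vec = []
    for i in range(0, d_model, 2):
        freq = 1.0 / (10000 ** (i / d_model))
        vec += [math.sin(pos * freq), math.cos(pos * freq)]
    return vec


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# 3. Each sin/cos pair contributes 2 - 2*cos(freq * offset) to the squared
#    distance, so only the offset (here 4) matters, not the absolute position.
d_near = dist(positional_encoding(3), positional_encoding(7))
d_far = dist(positional_encoding(50), positional_encoding(54))
```

The equalities hold up to floating-point error, so compare `d_near` and `d_far` with a small tolerance rather than `==`.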

Announcements

  • Wrap-up report
    • Do not pad it (avoid overly detailed parts)
    • It is probably better to write it with the feedback you want in mind
    • Only the mentor will read it
    • Do not spend too much time on it
  • Code submission
    • Put the parts you want code feedback on near the top

Wrap-up report

Table of contents

  • [1. Project overview (work hard to shorten this)]
  • [2. Team composition and roles]
  • [3. Project procedure and methods]
  • [4. Project results]
  • [5. Self-assessment]

1. Project overview (work hard to shorten this)

  • What kind of project is it, and which task did we perform?
  • The final result in about one line

2. Team composition and roles

  • State which parts each member took part in

3. Project procedure and methods

  • Analysis
    • Approach and ideas
      • Select models that produced good results in papers.
      • Reshape the input so that it resembles the original pretraining input as closely as possible
      • Generate data by substituting other words of the same type
      • Insert tokens that indicate the type at the target subject/object
      • What else is there?
  • Model selection
  • Evaluation improvements
  • What was applied
    • Describe the methods each member tried
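Two of the ideas listed under "Approach and ideas" (type tokens around the subject/object, and same-type word substitution for data generation) can be sketched as follows. The marker strings, the dict layout with inclusive `start`/`end` character offsets, and the English toy sentence are all assumptions made for illustration, not the team's actual format.

```python
def add_typed_markers(sentence, subj, obj):
    """Wrap the subject and object spans with markers that carry their entity type.

    subj/obj: dicts with 'start', 'end' (inclusive character offsets) and 'type'.
    The spans are assumed not to overlap.
    """
    spans = [
        (subj["start"], subj["end"], f"[S:{subj['type']}]", f"[/S:{subj['type']}]"),
        (obj["start"], obj["end"], f"[O:{obj['type']}]", f"[/O:{obj['type']}]"),
    ]
    # Insert from the rightmost span first so earlier offsets stay valid.
    for start, end, open_tok, close_tok in sorted(spans, reverse=True):
        sentence = (sentence[:start] + open_tok + " " + sentence[start:end + 1]
                    + " " + close_tok + sentence[end + 1:])
    return sentence


def swap_same_type(sentence, entity, replacement):
    """Create a new sample by replacing an entity with another word of the same type."""
    return sentence[:entity["start"]] + replacement + sentence[entity["end"] + 1:]


sentence = "Steve Jobs founded Apple."
subj = {"start": 0, "end": 9, "type": "PER"}
obj = {"start": 19, "end": 23, "type": "ORG"}

marked = add_typed_markers(sentence, subj, obj)
swapped = swap_same_type(sentence, obj, "Microsoft")
```

After a substitution the entity offsets change with the replacement's length, so they would need to be recomputed before markers are added to the new sample. With a pretrained tokenizer, the marker strings would also typically have to be registered as special tokens so they are not split into sub-words.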

4. Project results

  • Techniques that helped improve performance
  • Final model
  • Final score

5. Self-assessment

  • What went well
    • Using GitHub was satisfying. -> It felt like the project was actually being managed.
    • Splitting up the experiments based on task-related papers.
  • Things we tried that did not work well
    • Does this overlap with the regrets below?
    • oversampling
  • Regrets
    • Experiment management was somewhat inconvenient (naming runs, finding my own model in wandb)
      • There were so many args that it actually became harder..
      • Next time, let's name runs like 마루찬 1, 2, 3, 4, 5 and 아멘 1, 2, 3, 4.
    • Not getting to try splitting the model
    • Not getting to try adding an embedding layer
    • Not writing a custom model starting from tokenizing
    • Not getting to experiment with stacking more layers on top of the BERT model
    • Could we have made better use of wandb?
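The run-naming idea from the regrets above (member name plus a running number) could look like the sketch below. `RunNamer` and the `name-number` format are hypothetical; the notes only suggest the general scheme.

```python
from collections import defaultdict


class RunNamer:
    """Hand out sequential run names per team member, e.g. '마루찬-1'."""

    def __init__(self):
        self._counts = defaultdict(int)

    def next_name(self, member):
        # Each member has an independent counter starting at 1.
        self._counts[member] += 1
        return f"{member}-{self._counts[member]}"


namer = RunNamer()
names = [namer.next_name("마루찬"), namer.next_name("마루찬"), namer.next_name("아멘")]
```

A name produced this way can be passed as the `name` argument of `wandb.init`, with the many args recorded in the run config instead of the run name. The counter lives in memory only, so a shared file or the wandb run list would be needed to keep numbers unique across machines.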