Machine Reading Comprehension

by Prof. 서민준, KAIST

Machine Reading Comprehension (MRC)

์ฃผ์–ด์ง„ ์ง€๋ฌธ์„ ์ดํ•ดํ•˜๊ณ , ์ฃผ์–ด์ง„ ์งˆ์˜์˜ ๋‹ต๋ณ€์„ ์ถ”๋ก ํ•˜๋Š” ๋ฌธ์ œ

MRC Datasets ์ข…๋ฅ˜

  • Extractive Answer Datasets: the answer always exists as a segment (span) of the given passage
    (the answer words appear verbatim in the passage)
  • Descriptive/Narrative Answer Datasets: the answer is not a span extracted from the passage but a free-form sentence generated from the question
  • Multiple-choice Datasets: the answer is chosen from several answer candidates

Challenges

  • ๋‹จ์–ด๋“ค์˜ ๊ตฌ์„ฑ์ด ์œ ์‚ฌํ•˜์ง€๋Š” ์•Š์ง€๋งŒ ๋™์ผํ•œ ์˜๋ฏธ๋ฅผ ๊ฐ€์ง€๋Š” ๊ฒฝ์šฐ
  • ๋‹ต์ด ์ง€๋ฌธ์— ์—†๋Š” ๊ฒฝ์šฐ
  • Multi-hop reasoning(์—ฌ๋Ÿฌ ๊ฐœ์˜ documnet์—์„œ supporting fact๋ฅผ ์ฐพ์•„์•ผ์ง€๋งŒ ๋‹ต์„ ์ฐพ์„ ์ˆ˜ ์žˆ๋Š” ๊ฒฝ์šฐ)

โ€ป KorQuAD: ์งˆ์˜์‘๋‹ต/๊ธฐ๊ณ„๋…ํ•ด ํ•œ๊ตญ์–ด ๋ฐ์ดํ„ฐ์…‹

Extraction-based MRC

Metric

  • Exact Match score: 1 if the prediction matches the ground truth character for character, 0 otherwise
  • F1 score: token-level overlap with the ground truth, measured as F1
    (precision = overlap / prediction tokens, recall = overlap / ground-truth tokens, F1 = 2PR / (P + R); see the sketch below)
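A minimal sketch of the two metrics in the spirit of the SQuAD/KorQuAD evaluation scripts, assuming simple whitespace tokenization (the official scripts also normalize punctuation):

```python
from collections import Counter

def exact_match(prediction: str, ground_truth: str) -> int:
    # 1 if the strings are identical, 0 otherwise
    return int(prediction.strip() == ground_truth.strip())

def f1_score(prediction: str, ground_truth: str) -> float:
    pred_tokens = prediction.split()
    gt_tokens = ground_truth.split()
    common = Counter(pred_tokens) & Counter(gt_tokens)   # token overlap
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("서울 올림픽", "1988년 서울 올림픽"))  # 0
print(f1_score("서울 올림픽", "1988년 서울 올림픽"))     # 0.8
```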

Preprocessing

  • Context์™€ Question์„ concatenation
  • Tokenization + Special Token(CLS, SEP, UNK, PAD ๋“ฑ)
  • Attention mask: ์ž…๋ ฅ sequence์—์„œ ํ•„์š”ํ•œ ์ •๋ณด๊ฐ€ ์žˆ๋Š” ๋ถ€๋ถ„์„ 1๋กœ, ์•„๋‹Œ ๋ถ€๋ถ„์„ 0์œผ๋กœ(PAD, ์งˆ๋ฌธ ๋“ฑ)
  • ์ •๋‹ต token์˜ ์œ„์น˜ ํ‘œํ˜„
  • ์ถœ๋ ฅ ๋ ˆ์ด์–ด ๊ตฌ์„ฑ
    image
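A hedged sketch of this preprocessing with the HuggingFace tokenizers API; the checkpoint name and the example question/context/answer are placeholders, not from the lecture:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("klue/bert-base")   # any BERT-style (fast) tokenizer

question = "대한민국의 수도는 어디인가?"
context = "대한민국의 수도는 서울이며, 서울은 가장 큰 도시이기도 하다."
answer_text = "서울"
answer_char_start = context.index(answer_text)

# [CLS] question [SEP] context [SEP] + PAD, with attention mask and token type ids
inputs = tokenizer(
    question,
    context,
    max_length=384,
    truncation="only_second",   # if too long, truncate only the context
    padding="max_length",
)
# inputs["input_ids"], inputs["attention_mask"], inputs["token_type_ids"]

# Map the answer's character span to token positions (start/end labels)
start_token = inputs.char_to_token(answer_char_start, sequence_index=1)
end_token = inputs.char_to_token(answer_char_start + len(answer_text) - 1, sequence_index=1)
print(start_token, end_token)
```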

Pre-training & Fine-tuning

BERT๋กœ Contextualized Embedding์„ ๊ตฌ์„ฑ(pre-training)
๊ฐ ํ† ํฐ์ด ๋‹ต์˜ ์‹œ์ž‘ ํ† ํฐ์ผ ํ™•๋ฅ  / ๋ ํ† ํฐ์ผ ํ™•๋ฅ ์„ ์ถœ๋ ฅํ•˜๋„๋ก classification(fine-tuning)

Post-processing

  • ๋ถˆ๊ฐ€๋Šฅํ•œ ๋‹ต ์ œ๊ฑฐํ•˜๊ธฐ(end๊ฐ€ start๋ณด๋‹ค ์•ž์— ์ž‡๋Š” ๊ฒฝ์šฐ, position์ด context์— ์†ํ•˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ ๋“ฑ)
  • score(logit)์ด ๊ฐ€์žฅ ํฐ ์˜ˆ์ธก์„ ์ถœ๋ ฅ
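A sketch of this post-processing, assuming the start/end logits from the model above; n_best and max_answer_len are assumed hyperparameters:

```python
import numpy as np

def best_span(start_logits, end_logits, context_token_range, n_best=20, max_answer_len=30):
    ctx_start, ctx_end = context_token_range              # token indices covered by the context
    start_candidates = np.argsort(start_logits)[::-1][:n_best]
    end_candidates = np.argsort(end_logits)[::-1][:n_best]

    best = None
    for s in start_candidates:
        for e in end_candidates:
            if e < s:                                     # end before start: impossible
                continue
            if s < ctx_start or e > ctx_end:              # outside the context: impossible
                continue
            if e - s + 1 > max_answer_len:                # unreasonably long span
                continue
            score = start_logits[s] + end_logits[e]       # span score = sum of the two logits
            if best is None or score > best[0]:
                best = (score, s, e)
    return best   # (score, start_token, end_token), or None if no valid span
```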

Passage Retrieval

์งˆ๋ฌธ์— ๋Œ€ํ•œ ๋‹ต์„ ํฌํ•จํ•œ ๋ฌธ์„œ(passage)๋ฅผ ์ฐพ๋Š” ๊ฒƒ
image

Overview

query์™€ passage๋ฅผ ์ž„๋ฒ ๋”ฉํ•œ ๋’ค ์œ ์‚ฌ๋„๋กœ ๋žญํ‚น์„ ๋งค๊ธฐ๊ณ , ๊ฐ€์žฅ ๋†’์€ passage ์„ ํƒ
image

Sparse Embedding

0์ด ๋Œ€๋ถ€๋ถ„์ธ vector๋กœ embedding

  • Bag-of-Words: 1 if a word occurs, 0 otherwise (n-grams are sometimes used)
    The vector space grows as the vocabulary grows, and grows further as the n in n-gram increases
  • TF-IDF (Term Frequency - Inverse Document Frequency): weights each word by how often it occurs and how much information it carries
    TF: number of occurrences of the word / number of words (or 1)
    IDF: log(number of documents / number of documents in which the word appears)
    Similarity can be computed as the dot product of the query's and the document's TF-IDF vectors (see the sketch after this list)
    ※ BM25: a scoring function based on TF-IDF that also takes document length into account
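A minimal sparse-retrieval sketch using scikit-learn's TfidfVectorizer; the toy corpus and query are placeholders, and cosine similarity stands in for the raw dot product:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "서울은 대한민국의 수도이다.",
    "부산은 대한민국에서 두 번째로 큰 도시이다.",
    "파리는 프랑스의 수도이다.",
]
query = "대한민국의 수도는 어디인가?"

vectorizer = TfidfVectorizer(ngram_range=(1, 2))          # unigrams + bigrams
passage_vecs = vectorizer.fit_transform(corpus)            # sparse (num_docs, vocab_size) matrix
query_vec = vectorizer.transform([query])

scores = cosine_similarity(query_vec, passage_vecs)[0]     # similarity of the query to each passage
ranking = scores.argsort()[::-1]
print([corpus[i] for i in ranking])                        # passages ranked by TF-IDF similarity
```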

ํ•œ๊ณ„

์ฐจ์›์˜ ์ˆ˜๊ฐ€ ๋งค์šฐ ํผ
๋Œ€๋ถ€๋ถ„์˜ element๊ฐ€ 0์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ๋น„ํšจ์œจ์ 
term ๊ฐ„์˜ ์œ ์‚ฌ์„ฑ์„ ๊ณ ๋ คํ•˜์ง€ ๋ชปํ•จ

Dense Embedding

์ž‘์€ ์ฐจ์›์˜ ๊ณ ๋ฐ€๋„ ๋ฒกํ„ฐ
๊ฐ ์ฐจ์›์ด ํŠน์ • term์„ ๊ฐ€๋ฆฌํ‚ค์ง€ ์•Š์œผ๋ฉฐ ๋Œ€๋ถ€๋ถ„์˜ ์š”์†Œ๊ฐ€ non-zero
๋‹จ์–ด์˜ ์œ ์‚ฌ์„ฑ์„ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ์Œ

Dense Encoder

๋ณดํ†ต pretrained BERT encoder๋ฅผ ์‚ฌ์šฉ
๋ฌธ์„œ ์ „์ฒด๋ฅผ ํ•˜๋‚˜์˜ vector(CLS ํ† ํฐ์˜ output)๋กœ ๋‚˜ํƒ€๋‚ด์–ด question์—์„œ ๋‚˜์˜จ vector์™€ ๋น„๊ต
image

์—ฐ๊ด€๋œ passage(positive sampling)์™€๋Š” ๊ฑฐ๋ฆฌ๋ฅผ ์ขํžˆ๊ณ 
์—ฐ๊ด€๋˜์ง€ ์•Š์€ passage(negative sampling)์™€๋Š” ๊ฑฐ๋ฆฌ๊ฐ€ ๋ฉ€์–ด์•ผ ํ•จ
image

โ–  Objective function: Positive passage์— ๋Œ€ํ•œ negative log likelihood loss
image
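A hedged sketch of dual-encoder training with in-batch negatives in the style of DPR: two BERT encoders, similarity = dot product of the [CLS] vectors, loss = NLL of the positive passage. The checkpoint name and the toy question/passage pairs are placeholders:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "klue/bert-base"                        # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
q_encoder = AutoModel.from_pretrained(model_name)    # question encoder
p_encoder = AutoModel.from_pretrained(model_name)    # passage encoder

def cls_embed(encoder, texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    return encoder(**batch).last_hidden_state[:, 0]  # [CLS] token output as the sentence vector

questions = ["대한민국의 수도는?", "프랑스의 수도는?"]
positives = ["서울은 대한민국의 수도이다.", "파리는 프랑스의 수도이다."]

q_emb = cls_embed(q_encoder, questions)              # (B, hidden)
p_emb = cls_embed(p_encoder, positives)              # (B, hidden)

scores = q_emb @ p_emb.T                             # (B, B): diagonal = positive pairs,
targets = torch.arange(len(questions))               # off-diagonal = in-batch negatives
loss = F.cross_entropy(scores, targets)              # NLL of the positive passage
loss.backward()
```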

MIPS (Maximum Inner Product Search)

๋‚ด์ ์ด ๊ฐ€์žฅ ํฐ(์œ ์‚ฌ์„ฑ์ด ํฐ) vector๋ฅผ ์ฐพ๋Š” ๊ฒƒ
๋ชจ๋“  passage embedding์— ๋Œ€ํ•ด์„œ ๋‚ด์  ๊ตฌํ•˜๋Š”(brute-force) ๊ฒƒ์€ ๋งค์šฐ ๋น„ํšจ์œจ์ 

  • Compression: Scalar Quantization (SQ)
    Compress the vectors to save memory (4-byte float -> 1-byte unsigned integer)
  • Pruning: Inverted File (IVF)
    Shrink the search space to improve speed (clustering + inverted file)
    Partition the whole vector space into k clusters and search only inside the relevant ones
    Inverted file: the location of each cluster and the vectors that belong to it

โ€ป FAISS: Facebook์—์„œ ๋งŒ๋“  efficient similarity search ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ

Open-Domain Question Answering (ODQA)

Open-domain: answering questions for which no supporting evidence is provided.
The task of finding the relevant information on the web or in Wikipedia and producing the answer.

Retriever-Reader Approach

Retriever: ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์—์„œ ๊ด€๋ จ ๋ฌธ์„œ๋ฅผ ๊ฒ€์ƒ‰
Reader: ๊ฒ€์ƒ‰๋œ ๋ฌธ์„œ์—์„œ ์งˆ๋ฌธ์— ํ•ด๋‹นํ•˜๋Š” ๋‹ต์„ ์ฐพ์•„๋ƒ„

  • Knowledge source: a corpus of unstructured documents (Wikipedia, etc.)
  • Distant supervision: building MRC training data from datasets that contain only question-answer pairs

์ถ”๋ก 

retriever๊ฐ€ ์งˆ๋ฌธ๊ณผ ๊ฐ€์žฅ ๊ด€๋ จ์„ฑ ๋†’์€ 5๊ฐœ ๋ฌธ์„œ ์ถœ๋ ฅ
reader๋Š” 5๊ฐœ ๋ฌธ์„œ๋ฅผ ์ฝ๊ณ  ๋‹ต๋ณ€์„ ์˜ˆ์ธก
๋‹ต๋ณ€ ์ค‘ ๊ฐ€์žฅ score๊ฐ€ ๋†’์€ ๊ฒƒ์„ ์ตœ์ข… ๋‹ต์œผ๋กœ ์‚ฌ์šฉ

Reducing Bias

training์— SQuAD์ฒ˜๋Ÿผ context์— ๋ฌด์กฐ๊ฑด ๋‹ต์ด ์žˆ๋Š” ๋ฐ์ดํ„ฐ๋งŒ ์กด์žฌํ•œ๋‹ค๋ฉด...
training์˜ ์ฃผ์ œ์™€ text์˜ ์ฃผ์ œ๊ฐ€ ๋งŽ์ด ๋‹ค๋ฅด๋‹ค๋ฉด...

  • Train with negative examples: also show the model wrong passages during training
    Sample them randomly from the corpus, or mine harder, more confusable negatives

  • Add a [no answer] bias: add one extra token to the input sequence, and output "no answer" when the start/end probabilities point to that token (see the sketch below)
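A sketch of the no-answer decision, assuming the extra token sits at position 0 (e.g. the [CLS] token, as in SQuAD 2.0-style models) and reusing the best_span output from the post-processing sketch above; the threshold is an assumed tuning parameter:

```python
def decide_answer(start_logits, end_logits, best, null_position=0, threshold=0.0):
    # best = (score, start, end) from the post-processing step, or None
    no_answer_score = start_logits[null_position] + end_logits[null_position]
    if best is None or no_answer_score - best[0] > threshold:
        return None                                # predict "no answer"
    return best
```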

Annotation Bias

๋ฐ์ดํ„ฐ ์ œ์ž‘ ๋‹จ๊ณ„์—์„œ์˜ bias
์งˆ๋ฌธ์„ ํ•˜๋Š” ์‚ฌ๋žŒ์ด ๋‹ต์„ ์•Œ๊ณ  ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์งˆ๋ฌธ๊ณผ evidence ๋ฌธ๋‹จ ์‚ฌ์ด์— ๋งŽ์€ ๋‹จ์–ด๊ฐ€ ๊ฒน์น˜๋Š” bias ๋ฐœ์ƒ
ํ•™์Šต ๋ฐ์ดํ„ฐ์˜ ๋ถ„ํฌ ์ž์ฒด๊ฐ€ bias(์œ ๋ช…ํ•œ wiki, article ๋“ฑ..)

Closed-book QA

๋Œ€๋Ÿ‰์˜ ์ง€์‹ ์†Œ์Šค๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ์‚ฌ์ „ํ•™์Šต๋œ ์–ธ์–ด ๋ชจ๋ธ์ด ๊ทธ ์ง€์‹์„ ๊ธฐ์–ตํ•˜๊ณ  ์žˆ์„ ๊ฒƒ์ด๋ผ ๊ฐ€์ •
search ๊ณผ์ • ์—†์ด ๋ฐ”๋กœ ์ •๋‹ต ์ƒ์„ฑ

T5 (Text-to-Text Format)

input์— task ์ •์˜๊ฐ€ ๊ฐ™์ด ๋“ค์–ด๊ฐ
image T5์— QA๋ฅผ fine-tuning ์ ์šฉํ–ˆ๋”๋‹ˆ ์ž˜ ํ•˜๋”๋ผ..