
On the Generation of Medical Dialogs for COVID-19

[Motivation]

COVID-19 outbreak → shortage of doctors/professionals available for consultation.

[Contribution]

  • A medical dialog system that can provide COVID-19-related consultations
  • To alleviate overfitting, we develop a multi-task learning approach, which regularizes the data-deficient dialog generation task with a masked token prediction task

[Datasets]

(figure: statistics of the CovidDialog datasets)

[Method]

Process a set of pairs $\{(s_i, t_i)\}$:

  • target $t_i$ is a response from the doctor
  • source $s_i$ is the conversation history: the concatenation of all utterances (from both patient and doctor) before $t_i$
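As a rough illustration (not from the paper; the function and variable names are hypothetical), the following sketch builds such $(s_i, t_i)$ pairs from a toy multi-turn dialog:

```python
def build_pairs(dialog):
    """Build (source, target) pairs from a list of (speaker, utterance) turns.

    For every doctor utterance t_i, the source s_i is the concatenation of
    all utterances (from both patient and doctor) that precede it.
    """
    pairs, history = [], []
    for speaker, utterance in dialog:
        if speaker == "doctor" and history:
            source = " ".join(history)          # conversation history s_i
            pairs.append((source, utterance))   # doctor response t_i
        history.append(utterance)
    return pairs

# Toy example
dialog = [
    ("patient", "I have a dry cough and a fever."),
    ("doctor", "How long have you had these symptoms?"),
    ("patient", "About three days."),
    ("doctor", "Please get a COVID-19 test and isolate at home."),
]
for s, t in build_pairs(dialog):
    print(s, "->", t)
```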

๋ชจ๋ธ: BART Encoder( s ์ธ์ฝ”๋”ฉํ•˜๊ณ ) +decoder( t๋ฅผ ๋ฑ‰๋Š” ์• )

  • Input : s
  • Output : t
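A minimal finetuning sketch using the Hugging Face transformers BART implementation; the checkpoint name `facebook/bart-base` and all training details are illustrative assumptions, not the paper's exact setup:

```python
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# One (s, t) pair: conversation history (source) -> doctor response (target)
source = ("patient: I have a dry cough and a fever. "
          "doctor: How long have you had these symptoms? "
          "patient: About three days.")
target = "Please get a COVID-19 test and isolate at home."

inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=512)
labels = tokenizer(target, return_tensors="pt", truncation=True, max_length=128).input_ids

# With labels provided, the forward pass returns the token-level cross-entropy
# (generation) loss used for finetuning.
loss = model(**inputs, labels=labels).loss
loss.backward()  # an optimizer step would follow in a real training loop
```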

Loss

  • the generation loss (g) + the MTP loss (p)
    • (MTP, a masked token prediction task, is added to prevent overfitting)

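A plausible way to write the combined objective, assuming a weighting hyperparameter $\lambda$ on the MTP term (this reconstruction and the notation are mine, not copied from the paper):

$$
\mathcal{L} = \mathcal{L}_g + \lambda \, \mathcal{L}_p, \qquad
\mathcal{L}_g = -\sum_i \log p(t_i \mid s_i), \qquad
\mathcal{L}_p = -\sum_i \sum_{m \in M_i} \log p\big(x_{i,m} \mid \hat{s}_i\big)
$$

where $\hat{s}_i$ is the conversation history $s_i$ with the tokens at positions $M_i$ masked out, $x_{i,m}$ is the original token at masked position $m$, and $\lambda$ balances the two tasks.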

[Experiment]

Transformer

  • The conversation history is fed to the encoder
  • The decoder generates the response

GPT-2

  • Training data: English Reddit dialogs, as described in the DialoGPT paper


Unregularized BART

  • initialized using pretrained BART
  • encoder and decoder are finetuned on CovidDialog
    • During finetuning, no self-supervised regularization is used.

Unregularized BERT-GPT

  • Encoder : initialized using pretrained BERT
  • Decoder : initialized using pretrained GPT-2
  • Finetuned on CovidDialog
    • During finetuning, no self-supervised regularization is used.
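A rough sketch of wiring a pretrained BERT encoder to a pretrained GPT-2 decoder with Hugging Face's `EncoderDecoderModel`; the checkpoint names and special-token handling are assumptions and may differ from the paper's BERT-GPT implementation:

```python
from transformers import EncoderDecoderModel, BertTokenizer, GPT2Tokenizer

# Encoder initialized from pretrained BERT, decoder from pretrained GPT-2;
# the decoder's cross-attention weights are newly initialized and learned
# during finetuning on CovidDialog.
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "gpt2")
enc_tok = BertTokenizer.from_pretrained("bert-base-uncased")
dec_tok = GPT2Tokenizer.from_pretrained("gpt2")

dec_tok.pad_token = dec_tok.eos_token              # GPT-2 has no pad token
model.config.decoder_start_token_id = dec_tok.bos_token_id
model.config.pad_token_id = dec_tok.pad_token_id

history = "patient: I have a dry cough and a fever. doctor: How long have you had them?"
inputs = enc_tok(history, return_tensors="pt")
labels = dec_tok("Please get a COVID-19 test.", return_tensors="pt").input_ids

# Plain generation loss only -- no self-supervised regularization.
loss = model(**inputs, labels=labels).loss
```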

Task adaptive pretraining (TAPT)

  • Starting from an encoder pretrained with BART/BERT on large-scale external corpora, the encoder is further pretrained by predicting masked tokens on the input conversation histories in the CovidDialog datasets (without using the output responses); see the sketch after this list
  • TAPT also performs masked token prediction (MTP) on conversation histories. The difference is that TAPT performs the MTP task and the generation task sequentially, while our method performs the two tasks jointly.
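A rough sketch of the TAPT-style masked token prediction step on conversation histories, here using `BertForMaskedLM` and `DataCollatorForLanguageModeling`; the 15% masking rate and all other details are assumptions for illustration:

```python
from transformers import BertTokenizer, BertForMaskedLM, DataCollatorForLanguageModeling

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Conversation histories from the dialog data (toy examples here);
# the output responses are not used in this step.
histories = [
    "patient: I have a dry cough and a fever.",
    "patient: I returned from a trip last week and lost my sense of smell.",
]
features = [tokenizer(h, truncation=True, max_length=128) for h in histories]

# Randomly mask 15% of the tokens and build masked-LM labels.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
batch = collator(features)

# Masked-token prediction loss; backprop and an optimizer step would follow,
# and the adapted encoder is afterwards finetuned on the generation task.
loss = model(**batch).loss
```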

Setting

For pretrained models, we finetune them on the CovidDialog-English dataset for 5 epochs, while for the un-pretrained Transformer, we train it for 50 epochs. We set a checkpoint at the end of every epoch and finally take the one with the lowest perplexity on the validation set as the final model. In response generation, for all models we use beam search with a beam width of 10 during decoding.
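A decoding sketch with beam search (beam width 10) via the Hugging Face `generate` API; the checkpoint and the other generation hyperparameters are placeholders, not the paper's exact values:

```python
from transformers import BartTokenizer, BartForConditionalGeneration

# Placeholder checkpoint; in practice the model finetuned on CovidDialog would be loaded.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

history = ("patient: I have a dry cough and a fever. "
           "doctor: How long have you had them? patient: Three days.")
inputs = tokenizer(history, return_tensors="pt", truncation=True, max_length=512)

# Beam search with beam width 10, as in the paper's decoding setting.
output_ids = model.generate(**inputs, num_beams=10, max_length=128, early_stopping=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```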

[Result]

Evaluation metrics

  • human evaluation
    • Criteria: medical correctness, relevance to the dialog, amount of medical information, and how doctor-like the response sounds
    • Five medical students served as raters (hm, only five students were used..)
  • Perplexity
  • NIST-n, BLEU-n, METEOR: well suited to evaluating machine translation, but not reliable for evaluating dialogue systems
  • Entropy-n, Dist-n: measure the diversity of the generated responses (see the sketch after this list)
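A small illustrative implementation of Dist-n and Entropy-n (the exact tokenization and normalization used in the paper may differ):

```python
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def dist_n(responses, n):
    """Dist-n: number of distinct n-grams divided by the total number of n-grams."""
    grams = [g for r in responses for g in ngrams(r.split(), n)]
    return len(set(grams)) / max(len(grams), 1)

def entropy_n(responses, n):
    """Entropy-n: entropy (in nats) of the empirical n-gram distribution."""
    counts = Counter(g for r in responses for g in ngrams(r.split(), n))
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values()) if total else 0.0

responses = [
    "please get a covid test and isolate at home",
    "please stay at home and rest",
    "drink plenty of water and monitor your temperature",
]
print(dist_n(responses, 1), dist_n(responses, 2))
print(entropy_n(responses, 1))
```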