Know What You Don't Know
Introduction
A weakness of SQuAD 1.1 is that every question is guaranteed to have a correct answer somewhere in the document, so models only need to select the span that seems most related to the question. This paper introduces SQuAD 2.0, which combines the answerable questions from SQuAD 1.1 with 53,775 new, unanswerable questions about the same paragraphs. Crowdworkers wrote these new questions so that each one is relevant to its paragraph and the paragraph contains a plausible (but incorrect) answer span.
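For context, the released SQuAD 2.0 JSON marks each question with an `is_impossible` flag and attaches the distractor span as `plausible_answers`. A minimal sketch for inspecting the split (the local filename is an assumption; point it at wherever the official `train-v2.0.json` is saved):

```python
import json

# Load the SQuAD 2.0 training split (the local path is an assumption).
with open("train-v2.0.json") as f:
    squad = json.load(f)

answerable = unanswerable = 0
for article in squad["data"]:
    for paragraph in article["paragraphs"]:
        for qa in paragraph["qas"]:
            if qa["is_impossible"]:
                # Unanswerable questions carry crowd-written
                # "plausible_answers" that act as distractor spans.
                unanswerable += 1
            else:
                answerable += 1

print(f"answerable: {answerable}, unanswerable: {unanswerable}")
```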
Experiments
This experiment evaluated three model architectures: the BiDAF-No-Answer (BNA) model and two versions of the DocumentQA No-Answer (DocQA) model, with and without ELMo embeddings. All three models learn to predict the probability that a question is unanswerable, in addition to a distribution over candidate answer spans.
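One way to realize this (a simplified sketch, not the paper's exact implementation) is to append a learned no-answer score z to the softmax over candidate span scores, so a single normalization yields both the span distribution and the no-answer probability:

```python
import numpy as np

def answer_distribution(span_scores: np.ndarray, z: float):
    """Softmax over candidate answer spans plus a learned no-answer
    score z (names here are illustrative, not the paper's code).

    Returns (P(question is unanswerable), P(span_k) for each span k).
    """
    logits = np.append(span_scores, z)   # candidate spans + no-answer option
    logits -= logits.max()               # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return probs[-1], probs[:-1]

# Example: three candidate spans; a large z pushes mass toward "no answer".
p_no_answer, p_spans = answer_distribution(np.array([2.1, 0.3, -1.0]), z=1.5)
print(f"P(no answer) = {p_no_answer:.3f}")
```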
Results
All three models were trained and tested on SQuAD 2.0. The best model, DocQA + ELMo, achieved a 66.3 F1 score on the test set, 23.2 points below the human F1 score of 89.5, showing that SQuAD 2.0 is a much harder dataset for existing models. The authors also found that automatically generated negative examples are easier for existing models to detect, while the crowd-written plausible answers serve as effective distractors.
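To make the abstention requirement behind these scores concrete, here is a simplified sketch of SQuAD 2.0-style F1 scoring (the official script also normalizes case, articles, and punctuation, which is omitted here): on an unanswerable question, a model earns credit only by predicting the empty string.

```python
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer string."""
    pred_tokens, gold_tokens = prediction.split(), gold.split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def squad2_f1(prediction: str, gold_answers: list[str]) -> float:
    # Unanswerable questions have no gold answers; the model scores 1.0
    # only if it also abstains by predicting the empty string.
    if not gold_answers:
        return float(prediction == "")
    return max(token_f1(prediction, g) for g in gold_answers)
```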