Candidate Answer Scoring - GeeUnit/hw5-team07 GitHub Wiki

The input of this module is K candidate sentences. For each question and its answer set, we score every answer choice against each candidate sentence. Different methods can be used to compute this score; in our pipeline, we explored three strategies:

- Number of noun phrases/named entities matched between the answer choice and the candidate sentence
- Number of synonyms matched between the answer choice and the candidate sentence
- Point-wise Mutual Information (PMI) score between the noun phrases/named entities in the answer choice and in the candidate sentence, computed from the background corpus
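The two counting strategies can be sketched as follows. This is a minimal illustration, not the team's code. It assumes that NPs/NEs have already been extracted as plain strings, matches them by case-insensitive exact string equality, and uses a hypothetical `synonyms` dictionary (lower-case keys) in place of a real thesaurus. It also sums the per-sentence scores over the K candidate sentences. All function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical helpers illustrating the two counting strategies;
# not the hw5-team07 implementation.


def normalize(terms: Iterable[str]) -> Set[str]:
    """Lower-case and strip terms so matching is case-insensitive."""
    return {t.strip().lower() for t in terms if t.strip()}


def overlap_score(answer_terms: Iterable[str], sentence_terms: Iterable[str]) -> int:
    """Strategy 1: number of distinct NP/NE strings shared by the answer and the sentence."""
    return len(normalize(answer_terms) & normalize(sentence_terms))


def synonym_score(answer_terms: Iterable[str], sentence_terms: Iterable[str],
                  synonyms: Dict[str, Set[str]]) -> int:
    """Strategy 2: count answer terms found in the sentence directly or through a synonym.

    `synonyms` is an assumed lookup table with lower-case keys.
    """
    sentence = normalize(sentence_terms)
    count = 0
    for term in normalize(answer_terms):
        # Expand the answer term with its (assumed) synonyms before matching.
        expanded = {term} | normalize(synonyms.get(term, set()))
        if expanded & sentence:
            count += 1
    return count


def score_answers(answers: Dict[str, List[str]], candidates: List[List[str]],
                  scorer: Callable[[Iterable[str], Iterable[str]], int] = overlap_score
                  ) -> Dict[str, int]:
    """Sum one strategy's score for each answer over all K candidate sentences."""
    return {
        answer_id: sum(scorer(terms, sentence) for sentence in candidates)
        for answer_id, terms in answers.items()
    }
```

For example, `score_answers({"A": ["insulin", "pancreas"], "B": ["liver"]}, [["insulin", "pancreas", "blood sugar"], ["pancreas", "beta cells"]])` returns `{"A": 3, "B": 0}`. To use the synonym strategy, pass a `scorer` such as `lambda a, s: synonym_score(a, s, synonyms)`.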

About PMI:

For each candidate sentence, we extract its lists of noun phrases (NPs) and named entities (NEs). For each extracted NP or NE, we calculate its PMI score with respect to each answer choice using the following formula:

[images/PMI_formula.png](/GeeUnit/hw5-team07/wiki/images/PMI_formula.png)
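The formula image is not reproduced in this text. Assuming it shows the standard definition of PMI, and assuming probabilities are estimated from counts over the N documents (or other units) of the background corpus, it reads as follows. Here x is an NP/NE from the candidate sentence, y is the answer choice, c(x) and c(y) are the numbers of units containing each term, and c(x, y) is the number containing both:

```latex
\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\,p(y)}
\approx \log \frac{c(x, y)/N}{\bigl(c(x)/N\bigr)\bigl(c(y)/N\bigr)}
= \log \frac{c(x, y)\,N}{c(x)\,c(y)}
```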

PMI measures how much more (or less) often two terms co-occur than would be expected if they were independent, by comparing their joint probability with the product of their individual probabilities. In our experiments, PMI-based scoring yielded a higher c@1 score than the other two methods. Our explanation is that a higher PMI score means the answer choice co-occurs more strongly with the NPs/NEs in the candidate sentences than chance would predict, which suggests that it is more likely to be the correct choice.
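A minimal sketch of PMI-based answer selection follows. It relies on several assumptions not stated on this page. The background corpus is modeled as a list of documents, each a set of terms. Counts are document frequencies, and PMI uses the natural logarithm. An answer's score is the sum of its PMI with every NP/NE from the candidate sentences, and pairs that never co-occur are skipped rather than assigned negative infinity. Function names are hypothetical.

```python
import math
from typing import Iterable, List, Optional, Set

# Hypothetical sketch of PMI-based scoring; not the hw5-team07 implementation.


def pmi(x: str, y: str, corpus: List[Set[str]]) -> Optional[float]:
    """PMI(x, y) = log(c(x, y) * N / (c(x) * c(y))), using document counts.

    Returns None when x and y never co-occur (PMI would be -infinity).
    """
    n = len(corpus)
    c_x = sum(1 for doc in corpus if x in doc)
    c_y = sum(1 for doc in corpus if y in doc)
    c_xy = sum(1 for doc in corpus if x in doc and y in doc)
    if c_xy == 0:
        return None
    # c_xy > 0 implies c_x > 0 and c_y > 0, so the division is safe.
    return math.log(c_xy * n / (c_x * c_y))


def pmi_answer_score(answer: str, sentence_terms: Iterable[str],
                     corpus: List[Set[str]]) -> float:
    """Sum PMI between the answer and each NP/NE, skipping pairs that never co-occur."""
    total = 0.0
    for term in sentence_terms:
        value = pmi(term, answer, corpus)
        if value is not None:
            total += value
    return total


def best_answer(answers: List[str], candidate_terms: List[List[str]],
                corpus: List[Set[str]]) -> str:
    """Pick the answer with the highest summed PMI over all K candidate sentences."""
    flat = [term for sentence in candidate_terms for term in sentence]
    scores = {a: pmi_answer_score(a, flat, corpus) for a in answers}
    return max(scores, key=scores.get)
```

Skipping zero-co-occurrence pairs prevents a single unseen pair from driving an answer's score to negative infinity. A smoothed count estimate would be a reasonable alternative.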