Answer Pruning - GeeUnit/hw5-team07 GitHub Wiki
Answer pruning was a method we used to remove answers from scoring consideration. Although it is not always possible to know the correct answer to a question before reading a passage, it is sometimes possible to identify an incorrect answer. For our system, we developed two different answer pruners.
Our first pruner uses part-of-speech tags. Based on the structure of a question, we can rule out certain answers by the part-of-speech tags they contain. For example, for "How many...?"-type questions, we can rule out all answers that do not correspond to numeric quantities.
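The "How many...?" rule can be sketched in Java as follows. This is a minimal sketch, assuming each candidate answer has already been tagged with Penn Treebank part-of-speech tags by an upstream tagger; the class, record, and method names (`PosAnswerPruner`, `TaggedAnswer`, `prune`) are hypothetical and do not come from our codebase. For quantity questions it keeps only answers containing at least one `CD` (cardinal number) token and leaves other question types untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Sketch of part-of-speech answer pruning. Assumes answers were tagged
 * upstream with Penn Treebank tags; all names here are hypothetical.
 */
public class PosAnswerPruner {

    /** A candidate answer together with the POS tag of each of its tokens. */
    public record TaggedAnswer(String text, List<String> posTags) {}

    /** True if the question asks for a count, e.g. "How many ...?". */
    static boolean isQuantityQuestion(String question) {
        return question.trim().toLowerCase(Locale.ROOT).startsWith("how many");
    }

    /** True if any token carries the Penn Treebank cardinal-number tag "CD". */
    static boolean containsNumber(TaggedAnswer answer) {
        return answer.posTags().contains("CD");
    }

    /**
     * Returns the answers that survive pruning. For quantity questions,
     * answers with no cardinal number are removed; other question types
     * are passed through unchanged in this sketch.
     */
    public static List<TaggedAnswer> prune(String question, List<TaggedAnswer> answers) {
        if (!isQuantityQuestion(question)) {
            return new ArrayList<>(answers);
        }
        List<TaggedAnswer> kept = new ArrayList<>();
        for (TaggedAnswer a : answers) {
            if (containsNumber(a)) {
                kept.add(a);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<TaggedAnswer> answers = List.of(
            new TaggedAnswer("three genes", List.of("CD", "NNS")),
            new TaggedAnswer("the liver", List.of("DT", "NN")),
            new TaggedAnswer("12", List.of("CD")));
        for (TaggedAnswer a : prune("How many genes are involved?", answers)) {
            System.out.println(a.text());
        }
    }
}
```

In the `main` example, "the liver" is dropped while "three genes" and "12" survive, since Penn Treebank tagging labels both digits and number words such as "three" as `CD`.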
Our second pruner targets cases where the words in a question have very little relation to an answer. For example, if a question asks about a "gene" and one of the answers happens to be "cheeseburger", we can almost immediately rule that answer out, since "gene" and "cheeseburger" have very little semantic similarity.
To accomplish this, we used the DISCO library to measure the second-order semantic similarity (SOS) between noun phrases in the answers and noun phrases in the question. In many cases, setting a minimum similarity threshold let us prune a significant number of incorrect answers. However, identifying an ideal threshold is difficult, and a threshold tuned too closely to the questions we worked with may not generalize, so co-occurrence pruning can result in overfitting.
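The threshold rule can be sketched in Java as well. This sketch does not call DISCO directly: the similarity function is a hypothetical `ToDoubleBiFunction<String, String>` standing in for a DISCO second-order similarity lookup, noun phrases are assumed to be extracted upstream, and scoring each answer by its single best-matching (question, answer) noun-phrase pair is an assumption made for illustration. The class and record names are hypothetical, and the toy scores and the 0.1 threshold in `main` are invented values, not DISCO output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

/**
 * Sketch of similarity-threshold answer pruning. The similarity function is a
 * hypothetical stand-in for a DISCO second-order similarity lookup.
 */
public class SimilarityAnswerPruner {

    /** A candidate answer and the noun phrases extracted from it upstream. */
    public record Candidate(String text, List<String> nounPhrases) {}

    private final ToDoubleBiFunction<String, String> similarity;
    private final double threshold;

    public SimilarityAnswerPruner(ToDoubleBiFunction<String, String> similarity, double threshold) {
        this.similarity = similarity;
        this.threshold = threshold;
    }

    /** Highest similarity over all (question NP, answer NP) pairs; lists must be non-empty. */
    double bestScore(List<String> questionNps, List<String> answerNps) {
        double best = Double.NEGATIVE_INFINITY;
        for (String q : questionNps) {
            for (String a : answerNps) {
                best = Math.max(best, similarity.applyAsDouble(q, a));
            }
        }
        return best;
    }

    /** Returns the candidates whose best score reaches the threshold. */
    public List<Candidate> prune(List<String> questionNps, List<Candidate> candidates) {
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : candidates) {
            // With nothing to compare, keep the answer rather than prune it blindly.
            boolean noEvidence = questionNps.isEmpty() || c.nounPhrases().isEmpty();
            if (noEvidence || bestScore(questionNps, c.nounPhrases()) >= threshold) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Toy scores invented for illustration; a real system would query DISCO.
        Map<String, Double> toyScores = Map.of(
            "gene|protein", 0.42,
            "gene|cheeseburger", 0.01);
        ToDoubleBiFunction<String, String> toySimilarity =
            (q, a) -> toyScores.getOrDefault(q + "|" + a, 0.0);

        SimilarityAnswerPruner pruner = new SimilarityAnswerPruner(toySimilarity, 0.1);
        List<Candidate> kept = pruner.prune(
            List.of("gene"),
            List.of(new Candidate("a protein", List.of("protein")),
                    new Candidate("a cheeseburger", List.of("cheeseburger"))));
        for (Candidate c : kept) {
            System.out.println(c.text());
        }
    }
}
```

Passing the similarity function in as a parameter keeps the pruning logic independent of the similarity backend, so the same code could be tried with different thresholds or similarity measures. Answers with no extracted noun phrases are kept, since there is nothing to compare them against.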