Candidate Answer Retrieval - GeeUnit/hw5-team07 GitHub Wiki

We implemented and tried the following two matchers to create the Solr query. The first one is the matcher we use in our pipeline.

QuestionAnswerCandSentSimilarityMatcher: The baseline QuestionCandSentSimilarityMatcher forms one query per question and picks the top k candidate sentences from the result. The intuition behind QuestionAnswerCandSentSimilarityMatcher is to take the multiple-choice answers into account: it forms each Solr query from the question plus one answer choice. Instead of 1 query per question, it therefore creates m queries per question, where m is the number of multiple-choice answers for that question. As we expected, this retrieves better candidate sentences for some of the questions, which leads to better results.
QuestionAllAnswerCandSentSimilarityMatcher: The relevance score that a Solr query assigns to its candidate sentences is not directly comparable to the scores of candidate sentences from other queries. For example, the relevance scores of candidate sentences for “question1+answerChoice1” and “question1+answerChoice2” are not directly comparable. We therefore thought it worthwhile to form a single Solr query from the question and all of its multiple-choice answers. This matcher forms one such query per question. It decreased overall performance but improved accuracy for a couple of documents. This matcher is not used in our pipeline.
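
The difference between the three query-forming strategies can be sketched roughly as follows. This is only an illustration in Python, not the code from our pipeline: the function names, the `text` field name, and the way terms are quoted are all assumptions made for the example.

```python
def build_query(terms, field="text"):
    """Join terms into one Solr query clause on a single field.

    The field name "text" is a placeholder assumption, not the real schema.
    """
    # Quote every term so that multi-word phrases are kept together,
    # escaping any double quotes inside the term itself.
    quoted = ['"{}"'.format(t.replace('"', '\\"')) for t in terms]
    return "{}:({})".format(field, " ".join(quoted))


def question_only_query(question_terms):
    """Baseline idea: one query per question, built from the question alone."""
    return build_query(question_terms)


def per_answer_queries(question_terms, answer_choices):
    """One query per (question, answer choice) pair: m queries for m choices."""
    return [build_query(question_terms + [choice]) for choice in answer_choices]


def all_answers_query(question_terms, answer_choices):
    """A single query built from the question plus every answer choice."""
    return build_query(question_terms + answer_choices)
```

With hypothetical question terms `["protein", "plaque"]` and answer choices `["amyloid", "tau"]`, `per_answer_queries` returns two query strings, each with its own ranked result list and its own score scale, while `all_answers_query` returns one string whose results share a single ranking.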

We also tried different ways of building the nounPhrase, using different POS tags, but this did not improve the current system. Forming a proper nounPhrase for the query requires more sophisticated heuristics and strategies. For instance, it might help to determine the type of each question and form a different query according to that type.