Package: descriptorimpl - 11791-04/project-team04 GitHub Wiki

This package includes implementations for all the UIMA descriptors (CollectionReaders, Analysis Engines, and Consumers)

The Type System - Gold Standard Answers and Retrieved Answers

Class: QuestionReader extends CollectionReader_ImplBase

The QuestionReader class (extends CollectionReader_ImplBase) reads the annotated corpus from /BioASQ-SampleData1B.json and feeds the questions into the UIMA pipeline.

Each question produces a JCas, and all of its gold standard answers are stored in that CAS's index. The question itself (question body, question ID, and question type) is stored in the Question annotation type. Three UIMA types are used for storing the gold standard answers: TripleSearchResult, ConceptSearchResult, and Document. All three are subclasses of SearchResult.

To distinguish gold standard answers from retrieved candidate answers, TypeConstants.SEARCH_ID_GOLD_STANDARD is assigned to the SearchId attribute.

Exact answers are stored as the Answer type in the index. Objects whose rank equals TypeConstants.UNKNOWN are the gold standards. All candidate answers should be added to the index of their corresponding CAS, and their rank MUST NOT be UNKNOWN. The gold standard file from the archetype does not contain variant (synonym) lists, so the variant list will always be empty.
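The gold-versus-candidate convention above can be sketched in plain Java. The constant values and the Answer record below are illustrative stand-ins (the real constants live in the project's TypeConstants class), assuming only the rules stated here: gold standard answers carry the gold searchId and an UNKNOWN rank, while candidates must carry a real rank.

```java
import java.util.List;
import java.util.stream.Collectors;

public class GoldFilter {
    // Hypothetical values standing in for the project's TypeConstants;
    // the real constants are defined in the TypeConstants class.
    static final String SEARCH_ID_GOLD_STANDARD = "__gold_standard__";
    static final int RANK_UNKNOWN = -1; // stands in for TypeConstants.UNKNOWN

    // Illustrative stand-in for the UIMA Answer/SearchResult attributes.
    record Answer(String text, int rank, String searchId) {}

    // Gold standard entries carry the gold searchId and UNKNOWN rank.
    static boolean isGoldStandard(Answer a) {
        return SEARCH_ID_GOLD_STANDARD.equals(a.searchId())
                && a.rank() == RANK_UNKNOWN;
    }

    // Retrieved candidates are everything that is not gold standard.
    static List<Answer> candidates(List<Answer> all) {
        return all.stream().filter(a -> !isGoldStandard(a)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Answer> all = List.of(
            new Answer("BRCA1", RANK_UNKNOWN, SEARCH_ID_GOLD_STANDARD),
            new Answer("BRCA1", 0, "run-1"));
        System.out.println(candidates(all).size()); // → 1
    }
}
```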


Evaluations - Documents, Tripples, Concepts, Snippets

Class: BasicConsumer extends CasConsumer_ImplBase

The BasicConsumer class (extends CasConsumer_ImplBase) reads and evaluates the result of the pipeline against the gold standard (both stored in each CAS). The Metric class is used to calculate both MAP and GMAP scores.

The metric logic is implemented under metric.MetricDTC, metric.MetricTriples, and metric.MetricSnippet.
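The MAP and GMAP scores the consumer computes can be sketched with the standard formulas: average precision per question, then an arithmetic mean (MAP) or an epsilon-smoothed geometric mean (GMAP). This is a self-contained illustration, not the Metric class's actual code; the epsilon convention is an assumption borrowed from common BioASQ practice.

```java
import java.util.List;

public class MapGmap {
    // Average precision for one question: mean of precision@k taken at
    // each relevant rank position (relevance flags given in rank order).
    static double averagePrecision(List<Boolean> relevant) {
        int hits = 0;
        double sum = 0.0;
        for (int k = 0; k < relevant.size(); k++) {
            if (relevant.get(k)) {
                hits++;
                sum += (double) hits / (k + 1); // precision at rank k+1
            }
        }
        return hits == 0 ? 0.0 : sum / hits;
    }

    // MAP: arithmetic mean of per-question average precision.
    static double map(List<List<Boolean>> questions) {
        return questions.stream()
                .mapToDouble(MapGmap::averagePrecision)
                .average().orElse(0.0);
    }

    // GMAP: geometric mean with a small epsilon so a single zero score
    // does not collapse the whole product (a common BioASQ convention).
    static double gmap(List<List<Boolean>> questions, double eps) {
        double logSum = 0.0;
        for (List<Boolean> q : questions) {
            logSum += Math.log(averagePrecision(q) + eps);
        }
        return Math.exp(logSum / questions.size());
    }

    public static void main(String[] args) {
        List<Boolean> q1 = List.of(true, false, true); // AP = (1/1 + 2/3) / 2
        List<Boolean> q2 = List.of(false, true);       // AP = 1/2
        System.out.printf("MAP = %.4f%n", map(List.of(q1, q2)));
    }
}
```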


Final Evaluation - Exact Match Answers for LIST, YESNO, FACTOID questions

Class: ExactMatchConsumer extends CasConsumer_ImplBase

The ExactMatchConsumer class (extends CasConsumer_ImplBase) reads and evaluates the results of the pipeline against the exact match gold standard (both stored in each CAS). The ExactMatchMetrics class is used to calculate ALL metrics specified in the M3 handout: P/R/F1, strict/lenient accuracy, and MRR. Additionally, we implement soft P/R/F1 for LIST questions, so that partial matches are reflected in the final scores. The partial match score is calculated as 2 * |g ∩ a| / (|g| + |a|), where "g" is a gold standard answer, "a" is a retrieved answer, and |g ∩ a| is the number of tokens shared by "g" and "a".
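The partial match formula above is the Dice coefficient over answer tokens. A minimal sketch, assuming a plain lowercase whitespace tokenization (the actual tokenizer in ExactMatchMetrics may differ):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SoftMatch {
    // Dice-style token overlap: 2 * |g ∩ a| / (|g| + |a|).
    // Tokenization here is a lowercase whitespace split — an assumption,
    // not necessarily what ExactMatchMetrics does internally.
    static double partialMatch(String gold, String answer) {
        Set<String> g = new HashSet<>(Arrays.asList(gold.toLowerCase().split("\\s+")));
        Set<String> a = new HashSet<>(Arrays.asList(answer.toLowerCase().split("\\s+")));
        Set<String> overlap = new HashSet<>(g);
        overlap.retainAll(a); // g ∩ a
        return 2.0 * overlap.size() / (g.size() + a.size());
    }

    public static void main(String[] args) {
        // 2 tokens shared, sizes 2 and 3: 2*2 / (2+3)
        System.out.println(partialMatch("breast cancer", "breast cancer gene")); // → 0.8
    }
}
```

An exact match scores 1.0 and a fully disjoint answer scores 0.0, so the soft scores interpolate smoothly between lenient and strict matching.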

The metric logic is implemented under metric.ExactMatchMetrics.


Analysis Engine - Concepts

Class: ConceptAnalysisEngine extends JCasAnnotator_ImplBase

The ConceptAnalysisEngine class queries the PubMed webservice to get the concepts for the query and saves the results in ConceptSearchResult for the consumer to process.


Analysis Engine - Triples

Class: TriplesExtractor extends JCasAnnotator_ImplBase

The TriplesExtractor class queries the PubMed webservice to get the triples for the query and saves the results using TripleSearchResult for the consumer to process. The triples are also ranked using the score retrieved from the webservice API.


Analysis Engine - ListQuestionEntityExtractor

Class: ListQuestionEntityExtractor_AE extends JCasAnnotator_ImplBase

The ListQuestionEntityExtractor_AE class is responsible for retrieving the list answers for a question from the extracted documents, using the implemented entity-extraction techniques.
