Analysis Engine: Document - 11791-04/project-team04 GitHub Wiki

Workflow:

Reads the question as a raw String from the Question Reader.
Uses a unigram model to filter out common terms that carry little meaning.
Queries the PubMed API to obtain candidate documents.
Processes each document (title and abstract) and the query by removing punctuation and stop words, then applying Krovetz stemming.
Ranks the documents by their similarity to the query.
Several rankers were tried, including Okapi BM25, Indri, Dirichlet and XQL.
The results are written to the JCas as Document objects.
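
The preprocessing and ranking steps above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in DocumentRetrieval_AE: the class name Bm25Sketch, the tiny stop list, and the parameter values k1 = 1.2 and b = 0.75 are assumptions chosen for the example. Krovetz stemming is left out because the Java standard library has no stemmer, and the PubMed query and JCas output are not shown. Only Okapi BM25 is sketched; the other rankers are not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch class; not part of the project's code base.
class Bm25Sketch {
    // Illustrative stop list only; a real list would be much longer.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "the", "of", "a", "an", "in", "is", "and", "to", "for", "what"));

    // Lowercase, replace punctuation with spaces, split on whitespace, drop stop words.
    // Stemming is intentionally omitted in this sketch.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        String cleaned = text.toLowerCase().replaceAll("[^a-z0-9\\s]", " ");
        for (String t : cleaned.split("\\s+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns document indices ordered from highest to lowest BM25 score.
    static List<Integer> rank(String query, List<String> docs) {
        final double k1 = 1.2;   // assumed term-frequency saturation parameter
        final double b = 0.75;   // assumed length-normalization parameter

        // Tokenize every document and collect document frequencies.
        List<List<String>> tokenized = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        double totalLength = 0;
        for (String doc : docs) {
            List<String> tokens = tokenize(doc);
            tokenized.add(tokens);
            totalLength += tokens.size();
            for (String term : new HashSet<>(tokens)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        int n = docs.size();
        double avgLength = n == 0 ? 0 : totalLength / n;

        Set<String> queryTerms = new HashSet<>(tokenize(query));
        double[] scores = new double[n];
        for (int i = 0; i < n; i++) {
            List<String> tokens = tokenized.get(i);
            Map<String, Integer> termFreq = new HashMap<>();
            for (String t : tokens) {
                termFreq.merge(t, 1, Integer::sum);
            }
            for (String q : queryTerms) {
                int f = termFreq.getOrDefault(q, 0);
                if (f == 0) {
                    continue; // term absent from this document, contributes nothing
                }
                int df = docFreq.get(q);
                // Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
                double idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
                double norm = f + k1 * (1 - b + b * tokens.size() / avgLength);
                scores[i] += idf * f * (k1 + 1) / norm;
            }
        }

        // Sort indices by descending score.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
        return order;
    }
}
```

Calling Bm25Sketch.rank(question, abstracts) returns the positions of the candidate documents from best to worst match. Documents that share no term with the query score zero and fall to the end of the list.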

Outline:

descriptorimpl.DocumentRetrieval_AE

descriptorimpl.DocumentRetrieval_AE.DocScoreComparator