Analysis Engine: Document - 11791-04/project-team04 GitHub Wiki

Workflow:

  • Reads the raw question string from the Question Reader.
  • Uses a unigram model to remove common terms that carry little meaning.
  • Queries the PubMed API to obtain candidate documents.
  • Preprocesses the query and each document (title and abstract): removes punctuation and stopwords, then applies Krovetz stemming.
  • Ranks the documents by their similarity to the query.
  • Several rankers were tried, including Okapi BM25, Indri, Dirichlet and XQL.
  • Writes the results to the JCas as Document annotations.
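The preprocessing steps above can be sketched as follows. This is a minimal illustration and not the project's code. The `QueryPreprocessor` class, the small stopword list, the background unigram probabilities and the `maxProb` cutoff are all assumptions. `stem()` is an identity placeholder standing in for the project's `KrovetzStemmer`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper sketching the query/document preprocessing described above. */
class QueryPreprocessor {
    // Tiny illustrative stopword list (assumed); a real pipeline would load a full list.
    private static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList(
            "a", "an", "the", "of", "in", "is", "are", "what", "which", "and", "or", "to", "for"));

    private final Map<String, Double> unigramProb; // background unigram probabilities (assumed input)
    private final double maxProb;                  // query terms more frequent than this are dropped (assumed)

    QueryPreprocessor(Map<String, Double> unigramProb, double maxProb) {
        this.unigramProb = unigramProb;
        this.maxProb = maxProb;
    }

    /** Lower-cases, replaces ASCII punctuation with spaces, removes stopwords, then stems. */
    List<String> normalize(String text) {
        List<String> tokens = new ArrayList<>();
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("\\p{Punct}", " ");
        for (String tok : cleaned.split("\\s+")) {
            if (tok.isEmpty() || STOPWORDS.contains(tok)) continue; // skip leading blanks and stopwords
            tokens.add(stem(tok));
        }
        return tokens;
    }

    /** Normalizes a question, then drops terms the unigram model marks as too common. */
    List<String> normalizeQuery(String question) {
        List<String> kept = new ArrayList<>();
        for (String tok : normalize(question)) {
            // Unseen terms get probability 0.0 and are therefore always kept.
            if (unigramProb.getOrDefault(tok, 0.0) <= maxProb) kept.add(tok);
        }
        return kept;
    }

    /** Placeholder: the original applies Krovetz stemming here; this sketch returns the token unchanged. */
    protected String stem(String token) {
        return token;
    }
}
```

With a cutoff of 0.01, for example, a query term whose background probability is 0.2 is dropped, while rarer or unseen terms are kept. Documents pass only through `normalize`, because the unigram filter applies to the question.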

Outline:

descriptorimpl.DocumentRetrieval_AE

  • service : WebAPIServiceProxy
  • stemmer : KrovetzStemmer
  • outQuestions : PrintWriter
  • baseline : boolean
  • conceptSet : Set<String>
  • initialize(UimaContext)
  • qeWithConcept(String)
  • process(JCas)
  • collectionProcessComplete()

descriptorimpl.DocumentRetrieval_AE.DocScoreComparator

  • compare(Pair<DocInfo, Double>, Pair<DocInfo, Double>)
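The comparator above orders `Pair<DocInfo, Double>` entries. The sketch below shows how Okapi BM25, one of the rankers listed, could score candidates and how such a comparator then sorts them best-first. It is not the project's implementation, and several pieces are assumptions:

- `DocInfo`, `Pair` and `Bm25Ranker` are hypothetical stand-ins.
- Descending score order is assumed for `DocScoreComparator`.
- Document frequencies are computed over the retrieved candidate set only.
- `k1` and `b` are caller-supplied. Values such as 1.2 and 0.75 are common defaults.
- The idf uses the `log(1 + ...)` variant, which never goes negative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the project's DocInfo: an id plus preprocessed tokens. */
class DocInfo {
    final String pmid;
    final List<String> tokens;
    DocInfo(String pmid, List<String> tokens) { this.pmid = pmid; this.tokens = tokens; }
}

/** Minimal pair type; the project's Pair class is not shown, so this one is assumed. */
class Pair<A, B> {
    final A first;
    final B second;
    Pair(A first, B second) { this.first = first; this.second = second; }
}

/** Orders (document, score) pairs by descending score, i.e. best match first (assumed order). */
class DocScoreComparator implements Comparator<Pair<DocInfo, Double>> {
    @Override
    public int compare(Pair<DocInfo, Double> a, Pair<DocInfo, Double> b) {
        return Double.compare(b.second, a.second); // higher score sorts earlier
    }
}

/** Hypothetical Okapi BM25 ranker over the candidate documents returned by PubMed. */
class Bm25Ranker {
    private final double k1, b;
    Bm25Ranker(double k1, double b) { this.k1 = k1; this.b = b; }

    List<Pair<DocInfo, Double>> rank(List<String> query, List<DocInfo> docs) {
        int n = docs.size();
        double avgLen = 0;
        Map<String, Integer> df = new HashMap<>(); // document frequency within the candidate set
        for (DocInfo d : docs) {
            avgLen += d.tokens.size();
            for (String t : new HashSet<>(d.tokens)) df.merge(t, 1, Integer::sum);
        }
        avgLen /= Math.max(n, 1);

        List<Pair<DocInfo, Double>> scored = new ArrayList<>();
        for (DocInfo d : docs) {
            Map<String, Integer> tf = new HashMap<>(); // term frequencies in this document
            for (String t : d.tokens) tf.merge(t, 1, Integer::sum);
            double score = 0;
            for (String q : query) {
                int f = tf.getOrDefault(q, 0);
                if (f == 0) continue; // term absent: contributes nothing
                int nq = df.get(q);
                double idf = Math.log(1 + (n - nq + 0.5) / (nq + 0.5));
                double norm = f + k1 * (1 - b + b * d.tokens.size() / avgLen); // length normalization
                score += idf * f * (k1 + 1) / norm;
            }
            scored.add(new Pair<>(d, score));
        }
        scored.sort(new DocScoreComparator()); // stable sort, best-first
        return scored;
    }
}
```

Keeping the comparison in a separate `Comparator` lets the same sort step serve whichever ranker (BM25, Indri, Dirichlet and so on) produced the scores.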