Analysis Engine: LIST Questions - 11791-04/project-team04 GitHub Wiki
Algorithm:
- Identify pivot terms in the query using unigram heuristics.
- Locate the pivot terms in the abstract and extract the text within a fixed-size window around each occurrence.
- Run BioNER on the extracted text.
- Keep only unigram entities for the moment.
- Filter out spurious recognized entities using unigram heuristics.
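
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the stopword-based pivot heuristic, the `STOPWORDS` set, the function names, and the `ner` callable (a stand-in for whatever BioNER tool is used) are all assumptions made for the example.

```python
import re
from typing import Callable, List, Set

# Assumed stopword list; the real unigram heuristics are not specified in this page.
STOPWORDS = {"what", "which", "are", "is", "the", "of", "in", "for",
             "list", "a", "an", "with", "to", "and", "by"}


def tokenize(text: str) -> List[str]:
    """Split text into alphanumeric tokens (hyphens kept inside tokens)."""
    return re.findall(r"[A-Za-z0-9\-]+", text)


def pivot_terms(query: str) -> Set[str]:
    """Hypothetical unigram heuristic: keep non-stopword tokens longer than 2 chars."""
    return {t.lower() for t in tokenize(query)
            if t.lower() not in STOPWORDS and len(t) > 2}


def windows(abstract: str, pivots: Set[str], size: int) -> List[str]:
    """Return the text within `size` tokens on each side of every pivot occurrence."""
    tokens = tokenize(abstract)
    spans = []
    for i, tok in enumerate(tokens):
        if tok.lower() in pivots:
            lo, hi = max(0, i - size), min(len(tokens), i + size + 1)
            spans.append(" ".join(tokens[lo:hi]))
    return spans


def list_candidates(query: str, abstract: str,
                    ner: Callable[[str], List[str]], size: int = 5) -> List[str]:
    """Run the (assumed) NER callable on each window and filter the entities."""
    pivots = pivot_terms(query)
    seen: Set[str] = set()
    result: List[str] = []
    for span in windows(abstract, pivots, size):
        for entity in ner(span):
            key = entity.lower()
            if " " in entity:          # keep unigram entities only
                continue
            if key in pivots or key in STOPWORDS or key in seen:
                continue               # drop query terms, stopwords, duplicates
            seen.add(key)
            result.append(entity)
    return result


# Toy dictionary-based recognizer standing in for a real BioNER tool.
LEXICON = {"BRCA1", "TP53", "EGFR"}


def toy_ner(text: str) -> List[str]:
    return [t for t in tokenize(text) if t in LEXICON]


query = "Which genes are associated with breast cancer?"
abstract = ("BRCA1 and TP53 mutations drive breast cancer progression, whereas "
            "in unrelated lung tissue the EGFR receptor dominates.")

small = list_candidates(query, abstract, toy_ner, size=5)
large = list_candidates(query, abstract, toy_ner, size=8)
```

In this toy input, the wider window also pulls in an entity from an unrelated clause, which is the kind of effect the window-size trade-off noted under the advantages refers to.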
Advantages:
- The method is robust, and its underlying assumption holds.
- A smaller window size gives higher precision (P) but lower recall (R).
- A shorter answer list gives higher precision (P) but lower recall (R).
- Robust to overfitting
- Efficient
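
To make the precision/recall trade-off concrete, here is a small sketch of how P and R could be computed for a returned list against a gold list. The function name, the case-insensitive matching, and the example entities are assumptions for illustration, not taken from the project.

```python
from typing import Iterable, Tuple


def precision_recall(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float]:
    """Set-based precision and recall, matching entities case-insensitively."""
    pred = {p.lower() for p in predicted}
    gold_set = {g.lower() for g in gold}
    true_positives = len(pred & gold_set)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical gold answers and a ranked candidate list.
gold = ["BRCA1", "TP53", "EGFR", "KRAS"]
ranked = ["BRCA1", "TP53", "MYC", "EGFR"]

short_p, short_r = precision_recall(ranked[:2], gold)  # shorter list
full_p, full_r = precision_recall(ranked, gold)        # longer list
```

With these made-up lists, truncating to the top two candidates raises precision and lowers recall relative to returning all four, matching the trade-off stated above.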
To be improved:
- Add a bigram language model (LM) to handle bigram list entities.
- An NLU module is needed to understand what the question is asking for.
- An ontology corpus is needed for concept matching.