Q&A Architecture - GeeUnit/hw5-team07 GitHub Wiki
The Q&A architecture for our project is shown in the diagram below. It was adapted from *Building an Optimal Question Answering System Automatically using Configuration Space Exploration (CSE) for QA4MRE 2013 Tasks* (Patel et al., 2013).
As the diagram shows, the architecture depends on Apache Solr for storing document sentences.
The workflow is as follows:
- Documents stored as XML files are read into an annotation pipeline. Each document consists of an Alzheimer's-related article and a question and answer set.
- Annotators add document annotations to a UIMA CAS object.
- A subset of the annotations made to the article is indexed into Solr.
- The entire UIMA CAS object is output as a set of .XMI objects.
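The indexing step above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual indexer: the `SentenceAnnotation` class, the `to_solr_docs` helper, and the field names (`id`, `doc_id`, `text`, `noun_phrases`) are all hypothetical. It only shows the idea of sending a subset of the annotations to Solr while the full annotation set stays in the CAS.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SentenceAnnotation:
    """Hypothetical stand-in for a sentence annotation held in the CAS."""
    doc_id: str
    index: int
    text: str
    tokens: list = field(default_factory=list)        # stays in the CAS/XMI only
    noun_phrases: list = field(default_factory=list)  # sent to Solr


def to_solr_docs(sentences):
    """Keep only the fields meant for retrieval (assumed field names)."""
    return [
        {
            "id": f"{s.doc_id}_{s.index}",
            "doc_id": s.doc_id,
            "text": s.text,
            "noun_phrases": s.noun_phrases,
        }
        for s in sentences
    ]


# Toy input, written by hand for illustration.
sentences = [
    SentenceAnnotation(
        doc_id="doc1",
        index=0,
        text="Amyloid beta accumulates in plaques.",
        tokens=["Amyloid", "beta", "accumulates", "in", "plaques", "."],
        noun_phrases=["Amyloid beta", "plaques"],
    )
]

docs = to_solr_docs(sentences)
# A JSON array of documents is one format Solr's update handler accepts.
payload = json.dumps(docs)
```

Dropping `tokens` from the payload mirrors the split described above: Solr holds only what retrieval needs, and everything else is recovered later from the .XMI output.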
In order to retrieve answers for a set of questions:
- The .XMI objects are read into memory.
- Annotations made to the questions and answers are used to form a Solr query.
- The Solr query is used to retrieve parts of the original article from the Solr server.
- The article parts are used to score the multiple-choice questions.
- The answer selected for a question is the multiple-choice option with the highest score.
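The retrieval steps above might look something like the sketch below. It is an illustration under stated assumptions, not the team's scoring strategy: the stopword list, the `text` field name, and the term-overlap score are all made up for the example, and the retrieved sentences are hand-written stand-ins for what the Solr server would return.

```python
import re

# Assumed, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "the", "of", "in", "is", "are", "what", "which", "to", "and"}


def content_terms(text):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_solr_query(question, answer, field="text"):
    """Combine question and answer terms into a field-qualified OR query."""
    terms = content_terms(question) + content_terms(answer)
    unique = list(dict.fromkeys(terms))  # de-duplicate, keep first-seen order
    return " OR ".join(f"{field}:{t}" for t in unique)


def score_choice(answer, retrieved_sentences):
    """Assumed score: summed fraction of answer terms found in each sentence."""
    answer_terms = set(content_terms(answer))
    if not answer_terms:
        return 0.0
    return sum(
        len(answer_terms & set(content_terms(s))) / len(answer_terms)
        for s in retrieved_sentences
    )


def select_answer(choices, retrieved):
    """Pick the choice with the highest score; also return all scores."""
    scores = {c: score_choice(c, retrieved.get(c, [])) for c in choices}
    return max(scores, key=scores.get), scores


question = "Which protein accumulates in plaques?"
choices = ["amyloid beta", "insulin"]
query = build_solr_query(question, choices[0])

# Hand-written stand-ins for the article parts Solr would return per choice.
retrieved = {
    "amyloid beta": ["Amyloid beta accumulates in plaques.", "Plaques contain beta sheets."],
    "insulin": ["Plaques contain beta sheets."],
}
best, scores = select_answer(choices, retrieved)
```

Issuing one query per candidate answer keeps each choice's evidence separate, so the final selection reduces to taking the maximum over the per-choice scores.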
The dark blue boxes represent the modification points for team #7. Our main points of focus include:
- [The Annotation Pipeline](https://github.com/GeeUnit/hw5-team07/wiki/Analytics-Pipeline)
- The XMI Indexer
- [The Candidate Sentence Retriever](https://github.com/GeeUnit/hw5-team07/wiki/Candidate-Answer-Retrieval)
- The Candidate Answer Scoring Strategy
- The Candidate Answer Selection Strategy