What is S3QA? - QAML/S3QACoreFramework GitHub Wiki
Overview
S3QA is a UIMA framework for processing paired and threaded items in order to (a) compute features, (b) compute parse-tree representations, and (c) learn on them. S3QA brings large-scale deep linguistic analysis and kernel-based learning to large datasets.
S3QA is flexible and can be extended with new linguistic components as well as new machine learning components and algorithms. In its current version, S3QA focuses on so-called community question answering and handles two tasks:
- given a forum question and the thread of its associated comments, rank the comments according to their relevance to the question;
- given a new user question and a set of previously posted forum questions, rank the forum questions according to their relevance to the new question.
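Both tasks share the same shape: score each (question, candidate) pair, then sort the candidates by score. The minimal Python sketch below shows only that ranking setup. The word-overlap `jaccard` scorer and the example thread are hypothetical placeholders; in S3QA the score would come from a model learned on the features and parse trees described in the following sections, not from this function.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased word tokens; a deliberately naive tokenizer."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Hypothetical placeholder relevance score: Jaccard overlap of word sets."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def rank(query: str, candidates: list[str], score=jaccard) -> list[tuple[str, float]]:
    """Return (candidate, score) pairs sorted by decreasing relevance to the query."""
    scored = [(c, score(query, c)) for c in candidates]
    # sorted() is stable, so tied candidates keep their original thread order
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented example thread, for illustration only
question = "How do I renew my residence visa in Qatar?"
comments = [
    "Welcome to Qatar!",
    "You can renew the residence visa at the immigration office.",
    "I had the same problem last year.",
]
for text, s in rank(question, comments):
    print(f"{s:.3f}  {text}")
```

The same `rank` function covers the second task by passing previously posted forum questions as `candidates`.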
In addition to supporting state-of-the-art community question answering models, S3QA also enables multilingual pipelines (e.g., for Arabic).
Representations
The features are the ones used by the ConvKN system described in "ConvKN at SemEval-2016 Task 3: Answer and Question Selection for Question Answering on Arabic and English Fora".
Vectorial Features
Through DKPro, S3QA can compute a wide range of similarity measures between a pair of texts, including:
- Greedy string tiling
- Levenshtein distance
- Longest common subsequence
- Word n-gram cosine
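For reference, the Python sketch below gives textbook definitions of the four measures listed above, written from scratch for illustration. It is not DKPro's API, and DKPro's implementations may differ in tokenization, normalization, and defaults. The greedy string tiling function is a simplified variant that takes one longest unmarked match per pass, and its `min_match` threshold of 3 tokens is an assumption.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]


def lcs_length(a, b) -> int:
    """Length of the longest common subsequence (strings or token lists)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ngram_cosine(a: str, b: str, n: int = 2) -> float:
    """Cosine similarity between word n-gram count vectors."""
    def grams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    ga, gb = grams(a), grams(b)
    dot = sum(ga[g] * gb[g] for g in ga)
    norm = math.sqrt(sum(v * v for v in ga.values())) * math.sqrt(sum(v * v for v in gb.values()))
    return dot / norm if norm else 0.0


def greedy_string_tiling(a, b, min_match: int = 3) -> int:
    """Simplified Greedy String Tiling: total length of non-overlapping tiles of
    at least min_match tokens, always taking the longest unmarked match first."""
    marked_a, marked_b = set(), set()
    total = 0
    while True:
        best = (0, 0, 0)  # (length, start in a, start in b)
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]
                       and i + k not in marked_a and j + k not in marked_b):
                    k += 1
                if k > best[0]:
                    best = (k, i, j)
        length, i, j = best
        if length < min_match:
            return total
        # Mark the tile so its tokens cannot be reused by later matches
        marked_a.update(range(i, i + length))
        marked_b.update(range(j, j + length))
        total += length


print(levenshtein("kitten", "sitting"))
print(greedy_string_tiling("the cat sat on the mat".split(),
                           "on the mat the cat sat".split()))
```

Greedy string tiling is the one measure here that rewards reordered blocks of text: the two sentences above share two 3-token tiles despite their different word order.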
Parse Trees
As S3QA relies on KeLP for learning, it can use sophisticated kernels such as tree kernels. S3QA produces parse trees with Stanford CoreNLP and feeds them into KeLP for learning, alongside the features described above.
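To illustrate how a tree kernel compares two parse trees, the Python sketch below parses bracketed trees (the notation Stanford CoreNLP uses when printing constituency parses) and computes the subset tree kernel of Collins and Duffy. This is a swapped-in, simpler kernel than the partial tree kernel mentioned under Learning Algorithms, and it is not KeLP's API; the decay factor `lam` and the toy trees are assumptions made for the example.

```python
from functools import lru_cache


def parse_tree(s):
    """Parse a bracketed tree such as '(S (NP (D the) (N cat)) (VP (V sat)))'
    into nested (label, children) tuples; words are leaves with no children."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":          # a bare word is a leaf
            pos += 1
            return (tokens[pos - 1], ())
        label = tokens[pos + 1]
        pos += 2                        # skip "(" and the label
        children = []
        while tokens[pos] != ")":
            children.append(node())
        pos += 1                        # skip ")"
        return (label, tuple(children))

    return node()


def production(n):
    """Grammar production at node n: its label plus its children's labels."""
    return (n[0], tuple(child[0] for child in n[1]))


def is_preterminal(n):
    """True for nodes like (N cat) whose children are all words."""
    return bool(n[1]) and all(not child[1] for child in n[1])


def internal_nodes(n):
    """Yield every non-leaf node of the tree."""
    if n[1]:
        yield n
        for child in n[1]:
            yield from internal_nodes(child)


def sst_kernel(t1, t2, lam=1.0):
    """Collins-Duffy subset tree kernel; lam in (0, 1] down-weights large fragments."""
    @lru_cache(maxsize=None)
    def delta(n1, n2):
        # (Weighted) number of common fragments rooted at both n1 and n2
        if production(n1) != production(n2):
            return 0.0
        if is_preterminal(n1):
            return lam
        result = lam
        for c1, c2 in zip(n1[1], n2[1]):
            result *= 1.0 + delta(c1, c2)
        return result

    return sum(delta(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))


t1 = parse_tree("(S (NP (D the) (N cat)) (VP (V sat)))")
t2 = parse_tree("(S (NP (D the) (N dog)) (VP (V sat)))")
print(sst_kernel(t1, t1), sst_kernel(t1, t2))
```

With `lam=1.0` the kernel simply counts shared fragments: 24 for `t1` with itself and 15 for `t1` with `t2`, because no fragment containing `(N cat)` has a counterpart in `t2`.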
Learning Algorithms
By relying on KeLP, S3QA supports a number of kernel-based learning models and their combinations. These include the standard linear, polynomial, and RBF kernels. The more interesting cases, however, involve more sophisticated kernels such as the partial tree kernel, which S3QA applies to parse trees.
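The Python sketch below writes out the standard vector kernels named above, plus a weighted sum, which is one common way to combine kernels (a non-negative weighted sum of valid kernels is itself a valid kernel). The parameter defaults (`degree=2`, `c=1.0`, `gamma=0.5`), the weights, and the helper `on` are hypothetical choices for illustration. KeLP ships its own implementations, and its class and parameter names may differ.

```python
import math


def linear(x, y):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial(x, y, degree=2, c=1.0):
    """(x . y + c) ** degree."""
    return (linear(x, y) + c) ** degree


def rbf(x, y, gamma=0.5):
    """exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def normalized(kern):
    """Cosine-normalize a kernel so that k(x, x) == 1 for nonzero x."""
    return lambda x, y: kern(x, y) / math.sqrt(kern(x, x) * kern(y, y))


def on(index, kern):
    """Hypothetical helper: apply `kern` to one representation of an example."""
    return lambda x, y: kern(x[index], y[index])


def combine(weighted_kernels):
    """Weighted sum of kernels; non-negative weights keep the result a valid kernel."""
    return lambda x, y: sum(w * kern(x, y) for w, kern in weighted_kernels)


# Each example holds two representations, e.g. two separate feature vectors.
ex1 = ([1.0, 2.0], [0.5, 0.1])
ex2 = ([3.0, 4.0], [0.4, 0.3])
k = combine([(1.0, on(0, normalized(linear))), (0.5, on(1, rbf))])
print(k(ex1, ex2))
```

In a setup like S3QA's, each term of the sum would typically see a different representation of the same example, for instance a vector kernel on the similarity features and a tree kernel on the parse trees; `on` is what selects the representation a given kernel operates on.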