What is S3QA? - QAML/S3QACoreFramework GitHub Wiki

Overview

S3QA is a UIMA framework for processing paired and threaded items in order to (a) compute features, (b) compute parse-tree representations, and (c) learn from them. S3QA applies deep linguistic analysis and kernel-based learning to large datasets.

S3QA is flexible and can be extended with new linguistic components and new machine learning components and algorithms. In its current incarnation, S3QA focuses on community question answering and can handle two tasks:

Given a forum question and the thread of its associated comments, rank the comments according to their relevance to the question.
Given a fresh user question and a set of previously posted forum questions, rank the forum questions according to their relevance to the fresh question.
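
Both tasks reduce to the same operation: score each candidate text against a query text and sort by that score. The sketch below illustrates this with a word-overlap (Jaccard) score, which is only a stand-in assumed for illustration; S3QA itself uses learned models, and the names `overlap_score` and `rank` are hypothetical.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def overlap_score(query, candidate):
    """Jaccard overlap between the word sets of two texts.

    Placeholder relevance score, not the model S3QA learns.
    """
    q, c = set(tokenize(query)), set(tokenize(candidate))
    if not q or not c:
        return 0.0
    return len(q & c) / len(q | c)


def rank(query, candidates, score=overlap_score):
    """Return the candidates sorted from most to least relevant to the query."""
    return sorted(candidates, key=lambda cand: score(query, cand), reverse=True)


question = "where can i buy a sim card"
comments = [
    "try the airport",
    "you can buy a sim card at the mall",
    "sim card shops are everywhere",
]
ranked = rank(question, comments)
```

For the first task, `query` is the forum question and `candidates` are its comments; for the second, `query` is the fresh question and `candidates` are the previously posted forum questions.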

In addition to supporting state-of-the-art community question answering models, S3QA also enables multilingual (e.g., Arabic) pipelines.

Representations

The features are those used in ConvKN at SemEval-2016 Task 3: Answer and Question Selection for Question Answering on Arabic and English Fora.

Vectorial Features

Through DKPro, S3QA can compute a variety of similarity measures between a pair of texts, including:

Greedy string tiling
Levenshtein distance
Longest common subsequence
Word n-gram cosine
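
To make three of these measures concrete, here is a minimal standalone sketch of Levenshtein distance, longest common subsequence length, and word n-gram cosine. It is written from the textbook definitions and does not use or mirror the DKPro API; the function names and the whitespace tokenization are assumptions for illustration. Greedy string tiling is omitted for brevity.

```python
import math
from collections import Counter


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common (not necessarily contiguous) subsequence."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def word_ngrams(text, n):
    """Count the word n-grams of a text (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_cosine(text_a, text_b, n=1):
    """Cosine similarity between the word n-gram count vectors of two texts."""
    va, vb = word_ngrams(text_a, n), word_ngrams(text_b, n)
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0
```

Each function maps a pair of texts to a single number, so their outputs can be concatenated into one feature vector per question-comment or question-question pair.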

Parse trees

Because S3QA relies on KeLP for learning, it can use sophisticated kernels, such as tree kernels. S3QA produces parse trees with Stanford CoreNLP and feeds them into KeLP for learning, alongside the aforementioned features.
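
As an illustration of what a tree kernel computes, the sketch below implements the subset tree kernel of Collins and Duffy, which counts the tree fragments two parse trees share. It is a simpler relative of the kernels available in KeLP and is not KeLP's implementation; the nested-tuple tree encoding and all function names are assumptions made for this example.

```python
def label(node):
    """Label of a node: the string itself for a leaf, else the first element."""
    return node if isinstance(node, str) else node[0]


def production(node):
    """Grammar production at an internal node: (parent label, child labels)."""
    return (node[0], tuple(label(child) for child in node[1:]))


def internal_nodes(tree):
    """All non-leaf nodes of a tree, in pre-order."""
    if isinstance(tree, str):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(internal_nodes(child))
    return nodes


def delta(n1, n2, lam):
    """Decay-weighted count of common fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    result = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        # Leaves contribute nothing beyond the matching production itself.
        if not isinstance(c1, str) and not isinstance(c2, str):
            result *= 1.0 + delta(c1, c2, lam)
    return result


def subset_tree_kernel(t1, t2, lam=1.0):
    """Sum delta over all pairs of internal nodes; lam in (0, 1] damps large fragments."""
    return sum(delta(n1, n2, lam)
               for n1 in internal_nodes(t1)
               for n2 in internal_nodes(t2))


# Trees are nested tuples: (label, child, child, ...); leaves are plain strings.
the_dog = ("NP", ("DT", "the"), ("NN", "dog"))
the_cat = ("NP", ("DT", "the"), ("NN", "cat"))
shared = subset_tree_kernel(the_dog, the_cat)  # fragments common to both trees
```

This direct version compares every pair of nodes and recomputes `delta` recursively, which is fine for small trees; practical implementations cache `delta` values and prune node pairs whose productions differ.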

Learning Algorithms

By relying on KeLP, S3QA supports a number of kernel-based learning models and their combinations. The available kernels include the standard linear, polynomial, and RBF kernels. More expressive kernels are also available, such as the partial tree kernel, which in our case is applied to parse trees.
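
The standard kernels and their combination can be sketched in a few lines. The code below follows the usual textbook definitions and is not KeLP's API; the function names, default parameters, and the weighted-sum combination are assumptions for illustration. For simplicity, every kernel here reads the same feature vector, whereas a combination in S3QA would typically pair a tree kernel over parse trees with a vector kernel over the similarity features.

```python
import math


def linear_kernel(x, y):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """(x . y + coef0) ** degree."""
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def combined_kernel(x, y, kernels, weights):
    """Weighted sum of kernels; a valid kernel when all weights are non-negative."""
    return sum(w * k(x, y) for k, w in zip(kernels, weights))


x, y = [1.0, 2.0], [3.0, 4.0]
value = combined_kernel(x, y, [linear_kernel, polynomial_kernel], [1.0, 0.5])
```

A weighted sum is the simplest way to combine kernels: the learner then works in the concatenation of the feature spaces induced by each component kernel.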