Prebuilt Logistic Regression Analysis - SeanTater/uncc2014watsonsim GitHub Wiki

Once the pipeline prototype had been built, but before the pipeline had been integrated, the machine learning group started on another scorer. Instead of averaging the Indri and Lucene scores, it combined them using a separate, hand-tuned scale for each, with the values taken from Weka. The changes were finished by (and mostly before) commit ac7a6b20aa.
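To make the difference between the two scorers concrete, here is a minimal sketch of a plain average next to a logistic combination with one scale per search engine. The function names and every coefficient value are placeholders assumed for illustration. This page does not record the numbers taken from Weka, and the project's actual scorer may be structured differently.

```python
import math

# Hypothetical coefficients: the real values came from Weka and are not
# listed on this page, so these are placeholders only.
INDRI_SCALE = 1.0
LUCENE_SCALE = 1.0
INTERCEPT = 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def average_score(indri_score, lucene_score):
    """The earlier approach: an unweighted mean of the two engine scores."""
    return (indri_score + lucene_score) / 2.0


def combined_score(indri_score, lucene_score,
                   indri_scale=INDRI_SCALE,
                   lucene_scale=LUCENE_SCALE,
                   intercept=INTERCEPT):
    """Logistic combination: each engine score gets its own scale."""
    z = intercept + indri_scale * indri_score + lucene_scale * lucene_score
    return sigmoid(z)
```

Because each engine has its own scale, a scorer of this shape can weight one engine more heavily than the other, which an unweighted average cannot do.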

Results

There were measurable performance improvements, which the machine learning group, having been pessimistic, did not expect. The accuracy is clearly not amazing, but it is already a significant improvement over the original results: Prebuilt Linear Analysis Breakdown

There were 8045 questions with answers, of which 2992 had correct answers in the search results. Of those correct answers, 1101 were in the top 3 by rank in the final output, and 505 were the very top result.
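The counts above can be turned into rates with a few lines of arithmetic. This is only a sketch: the variable names and the `pct` helper are introduced here for illustration, and the only inputs are the four counts reported above.

```python
# Counts reported above.
QUESTIONS = 8045          # questions with answers
FOUND_IN_SEARCH = 2992    # correct answer present in the search results
IN_TOP_3 = 1101           # correct answer ranked in the top 3
AT_TOP = 505              # correct answer ranked first


def pct(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


rates = {
    "found in search, of all questions": pct(FOUND_IN_SEARCH, QUESTIONS),
    "top 3, of found": pct(IN_TOP_3, FOUND_IN_SEARCH),
    "top 1, of found": pct(AT_TOP, FOUND_IN_SEARCH),
    "top 3, of all questions": pct(IN_TOP_3, QUESTIONS),
    "top 1, of all questions": pct(AT_TOP, QUESTIONS),
}

for label, value in rates.items():
    print(f"{label}: {value:.1f}%")
```

Rounded to one decimal place, that is about 37.2% of questions with a correct answer somewhere in the search results. Of those, about 36.8% were ranked in the top 3 and 16.9% were ranked first. Measured against all 8045 questions, the same two figures are about 13.7% and 6.3%.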

Next Direction

The live pipeline is being integrated right now. Expectations are high, since several parts of it have been substantially reworked.