Prebuilt Logistic Regression Analysis - SeanTater/uncc2014watsonsim GitHub Wiki
Once the pipeline prototype had been built, but before it had been integrated, the machine learning group started on another scorer. Instead of averaging the Indri and Lucene scores, it combined them using separate, hand-optimized scales taken from Weka. The changes were finished by (and mostly before) commit ac7a6b20aa.
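As a rough illustration of the difference between averaging and a logistic-regression-style combination, here is a minimal Python sketch. It is not the project's code: the function names and all three coefficients are hypothetical placeholders, since the values actually taken from Weka are not listed on this page.

```python
import math

# Hypothetical coefficients, for illustration only. The scales the group
# actually took from Weka are not given on this page.
INTERCEPT = -2.0
INDRI_WEIGHT = 0.5
LUCENE_WEIGHT = 0.3


def average_score(indri_score: float, lucene_score: float) -> float:
    """The earlier approach: treat both engine scores as equally scaled."""
    return (indri_score + lucene_score) / 2.0


def combined_score(indri_score: float, lucene_score: float) -> float:
    """Weight each engine's score separately, then squash to (0, 1).

    This is the usual logistic regression form:
    sigmoid(intercept + w_indri * indri + w_lucene * lucene).
    """
    z = (INTERCEPT
         + INDRI_WEIGHT * indri_score
         + LUCENE_WEIGHT * lucene_score)
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Made-up example scores for a single candidate answer.
    print(average_score(-7.0, 4.0))
    print(combined_score(-7.0, 4.0))
```

The point of the separate weights is that each engine's score gets its own scale before the two are combined, so neither engine dominates just because its raw numbers happen to be larger.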
Results
There were measurable performance improvements, which the machine learning group, having been pessimistic, did not expect. The accuracy is clearly not amazing, but it is already a significant improvement over the original results:
Of the 8045 questions with answers, 2992 had a correct answer somewhere in the search results. Of those 2992, 1101 had a correct answer in the top 3 by rank in the final output, and 505 had it as the very top result.
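To put those counts in proportion, the short Python sketch below turns them into percentages. The counts come from the results above; the variable and function names are only illustrative. Roughly 37% of questions had a correct answer in the search results, and of those, about 37% reached the top 3 and about 17% were ranked first.

```python
# Counts reported above.
TOTAL_QUESTIONS = 8045   # questions with answers
ANSWERABLE = 2992        # correct answer present in the search results
TOP_3 = 1101             # correct answer ranked in the top 3
TOP_1 = 505              # correct answer ranked first


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of all questions where search found a correct answer at all.
search_recall = percent(ANSWERABLE, TOTAL_QUESTIONS)

# Ranking quality, measured only over questions search could answer.
top3_given_found = percent(TOP_3, ANSWERABLE)
top1_given_found = percent(TOP_1, ANSWERABLE)

# End-to-end rates over all questions.
top3_overall = percent(TOP_3, TOTAL_QUESTIONS)
top1_overall = percent(TOP_1, TOTAL_QUESTIONS)

if __name__ == "__main__":
    print(f"search recall:         {search_recall:.2f}%")
    print(f"top 3 (of answerable): {top3_given_found:.2f}%")
    print(f"top 1 (of answerable): {top1_given_found:.2f}%")
    print(f"top 3 (overall):       {top3_overall:.2f}%")
    print(f"top 1 (overall):       {top1_overall:.2f}%")
```

Separating the two denominators shows where the loss happens: most questions are lost at the search stage, before the scorer ever sees a correct candidate.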
Next Direction
The live pipeline is being integrated now. Expectations are high, since several areas have been seriously reworked.