Optional Features - SeanTater/uncc2014watsonsim GitHub Wiki
Watsonsim can also use a few features that we don't include on the front page, mostly in order to make installation less daunting:
Indri local search
Indri is a large part of the Watsonsim architecture, and we use it ourselves. It is disabled by default because the native library can be difficult to set up. If you can, enable it: it greatly improves accuracy, although at a serious cost in speed.
- Indri search library (native): compile it with SWIG and Java support, then copy the libindri_jni.* files into uncc2014watsonsim/lib
- Once the library is in place, set
indri_enabled = true
in config.properties. To skip this step, leave Indri disabled.
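Before turning Indri on, it can help to confirm that the flag and the native library agree. The following is a minimal sketch rather than part of Watsonsim: `read_properties` and `indri_status` are hypothetical helper names, the parser handles only simple `key = value` lines, and a missing `indri_enabled` key is assumed to mean disabled.

```python
import glob
import os


def read_properties(path):
    """Parse simple `key = value` lines; blank lines and # or ! comments are skipped."""
    props = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line[0] in "#!":
                continue
            key, sep, value = line.partition("=")
            if sep:
                props[key.strip()] = value.strip()
    return props


def indri_status(repo_dir):
    """Return (enabled, library_found) for a checkout located at repo_dir."""
    props = read_properties(os.path.join(repo_dir, "config.properties"))
    # Assumption: an absent key means Indri is disabled.
    enabled = props.get("indri_enabled", "false").lower() == "true"
    # The native library is expected as lib/libindri_jni.* (extension varies by OS).
    libs = glob.glob(os.path.join(repo_dir, "lib", "libindri_jni.*"))
    return enabled, bool(libs)
```

If `indri_status("uncc2014watsonsim")` reports the flag as enabled but no library found, the library most likely still needs to be compiled and copied into lib.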
Bing web search
Bing has a significant influence on the recall of the simulator because it has access to vastly more data than we can upload and distribute. Keep in mind queries are limited to 5000 per month for free accounts.
To install it:
- Copy config.properties.sample to config.properties
- Create an Azure account and sign up for Bing, then put the API key in the right variable in the config.
Google web search
Development of Google web search was discontinued several months ago, though it may yet be revived. The reason was the very low query quota: 100 per day. A testing/training run would therefore take us 70 days, whereas with Bing we can (luckily) complete it in a day or two as long as the run straddles a month boundary. In the past you would do the following, but it is not likely to work right now:
- Make a new Google cloud app, and put the name in the config
- Enable the Custom Search API, create a server public API key, and paste it into the config
- Make your own custom search engine
- Choose "Search any site" (but you have to pick a domain, maybe wikipedia.org would be good)
- Edit the custom search you just made. In "Sites to search", change "Search only included sites" to "Search the entire web but emphasize included sites"
- Get the search engine ID, put it in the config
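The quota comparison in this section comes down to simple arithmetic, sketched below. The run size of 7000 queries is not stated anywhere on this page; it is inferred from the 70 days quoted at 100 queries per day, and the function names are illustrative only.

```python
import math

GOOGLE_QUERIES_PER_DAY = 100   # Google quota quoted above
BING_QUERIES_PER_MONTH = 5000  # Bing free-account limit quoted above
# Assumption: run size inferred from "70 days" at 100 queries per day.
RUN_QUERIES = 70 * GOOGLE_QUERIES_PER_DAY


def days_needed(total_queries, per_day):
    """Days required when a daily quota is the only limit."""
    return int(math.ceil(total_queries / float(per_day)))


def months_of_quota_needed(total_queries, per_month):
    """Number of monthly quotas a run of this size consumes."""
    return int(math.ceil(total_queries / float(per_month)))
```

Under that assumption, `days_needed(RUN_QUERIES, GOOGLE_QUERIES_PER_DAY)` gives 70, and `months_of_quota_needed(RUN_QUERIES, BING_QUERIES_PER_MONTH)` gives 2, which is why a Bing run only finishes in a day or two when it straddles a month boundary and can draw on two monthly quotas.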
Additional scripts
Some of our scripts are written in Python and are there more for reference than public use. For the scripts, which you do not need for simple queries, you should install the following:
- Python 2.6+
- psycopg2, which you can install with
pip install psycopg2
or as the python-psycopg2 package on Ubuntu and Fedora
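To confirm both requirements are met before running the scripts, a quick check along these lines may help. It is a sketch only: `check_script_requirements` is a hypothetical helper, not something shipped with the scripts.

```python
import sys


def check_script_requirements():
    """Return (python_version_ok, psycopg2_available)."""
    version_ok = sys.version_info >= (2, 6)
    try:
        import psycopg2  # noqa: F401  (imported only to test availability)
        have_psycopg2 = True
    except ImportError:
        have_psycopg2 = False
    return version_ok, have_psycopg2


if __name__ == "__main__":
    version_ok, have_psycopg2 = check_script_requirements()
    print("Python 2.6+: %s" % version_ok)
    print("psycopg2 importable: %s" % have_psycopg2)
```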