Creating Solr Index from Scratch - wkiri/MTE GitHub Wiki


Set up Solr

Download Solr

mkdir workspace && cd workspace
wget http://archive.apache.org/dist/lucene/solr/6.1.0/solr-6.1.0.tgz
tar xvzf solr-6.1.0.tgz
cd solr-6.1.0

Start and Create a Core

PORT=8983
bin/solr start -p $PORT
bin/solr create_core -c docs -d $YOUR_PATH/conf/solr/docs -p $PORT

To confirm that Solr is set up correctly, visit http://<host>:8983/solr/
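The same check can be scripted; a minimal sketch, assuming the port and core name used above (adjust if yours differ):

```shell
# Query Solr's CoreAdmin STATUS endpoint to confirm the "docs" core exists.
PORT=8983
CORE=docs
STATUS_URL="http://localhost:${PORT}/solr/admin/cores?action=STATUS&core=${CORE}&wt=json"
# A JSON response containing "name":"docs" means the core was created.
curl -s "$STATUS_URL" || echo "Solr is not reachable on port ${PORT}"
```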

Set up the parser server, Tika, and Grobid

Refer to the README in the parser-server directory for setting up the parser server (or see https://github.com/USCDataScience/parser-indexer-py/tree/master/parser-server).
Once the parser server is running on http://localhost:9998/, follow the steps below:
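To verify the parser server is up before continuing, a quick reachability check can help. This is a sketch that assumes the server exposes Tika server's plain-text /version endpoint on port 9998:

```shell
# Ping the parser server; prints a version string if it is up.
PARSER_URL="http://localhost:9998"
curl -s "${PARSER_URL}/version" || echo "parser server is not reachable at ${PARSER_URL}"
```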

Set up CoreNLP

Download and start the Stanford CoreNLP server on port 9000

Step 1: Download CoreNLP

Visit http://stanfordnlp.github.io/CoreNLP/download.html, download the zip, and extract it. Note: CoreNLP requires Java 8.

Step 2: Install Python dependencies

Follow the instructions at https://github.com/smilli/py-corenlp:

pip install pycorenlp

Step 3: Start the CoreNLP server

Go to the extracted CoreNLP directory and run:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer

You can test it by visiting http://localhost:9000/ in a browser.
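You can also test it from the command line by POSTing a sentence to the server's HTTP API. A sketch, assuming the default port 9000 (the annotator list here is just an example):

```shell
# Send a sentence to CoreNLP and get JSON annotations back.
# -g disables curl's brace globbing so the JSON in the URL is sent as-is.
CORENLP_URL="http://localhost:9000"
curl -s -g --data 'Stanford University is located in California.' \
  "${CORENLP_URL}"'/?properties={"annotators":"tokenize,ssplit,pos","outputFormat":"json"}' \
  || echo "CoreNLP server is not reachable at ${CORENLP_URL}"
```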

Tips:

  • To restart, kill the running server and start it again with `nohup corenlpserver.sh &`
  • To change the NER model, edit `ner.model` in `$CORENLP_HOME/StanfordCoreNLP.properties`
  • If needed, select "English" in the web interface for the language.
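The restart tip above assumes a small wrapper script named corenlpserver.sh; the source does not show its contents, but a hypothetical minimal version could be created like this (CORENLP_HOME is assumed to point at the extracted CoreNLP directory):

```shell
# Write a minimal wrapper script that launches the CoreNLP server on port 9000.
cat > corenlpserver.sh <<'EOF'
#!/bin/bash
# Fail with a message if CORENLP_HOME is unset.
cd "${CORENLP_HOME:?set CORENLP_HOME to the extracted CoreNLP directory}"
exec java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
EOF
chmod +x corenlpserver.sh
```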

Parse and add the PDF documents to Solr

Add new documents to Solr index
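After indexing (see the parser-indexer-py repository for the indexing scripts), a quick query confirms documents landed in the core. This is a sketch assuming the port and core name used above; numFound in the JSON response should match the number of documents you added:

```shell
# Count all documents in the "docs" core (rows=0 returns only the count).
PORT=8983
CORE=docs
QUERY_URL="http://localhost:${PORT}/solr/${CORE}/select?q=*:*&rows=0&wt=json"
curl -s "$QUERY_URL" || echo "Solr is not reachable on port ${PORT}"
```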
