Creating Solr Index from Scratch - wkiri/MTE GitHub Wiki
Notes:
- The parser-indexer-py project is hosted at https://github.com/USCDataScience/parser-indexer-py
- This page describes how to set up a Solr index for the first time. See also "Add new documents to Solr index".
Download Solr
mkdir workspace && cd workspace
wget http://archive.apache.org/dist/lucene/solr/6.1.0/solr-6.1.0.tgz
tar xvzf solr-6.1.0.tgz
cd solr-6.1.0
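If you would rather script this step, the following Python sketch does the same download and extraction using only the standard library. The helper names solr_archive_url and download_and_extract are hypothetical. The URL pattern is copied from the wget command above and is assumed, not verified, to hold for other 6.x releases.

```python
import tarfile
import urllib.request
from pathlib import Path

# Archive layout taken from the wget command above; assumed to hold for
# other 6.x releases (later Solr releases are published under other paths).
ARCHIVE_BASE = "http://archive.apache.org/dist/lucene/solr"


def solr_archive_url(version):
    """Build the download URL of a Solr release tarball."""
    return f"{ARCHIVE_BASE}/{version}/solr-{version}.tgz"


def download_and_extract(version, workspace):
    """Download solr-<version>.tgz into workspace (if absent) and unpack it.

    Returns the path of the extracted solr-<version> directory.
    """
    workspace = Path(workspace)
    workspace.mkdir(parents=True, exist_ok=True)
    tgz = workspace / f"solr-{version}.tgz"
    if not tgz.exists():  # reuse a tarball that was already downloaded
        urllib.request.urlretrieve(solr_archive_url(version), tgz)
    with tarfile.open(tgz, "r:gz") as tar:  # same as: tar xvzf solr-<version>.tgz
        tar.extractall(workspace)
    return workspace / f"solr-{version}"


if __name__ == "__main__":
    print(download_and_extract("6.1.0", "workspace"))
```

Skipping the download when the tarball exists makes the script safe to rerun after an interrupted extraction.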
Start and Create a Core
PORT=8983
bin/solr start -p $PORT
bin/solr create_core -c docs -d $YOUR_PATH/conf/solr/docs -p $PORT
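Here $YOUR_PATH/conf/solr/docs is the directory that holds the configuration for the docs core.
To confirm that Solr setup is complete, visit http://<host>:8983/solr/ and check that the docs core is listed.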
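The same check can be scripted against Solr's CoreAdmin API (the STATUS action). This is a minimal sketch that assumes the default port 8983 and the core name docs from the commands above. The helper names are hypothetical. It also assumes that Solr reports a missing core as an empty object under "status".

```python
import json
import urllib.parse
import urllib.request


def core_status_url(host, port, core):
    """Build a CoreAdmin STATUS request for one core, asking for JSON output."""
    query = urllib.parse.urlencode({"action": "STATUS", "core": core, "wt": "json"})
    return f"http://{host}:{port}/solr/admin/cores?{query}"


def core_loaded(status, core):
    """True if a parsed STATUS response describes the given core.

    Assumption: a core that does not exist shows up as an empty object
    under status["status"][core].
    """
    return bool(status.get("status", {}).get(core))


def check_core(host="localhost", port=8983, core="docs"):
    """Ask a running Solr about the core; False if unreachable or missing."""
    try:
        with urllib.request.urlopen(core_status_url(host, port, core), timeout=5) as resp:
            return core_loaded(json.load(resp), core)
    except (OSError, ValueError):  # connection or HTTP errors, malformed JSON
        return False


if __name__ == "__main__":
    print("docs core loaded:", check_core())
```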
Refer to the README in the parser-server directory for setting up the parser server (or see https://github.com/USCDataScience/parser-indexer-py/tree/master/parser-server).
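To check from a script that the parser server is answering (and, later, the CoreNLP server), a plain HTTP probe is enough. This sketch uses a hypothetical helper, server_is_up. It treats any HTTP reply, even a 404, as "up" and treats only connection failures as "down". It does not verify what kind of service is listening.

```python
import urllib.error
import urllib.request


def server_is_up(url, timeout=5.0):
    """Return True if anything answers HTTP at url.

    Any HTTP reply, even an error status such as 404, means the server is
    listening; only connection-level failures count as down.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server replied, just not with a 2xx status
    except OSError:  # refused connection, DNS failure, timeout, ...
        return False


if __name__ == "__main__":
    for name, url in [("parser server", "http://localhost:9998/"),
                      ("CoreNLP server", "http://localhost:9000/")]:
        print(name, "up" if server_is_up(url) else "DOWN")
```

HTTPError must be caught before OSError because it is a subclass of it. Otherwise, a 404 would be reported as "down".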
Once the parser server is running on http://localhost:9998/, follow the steps below:
Download and Start Stanford CoreNLP Server on Port 9000
Visit http://stanfordnlp.github.io/CoreNLP/download.html, download the zip file, and extract it. Note: CoreNLP runs on Java 8.
Follow the instructions at https://github.com/smilli/py-corenlp to install the Python wrapper:
pip install pycorenlp
Go to the extracted CoreNLP directory and run:
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer
You can test it by visiting http://localhost:9000/
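You can also send text to the server over HTTP instead of using the browser. The CoreNLP server accepts raw text POSTed to its root URL, with a "properties" query parameter. The sketch below does this and then merges consecutive tokens with the same NER tag into entities. The annotator list and helper names are assumptions for illustration. The merging rule is a simplification: two adjacent entities of the same type would be joined into one.

```python
import json
import urllib.parse
import urllib.request

# Assumed defaults: the port from the heading above and a pipeline that
# includes the stages NER depends on.
CORENLP_URL = "http://localhost:9000/"
ANNOTATORS = "tokenize,ssplit,pos,lemma,ner"


def annotate(text, url=CORENLP_URL, annotators=ANNOTATORS):
    """POST raw text to a running CoreNLP server and return its JSON output."""
    props = json.dumps({"annotators": annotators, "outputFormat": "json"})
    request = urllib.request.Request(
        url + "?" + urllib.parse.urlencode({"properties": props}),
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def named_entities(doc):
    """Merge runs of consecutive tokens sharing a non-"O" NER tag into (text, tag) pairs."""
    entities = []
    for sentence in doc.get("sentences", []):
        words, tag = [], "O"
        # A trailing "O" sentinel flushes an entity that ends the sentence.
        for token in sentence.get("tokens", []) + [{"word": "", "ner": "O"}]:
            ner = token.get("ner", "O")
            if ner != tag and words:  # the current run of tagged tokens ends here
                entities.append((" ".join(words), tag))
                words = []
            tag = ner
            if ner != "O":
                words.append(token["word"])
    return entities


if __name__ == "__main__":
    print(named_entities(annotate("Curiosity landed in Gale Crater on Mars.")))
```

With pycorenlp installed, StanfordCoreNLP("http://localhost:9000").annotate(text, properties=...) makes the same kind of request and returns the parsed JSON.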
Tips:
- To restart, kill the running server and start it again with “nohup corenlpserver.sh &”.
- To change the NER model, edit the ‘ner.model’ property in $CORENLP_HOME/StanfordCoreNLP.properties.
- If needed, select "English" as the language in the web interface.
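If you switch NER models often, you can script the properties-file edit. This sketch uses a hypothetical helper, set_property, that rewrites the ner.model line or appends it if it is missing. It handles only simple "key = value" lines and is not a full Java .properties parser. The change presumably takes effect only after the server is restarted, as in the first tip.

```python
from pathlib import Path


def set_property(text, key, value):
    """Return .properties text with key set to value.

    Replaces the first uncommented "key = ..." line, or appends one if the
    key is absent. Simplified: ignores ':' separators and line continuations.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.lstrip()
        if stripped.startswith(("#", "!")):  # comment lines are left alone
            continue
        if stripped.split("=", 1)[0].strip() == key:
            lines[i] = f"{key} = {value}"
            break
    else:  # no existing assignment found
        lines.append(f"{key} = {value}")
    return "\n".join(lines) + "\n"


def set_ner_model(props_path, model):
    """Point ner.model in the given properties file at another model."""
    path = Path(props_path)
    path.write_text(set_property(path.read_text(), "ner.model", model))
```

For example, set_ner_model("StanfordCoreNLP.properties", "/path/to/my-model.ser.gz") would update the file in place. The model path here is a placeholder.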