Install and use Solr 6.5 with CKAN - ckan/ckan GitHub Wiki

Overview

In this tutorial we are going to install Solr 6.5 on our server (no compilation required!) and set up a core for CKAN.

Solr 6.5 ships with a script that installs the package system-wide. It detects the OS and, if needed, creates the solr user for the daemon.

Java version

Check you have Java 8 (also known as Java 1.8). Earlier versions are incompatible with Solr 6. Java 9 and later are not recommended by SOLR and cause errors.

java -version

Ubuntu 18.04 comes with both Java 8 and Java 11 and defaults to 11, so remove Java 11:

sudo apt-get remove openjdk-11-jre-headless
java -version
    openjdk version "1.8.0_242"
    OpenJDK Runtime Environment (build 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08)
    OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)
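The version check can also be scripted. The following is a minimal sketch, not part of the Solr tooling: java_major is a hypothetical helper name, and it assumes the first line of the `java -version` output has the usual `version "..."` form. Note that `java -version` writes to stderr, hence the redirect.

```shell
# Extract the major Java version from the first line of `java -version` output.
# Handles both the legacy "1.8.0_242" scheme and the newer "11.0.6" scheme.
java_major() {
    # $1: a line such as: openjdk version "1.8.0_242"
    ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case "$ver" in
        1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;   # 1.8.x -> 8
        *)   printf '%s\n' "$ver" | cut -d. -f1 ;;   # 11.0.x -> 11
    esac
}

# Only run the check when a java binary is on the PATH.
if command -v java >/dev/null 2>&1; then
    major=$(java_major "$(java -version 2>&1 | head -n 1)")
    if [ "$major" = "8" ]; then
        echo "Java 8 detected"
    else
        echo "Java major version is '$major'; Solr 6 expects 8" >&2
    fi
fi
```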

Get rid of any other running copies of SOLR

If you installed solr-jetty you need to uninstall it. e.g. for Ubuntu 18.04 (bionic):

sudo apt-get remove solr-jetty jetty9

Check nothing is listening on the port any more:

sudo netstat -peanut | grep 8983
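A plain grep for 8983 can also match a PID or another column that happens to contain those digits. As a sketch of a stricter check, the hypothetical helper port_listed below only matches ":8983" followed by whitespace. It assumes the listing comes from ss or netstat in their usual Linux layout.

```shell
# Succeed if the socket listing on stdin shows something bound to port $1.
# Matching ":<port>" followed by whitespace avoids false hits on PIDs or
# other columns that merely contain the same digits.
port_listed() {
    grep -Eq ":$1[[:space:]]"
}

# Example use with ss (assumed to be installed; netstat output works too).
if command -v ss >/dev/null 2>&1; then
    if ss -ltn | port_listed 8983; then
        echo "port 8983 is still in use"
    else
        echo "port 8983 is free"
    fi
fi
```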

Download and Install Solr

Go to the Apache Solr website and download Solr. This tutorial uses the .tgz format, but the .zip works as well.

Download the package and put it somewhere on your file system; the installer will need it shortly:

cd /tmp
wget https://archive.apache.org/dist/lucene/solr/6.5.1/solr-6.5.1.tgz

When the download is finished, extract just the install script from the package:

tar xzf solr-6.5.1.tgz solr-6.5.1/bin/install_solr_service.sh --strip-components=2
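To see what --strip-components=2 does without touching the real download, here is a small sketch that builds a throwaway archive with the same directory layout and extracts one member from it. The temporary directory and the dummy script content are stand-ins for illustration only.

```shell
# Build a throwaway archive that mimics the layout of the Solr tarball
# (a stand-in for illustration; the real archive is solr-6.5.1.tgz).
demo=$(mktemp -d)
mkdir -p "$demo/src/solr-6.5.1/bin" "$demo/out"
echo '#!/bin/sh' > "$demo/src/solr-6.5.1/bin/install_solr_service.sh"
tar czf "$demo/demo.tgz" -C "$demo/src" solr-6.5.1

# Extract a single member and drop its two leading path components
# (solr-6.5.1/ and bin/), so the script lands directly in the target directory.
tar xzf "$demo/demo.tgz" -C "$demo/out" --strip-components=2 \
    solr-6.5.1/bin/install_solr_service.sh

ls "$demo/out"
```

The listing shows install_solr_service.sh at the top level of the output directory, which is why the tutorial can run it as ./install_solr_service.sh right after extraction.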

The install script, install_solr_service.sh, will:

  • copy SOLR's files to /opt/solr-6.5.1
  • create a symlink to it from /opt/solr
  • create the 'solr' user
  • install SOLR as a service (the init script goes under /etc/init.d/solr)
  • run this service in the background, listening on port 8983.

Run the install script with the default settings:

sudo bash ./install_solr_service.sh solr-6.5.1.tgz

Alternatively, you can customize the service name, installation directories, port, and owner with options passed to the installation script. To see the available options, run:

sudo bash ./install_solr_service.sh -help
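As an illustration of those options, the sketch below only prints an installer command with the defaults spelled out; it does not run the installer. The flag letters are the ones we believe the 6.x script lists in its help output, so confirm them with -help before relying on them. The variable names are made up for this example.

```shell
# Dry run: print the installer command with explicit options instead of running it.
# The values are the defaults described in this tutorial; change one to customize.
archive=solr-6.5.1.tgz
install_dir=/opt      # -i: directory the archive is extracted into
data_dir=/var/solr    # -d: writable files such as logs and index data
run_user=solr         # -u: owner of the files and of the Solr process
service=solr          # -s: service name
port=8983             # -p: port Solr binds to

cmd="sudo bash ./install_solr_service.sh $archive -i $install_dir -d $data_dir -u $run_user -s $service -p $port"
echo "$cmd"
```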

Check it started ok:

sudo service solr status
    o solr.service - LSB: Controls Apache Solr as a Service
       Loaded: loaded (/etc/init.d/solr; generated)
       Active: active (exited) since Fri 2020-02-07 11:36:20 UTC; 20s ago
         Docs: man:systemd-sysv-generator(8)
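The status output only reports on the init script, and Solr can take a few seconds before it answers HTTP requests. A minimal sketch of a polling helper follows; retry is a hypothetical name, not something Solr provides.

```shell
# Retry a command up to N times, sleeping S seconds between attempts.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0      # stop as soon as the command succeeds
        i=$((i + 1))
        sleep "$delay"
    done
    return 1                  # every attempt failed
}
```

For example, to wait up to about 20 seconds for the admin page to respond:

retry 10 2 curl -sf -o /dev/null http://localhost:8983/solr/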

Create and configure the ckan core

Switch to the solr user and go to the bin directory:

sudo su solr
cd /opt/solr/bin

Now create the ckan core:

./solr create -c ckan

If this command fails with "Failed to determine the port of a local Solr instance, cannot create ckan!", the main 'solr' service has errors. Check the logs: /var/solr/logs/solr.log and /var/solr/logs/solr-8983-console.log
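To narrow the logs down quickly, something like the following sketch can help. show_log_errors is a hypothetical helper, and it assumes that problems show up on lines containing ERROR or Exception, which is typical for Solr's log output but not guaranteed for every failure.

```shell
# Print the most recent lines of a log file that mention ERROR or Exception.
# $1: path to the log file, $2: how many matching lines to show (default 20).
show_log_errors() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
    grep -E 'ERROR|Exception' "$1" | tail -n "${2:-20}"
}

# Scan the two log files named above, skipping any that are not readable.
for log in /var/solr/logs/solr.log /var/solr/logs/solr-8983-console.log; do
    if [ -r "$log" ]; then
        echo "== $log"
        show_log_errors "$log"
    fi
done
```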

When the command succeeds, it creates all the configuration files and directories. At this point the core is listed in the Solr admin at http://localhost:8983/solr/ and we can proceed to edit the configuration files:

cd /var/solr/data/ckan/conf

Now edit solrconfig.xml to make it compatible with SOLR6 syntax. Simply run these commands:

sed -i '/<config>/a <schemaFactory class="ClassicIndexSchemaFactory"/>' solrconfig.xml
sed -i '/<initParams path="\/update\/\*\*">/,/<\/initParams>/ s/.*/<!--&-->/' solrconfig.xml
sed -i '/<processor class="solr.AddSchemaFieldsUpdateProcessorFactory">/,/<\/processor>/ s/.*/<!--&-->/' solrconfig.xml

These commands do three things:

  1. Insert the following line into the root <config> element:

    <schemaFactory class="ClassicIndexSchemaFactory"/>
    
  2. Comment out this element (each of its lines is wrapped in <!-- -->):

    <initParams path="/update/**">
      <lst name="defaults">
        <str name="update.chain">add-unknown-fields-to-the-schema</str>
      </lst>
    </initParams>
    
  3. Also comment out this element, in the same way:

    <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
      <str name="defaultFieldType">strings</str>
      <lst name="typeMapping">
        <str name="valueClass">java.lang.Boolean</str>
        <str name="fieldType">booleans</str>
      </lst>
      <lst name="typeMapping">
        <str name="valueClass">java.util.Date</str>
        <str name="fieldType">tdates</str>
      </lst>
      <lst name="typeMapping">
        <str name="valueClass">java.lang.Long</str>
        <str name="valueClass">java.lang.Integer</str>
        <str name="fieldType">tlongs</str>
      </lst>
      <lst name="typeMapping">
        <str name="valueClass">java.lang.Number</str>
        <str name="fieldType">tdoubles</str>
      </lst>
    </processor>
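To preview the effect of the three sed commands before touching the real file, they can be run against a small sample. The sketch below uses a cut-down, made-up solrconfig.xml in a temporary file; the real file is much longer. It assumes GNU sed, as the commands above do.

```shell
# Create a tiny stand-in for solrconfig.xml in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<config>
  <initParams path="/update/**">
    <lst name="defaults">
      <str name="update.chain">add-unknown-fields-to-the-schema</str>
    </lst>
  </initParams>
  <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
    <str name="defaultFieldType">strings</str>
  </processor>
</config>
EOF

# 1. Append the schemaFactory line right after the opening <config> tag.
sed -i '/<config>/a <schemaFactory class="ClassicIndexSchemaFactory"/>' "$sample"
# 2. Wrap every line from <initParams ...> to </initParams> in an XML comment.
sed -i '/<initParams path="\/update\/\*\*">/,/<\/initParams>/ s/.*/<!--&-->/' "$sample"
# 3. Do the same for the AddSchemaFieldsUpdateProcessorFactory element.
sed -i '/<processor class="solr.AddSchemaFieldsUpdateProcessorFactory">/,/<\/processor>/ s/.*/<!--&-->/' "$sample"

cat "$sample"
```

In the printed result the schemaFactory line follows <config>, and every line of the two elements is wrapped in its own <!-- --> comment, so the elements are disabled rather than removed.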
    

Next, remove the managed-schema file:

rm managed-schema

And copy or symlink the schema.xml from CKAN:

cp /somewhere/over/the/rainbow/ckan/config/solr/schema.xml .

or

ln -s /usr/lib/ckan/default/src/ckan/ckan/config/solr/schema.xml schema.xml

Exit from being the 'solr' user:

exit

Finally, restart Solr:

sudo /etc/init.d/solr restart

or

sudo service solr restart

Check there are no errors when you browse: http://localhost:8983/solr/#/ckan

In addition you can ask it to list the cores from the command-line:

curl -s 'http://localhost:8983/solr/admin/cores?action=STATUS' | python -c 'import sys;import xml.dom.minidom;s=sys.stdin.read();print(xml.dom.minidom.parseString(s).toprettyxml())'
<?xml version="1.0" ?>
<response>


	<lst name="responseHeader">
		<int name="status">0</int>
		<int name="QTime">0</int>
	</lst>
	<lst name="initFailures"/>
	<lst name="status">
		<lst name="ckan">
			<str name="name">ckan</str>
			<str name="instanceDir">/var/solr/data/ckan</str>
			<str name="dataDir">/var/solr/data/ckan/data/</str>
			<str name="config">solrconfig.xml</str>
			<str name="schema">schema.xml</str>
			<date name="startTime">2020-02-21T10:32:06.953Z</date>
			<long name="uptime">17615792</long>
			<lst name="index">
				<int name="numDocs">1</int>
				<int name="maxDoc">1</int>
				<int name="deletedDocs">0</int>
				<long name="indexHeapUsageBytes">-1</long>
				<long name="version">2736</long>
				<int name="segmentCount">1</int>
				<bool name="current">true</bool>
				<bool name="hasDeletions">false</bool>
				<str name="directory">org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/var/solr/data/ckan/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@5780f666; maxCacheMB=48.0 maxMergeSizeMB=4.0)</str>
				<str name="segmentsFile">segments_di</str>
				<long name="segmentsFileSizeInBytes">167</long>
				<lst name="userData">
					<str name="commitTimeMSec">1582295752835</str>
				</lst>
				<date name="lastModified">2020-02-21T14:35:52.835Z</date>
				<long name="sizeInBytes">10402</long>
				<str name="size">10.16 KB</str>
			</lst>
		</lst>
	</lst>


</response>

Config ini file

Make sure your solr_url in ckan.ini or development.ini or production.ini is pointing to the ckan core (the default value is for a single core setup):

solr_url = http://127.0.0.1:8983/solr/ckan
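The setting can be checked from the command line as well. This is a sketch only: get_solr_url is a hypothetical helper, and /etc/ckan/default/production.ini is an assumed location that may differ on your system.

```shell
# Print the value of solr_url from a CKAN ini file (first uncommented match).
get_solr_url() {
    sed -n 's/^[[:space:]]*solr_url[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

ini=/etc/ckan/default/production.ini   # assumed location; adjust to your install
if [ -r "$ini" ]; then
    url=$(get_solr_url "$ini")
    case "$url" in
        */solr/ckan|*/solr/ckan/) echo "solr_url points at the ckan core: $url" ;;
        *) echo "solr_url does not end in /solr/ckan: '$url'" >&2 ;;
    esac
fi
```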