How to install CKAN 2.9.1 on EC2 instance (RHEL 7.7, Solr 7.7, PostgreSQL 11, using conda) - ckan/ckan GitHub Wiki

Just sharing my complete steps for installing the configuration above, which I think is useful for anyone concerned about running EOL versions of software. I've learned a lot from this community. I used a conda environment because it is more familiar to me. If any of the steps are unnecessary or could have been done differently, I'd appreciate the input, since I'm still learning.

I wanted to install Solr 8.4 using the PR that @smotornyuk created at https://github.com/ckan/ckan/pull/5143, but I couldn't figure out some of the steps described.

I also installed Nginx, but it isn't really needed (I installed it because I'm trying to replicate an office setup that blocks traffic on port 5000).

Solr7 PR can be found here: https://github.com/ckan/ckan/pull/4387, also by @smotornyuk

EC2 instance details:

Amazon Machine Image: Linux/Unix, Red Hat Enterprise Linux 7.7_HVM
Size: T3.medium
Configuration: everything else default except security groups, which actually isn't really necessary
Inbound rules: ports 80, 22, 8983
Outbound rules: default

You start the installation as ec2-user (the default account when you SSH into the instance). I will point out where a change of user account is needed.

Installing Firewalld, Redis and Nginx (Nginx is optional)

sudo yum install -y firewalld && sudo systemctl start firewalld && sudo systemctl enable firewalld && sudo systemctl status firewalld
sudo yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm && sudo yum -y install https://rpms.remirepo.net/enterprise/remi-release-7.rpm
sudo yum install -y redis --enablerepo=remi
sudo systemctl enable redis && sudo systemctl start redis

(Optional) Installing Nginx and editing the Nginx configuration file

The rest of the steps should work even if you skip the Nginx step. What I did here was make Nginx forward port 80 requests to port 5000, where CKAN runs, to get around workplace constraints. If you have no such constraints, go ahead and skip this part.
sudo yum install -y nginx && sudo systemctl enable nginx && sudo systemctl start nginx
sudo vi /etc/nginx/nginx.conf
Inside the nginx.conf, I removed the snippet
========= remove this segment below inside nginx.conf =========
server {
listen 80 default_server;
listen [::]:80 default_server;
server_name _;
root /usr/share/nginx/html;

# Load configuration files for the default server block.
include /etc/nginx/default.d/*.conf;

location / {
}

error_page 404 /404.html;
location = /404.html {
}

error_page 500 502 503 504 /50x.html;
location = /50x.html {
}
}
========= remove this segment above =========
and insert the following snippet in its place:
========= insert this segment instead (indenting isn't important in Nginx config) =========
server {
listen 80;
location / {
proxy_pass http://127.0.0.1:5000;
}
}
========= end of segment =========
reload nginx and firewalld:
sudo nginx -t && sudo systemctl reload nginx
sudo firewall-cmd --permanent --zone=public --add-service=http && sudo firewall-cmd --permanent --zone=public --add-service=https && sudo firewall-cmd --reload && sudo setsebool -P httpd_can_network_connect=1

Install required packages:

Honestly I'm not sure why some of the packages are needed; I just followed the instructions here:
https://github.com/ckan/ckan/wiki/How-to-Install-CKAN-2.9-on-CentOS-7

sudo yum install -y wget policycoreutils-python git-core java-1.8.0-openjdk lsof gcc gcc-c++ cmake automake gmp-devel boost unzip

Installing PostgreSQL11:

sudo yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sudo yum install -y postgresql11-server postgresql11-contrib postgresql11-libs postgresql11
sudo /usr/pgsql-11/bin/postgresql-11-setup initdb && sudo systemctl enable postgresql-11 && sudo systemctl start postgresql-11

Setting up the databases in PgSQL:

I set up the databases for both CKAN and the datastore in one go, since I don't want to keep switching back and forth. These are the same steps as for PostgreSQL 9.6.
from ec2-user account:
sudo -i -u postgres  --> switch to the postgres account
createuser -S -D -R -P ckan_default  --> set a password when prompted (passwordX)
createuser -S -D -R -P -l datastore_default  --> set a password when prompted (passwordY)
createdb -O ckan_default ckan_db -E utf-8
createdb -O ckan_default datastore_db -E utf-8
exit  --> back to the ec2-user account

Create user CKAN:

as ec2-user:
sudo useradd -m -s /sbin/nologin -d /usr/lib/ckan -c "CKAN User" ckan
sudo chmod 755 /usr/lib/ckan && sudo mkdir -p /etc/ckan/default && sudo chown ckan /etc/ckan/ && sudo chown ckan /etc/ckan/default
You now have an account "ckan" that you can switch to from ec2-user with sudo su -s /bin/bash - ckan, no password needed.

Install Solr7.7:

as ec2-user:
cd /opt
sudo wget https://archive.apache.org/dist/lucene/solr/7.7.0/solr-7.7.0.tgz
sudo tar xzf solr-7.7.0.tgz solr-7.7.0/bin/install_solr_service.sh --strip-components=2
sudo bash ./install_solr_service.sh solr-7.7.0.tgz
sudo service solr status
sudo firewall-cmd --zone=public --add-port=8983/tcp --permanent
sudo firewall-cmd --reload

Installing python using miniconda for CKAN:

I used the base conda environment to install the packages.
from ec2-user, login to ckan:
sudo su -s /bin/bash - ckan
wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.9.2-Linux-x86_64.sh
bash Miniconda3-py37_4.9.2-Linux-x86_64.sh -b -p $HOME/miniconda3
./miniconda3/bin/conda init
======close and reopen SSH session to enable the conda environment==========
The step to download the schema file is necessary since it is not part of the ckan release and the PR is not merged.
from ec2-user, login to ckan:
sudo su -s /bin/bash - ckan
wget https://raw.githubusercontent.com/smotornyuk/ckan/6cc90328de5f01f78b4ddacb256c57fda9e59bc0/ckan/config/solr/schema.xml-2.7
mv schema.xml-2.7 schema.xml
mkdir filestore  --> location of the CKAN filestore is /usr/lib/ckan/filestore
pip3 install pylons uwsgi supervisor PasteScript PasteDeploy setuptools==44.1.0
conda install -c anaconda psycopg2 -y
wget https://github.com/ckan/ckan/archive/ckan-2.9.1.zip
mkdir default
unzip -q ckan-2.9.1.zip -d ./default
mv default/ckan-ckan-2.9.1 default/ckan
cd ./default/ckan
Note that my installation of CKAN ends up at /usr/lib/ckan/default/ckan/ckan, not at /usr/lib/ckan/default/src/ckan/ckan as most other documentation pages state.
pip install -r requirements.txt
python setup.py develop
ckan generate config /etc/ckan/default/ckan.ini
vi /etc/ckan/default/ckan.ini
Edit the ckan.ini file in the following areas and change the password for PSQL as you created it above:

======== edit the ckan.ini =========
ckan.site_url = http://XXXXXXX (ie, the public ipv4 address of the instance)
sqlalchemy.url = postgresql://ckan_default:passwordX@127.0.0.1/ckan_db
ckan.datastore.write_url = postgresql://ckan_default:passwordX@127.0.0.1/datastore_db
ckan.datastore.read_url = postgresql://datastore_default:passwordY@127.0.0.1/datastore_db
solr_url = http://127.0.0.1:8983/solr/ckan
ckan.redis.url = redis://127.0.0.1:6379/0 (remember to uncomment)
ckan.storage_path = /usr/lib/ckan/filestore
======== end of editing ckan.ini ========
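
If you would rather script these ckan.ini edits than make them by hand in vi, here is a minimal sketch. The helper name set_ini_key is my own invention (it is not part of CKAN), it assumes GNU sed as shipped with RHEL 7, and it only rewrites keys that already exist in the generated file, whether commented out or not.

```shell
# set_ini_key FILE KEY VALUE  (hypothetical helper, not part of CKAN)
# Rewrites an existing "KEY = ..." line, commented out or not, as "KEY = VALUE".
set_ini_key() {
    file=$1
    key=$2
    value=$3
    # Escape the dots in the key so they only match literal dots
    key_re=$(printf '%s' "$key" | sed -e 's/[.]/\\./g')
    # Escape characters that are special in the replacement part of sed's s command
    value_esc=$(printf '%s' "$value" | sed -e 's/[&|\\]/\\&/g')
    sed -i -E "s|^[#[:space:]]*${key_re}[[:space:]]*=.*|${key_re} = ${value_esc}|" "$file"
}

# Example calls (run as the ckan user; the values mirror the ones listed above):
#   set_ini_key /etc/ckan/default/ckan.ini solr_url http://127.0.0.1:8983/solr/ckan
#   set_ini_key /etc/ckan/default/ckan.ini ckan.redis.url redis://127.0.0.1:6379/0
#   set_ini_key /etc/ckan/default/ckan.ini ckan.storage_path /usr/lib/ckan/filestore
```

Make a backup copy of ckan.ini first, and check the result with grep before moving on, since a key that is missing from the file is silently left out.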

exit  --> exit to ec2-user
sudo su solr
cd /opt/solr/bin
./solr create -c ckan  --> create the core for CKAN
cd /var/solr/data/ckan/conf
ln -s /usr/lib/ckan/schema.xml schema.xml  --> symlink the 2.7 schema in as the Solr schema

Now, we need to make edits to solrconfig.xml for Solr v7.7:

from https://github.com/smotornyuk/ckan/blob/6cc90328de5f01f78b4ddacb256c57fda9e59bc0/doc/maintaining/installing/solr.rst
Three parts of solrconfig.xml need updating:
vi /var/solr/data/ckan/conf/solrconfig.xml
======================edit solrconfig.xml ==========================

change 1. # add two lines after <int name="rows">10</int>

<str name="df">text</str>
<str name="q.op">AND</str>
it should look like this now:
===========start of example==================
<int name="rows">10</int>
<str name="df">text</str>
<str name="q.op">AND</str>
<!-- Change from JSON to XML format (the default prior to Solr 7.0)
<str name="wt">xml</str>
===========end of example====================

change 2. # add next line before <updateProcessor class="solr.UUIDUpdateProcessorFactory" name="uuid"/>

<schemaFactory class="ClassicIndexSchemaFactory" />
it should look like this now:
===========start of example==================
<schemaFactory class="ClassicIndexSchemaFactory" />
<updateProcessor class="solr.UUIDUpdateProcessorFactory" name="uuid"/>
===========end of example====================

change 3. change default attribute of <updateRequestProcessorChain> to false:


it should look like this now:
===========start of example==================
<!-- The update.autoCreateFields property can be turned to false to disable schemaless mode -->
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:false}"
processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
===========end of example====================
===============end of changes to solrconfig ===============================
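
Before restarting Solr, it can help to confirm that all three changes really made it into the file. The sketch below is only a plain-text check with grep: the function name check_solrconfig is hypothetical, and it would also match a line that sits inside an XML comment, so treat a pass as a hint rather than proof.

```shell
# check_solrconfig FILE  (hypothetical helper)
# Returns 0 if the strings from the three changes above are present in FILE.
check_solrconfig() {
    conf=$1
    status=0
    # Change 1: default search field and default operator
    grep -qF '<str name="df">text</str>' "$conf" || { echo "missing: df = text"; status=1; }
    grep -qF '<str name="q.op">AND</str>' "$conf" || { echo "missing: q.op = AND"; status=1; }
    # Change 2: classic schema factory, so that schema.xml is used
    grep -qF '<schemaFactory class="ClassicIndexSchemaFactory"' "$conf" || { echo "missing: ClassicIndexSchemaFactory"; status=1; }
    # Change 3: schemaless mode switched off by default
    grep -qF 'update.autoCreateFields:false' "$conf" || { echo "missing: autoCreateFields:false"; status=1; }
    return $status
}

# Example call:
#   check_solrconfig /var/solr/data/ckan/conf/solrconfig.xml && echo "solrconfig.xml looks OK"
```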
from solr account:
exit  --> exit to ec2-user
sudo chown solr /var/solr/data/ckan/conf/schema.xml
sudo chown solr /var/solr/data/ckan/conf/solrconfig.xml
sudo service solr restart
If you navigate to the Solr admin page, i.e. http://ec2-instance-ipv4:8983, there should not be any warning messages, and the ckan core should be running normally.

We also need to make PgSQL accessible to ckan:

from ec2-user:
sudo -i -u postgres  --> log in as the postgres user
vi /var/lib/pgsql/11/data/pg_hba.conf
====================edit pg_hba.conf ========================
PostgreSQL needs to use md5 for IPv4 local connections instead of the default method (I think the default is ident):
# IPv4 local connections:
host all all 127.0.0.1/32 md5
===================end of editing pg_hba ====================
exit  --> go back to ec2-user
sudo systemctl restart postgresql-11  --> necessary every time you change pg_hba.conf
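
The pg_hba.conf edit can also be done without opening an editor. This is a rough sketch under a few assumptions: the function name set_ipv4_local_md5 is made up, GNU sed is available, and the file still contains the stock "host all all 127.0.0.1/32 <method>" line. It only touches that one line and leaves the IPv6 entry alone.

```shell
# set_ipv4_local_md5 FILE  (hypothetical helper)
# Switches the auth method of the "host all all 127.0.0.1/32" line to md5.
set_ipv4_local_md5() {
    hba=$1
    sed -i -E 's|^(host[[:space:]]+all[[:space:]]+all[[:space:]]+127\.0\.0\.1/32[[:space:]]+)[[:alnum:]-]+|\1md5|' "$hba"
}

# Example calls (as the postgres user, keeping a backup first):
#   cp -a /var/lib/pgsql/11/data/pg_hba.conf /var/lib/pgsql/11/data/pg_hba.conf.bak
#   set_ipv4_local_md5 /var/lib/pgsql/11/data/pg_hba.conf
```

PostgreSQL still needs the restart shown above for the change to take effect.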

Finishing up:

sudo su -s /bin/bash - ckan
ln -s /usr/lib/ckan/default/ckan/who.ini /etc/ckan/default/who.ini
ckan -c /etc/ckan/default/ckan.ini db init
ckan -c /etc/ckan/default/ckan.ini run --host 0.0.0.0

It has been a long installation process, but you should now have everything running.
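Try visiting CKAN at http://ec2-instance-ipv4 and it should load. If you skipped the Nginx step, CKAN is only listening on port 5000, so you would need to open that port in firewalld and in the security group and visit http://ec2-instance-ipv4:5000 instead.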
Try creating a dataset and then visit Solr admin page to verify that Solr works as it should.
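
If the page does not load, a quick way to narrow things down is to check from the instance itself which services are accepting connections. The sketch below uses bash's /dev/tcp feature; the function names port_open and report_services are my own, and 5432 is PostgreSQL's default port, which I am assuming was left unchanged (the other ports come from the steps above).

```shell
# port_open HOST PORT  (hypothetical helper)
# Succeeds if a TCP connection to HOST:PORT can be opened within 2 seconds.
port_open() {
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# report_services  (hypothetical helper)
# Prints one line per service used in this guide.
report_services() {
    for entry in redis:6379 postgresql:5432 solr:8983 ckan:5000; do
        name=${entry%%:*}   # part before the colon
        port=${entry##*:}   # part after the colon
        if port_open 127.0.0.1 "$port"; then
            echo "$name (port $port): listening"
        else
            echo "$name (port $port): not reachable"
        fi
    done
}
```

Run report_services in a second SSH session while the ckan run command is still in the foreground. A service that shows as listening here but cannot be reached from your browser points to firewalld or the security group rather than the service itself.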
