How to Install CKAN 1.8 on CentOS 6.2 - ckan/ckan GitHub Wiki

Tested on CentOS 6.2 with CKAN 1.8a.

See also: How to Create a CentOS Vagrant Base Box

Setup Tomcat and Solr

CKAN uses Solr 1.4.1, an old version that matches the one provided with Ubuntu 10.04, and you need to install it from source. Follow the Installing Solr on Red Hat instructions.

Setup PostgreSQL

First install and start the PostgreSQL database server:

$ sudo yum install postgresql postgresql-server postgresql-devel
$ sudo service postgresql initdb
$ sudo service postgresql start
$ sudo su
root$ sudo -u postgres psql -l

The last command should print out a list of databases.

Now create the CKAN database user and database. When asked for a password, enter "pass":

root$ sudo -u postgres createuser -S -D -R -P ckanuser
root$ sudo -u postgres createdb -O ckanuser ckantest

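Finally, as root, edit the file at /var/lib/pgsql/data/pg_hba.conf and change the three occurrences of "ident" to "trust", then restart PostgreSQL (sudo service postgresql restart) so that the change takes effect. The edited lines should look like this: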

# TYPE  DATABASE    USER        CIDR-ADDRESS          METHOD

# "local" is for Unix domain socket connections only
local   all         all                               trust
# IPv4 local connections:
host    all         all         127.0.0.1/32          trust
# IPv6 local connections:
host    all         all         ::1/128               trust
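The same edit can be scripted. The following is a minimal Python sketch rather than part of the original instructions: the helper name ident_to_trust is hypothetical, and it assumes every rule line ends with the method name "ident", as in the file shown above. It only transforms text, so reading and writing /var/lib/pgsql/data/pg_hba.conf as root is left to the caller.

```python
import re

# Matches the auth method "ident" at the end of a rule line,
# keeping any trailing whitespace in group 1.
_IDENT_AT_END = re.compile(r'\bident(\s*)$')


def ident_to_trust(pg_hba_text):
    """Return pg_hba.conf text with the 'ident' method replaced by 'trust'.

    Comment lines and blank lines are passed through unchanged.
    """
    result = []
    for line in pg_hba_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith('#'):
            line = _IDENT_AT_END.sub(r'trust\1', line)
        result.append(line)
    return '\n'.join(result) + '\n'
```

Comment lines that happen to mention "ident" are left untouched, so only the three rule lines change.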

Install CKAN and its Dependencies

Install some of CKAN's dependencies that are available via yum:

$ sudo yum install mercurial python-devel libxml2-devel libxslt-devel git subversion python-babel python-psycopg2 python-lxml python-pylons python-repoze-who python-repoze-who-plugins-sa python-repoze-who-testutil python-repoze-who-friendlyform python-tempita python-zope-interface

Use easy_install to install pip and then use pip to install virtualenv:

$ sudo easy_install pip
$ sudo pip install virtualenv

Create a virtual environment, activate it and install CKAN into it. For example, this will install the latest development version of CKAN:

$ virtualenv pyenv
$ . pyenv/bin/activate
(pyenv)$ pip install --ignore-installed -e 'git+https://github.com/okfn/ckan.git#egg=ckan'

Install more CKAN dependencies into the virtualenv:

(pyenv)$ pip install --ignore-installed -r pyenv/src/ckan/requires/lucid_missing.txt -r pyenv/src/ckan/requires/lucid_conflict.txt
(pyenv)$ pip install webob==1.0.8
(pyenv)$ pip install --ignore-installed -r pyenv/src/ckan/requires/lucid_present.txt 
(pyenv)$ deactivate
$ . pyenv/bin/activate
(pyenv) $

(You may be able to save some time by omitting --ignore-installed from the last pip install command, as some of those packages have already been installed by yum.)

Create the CKAN config file:

(pyenv)$ cd pyenv/src/ckan
(pyenv)$ paster make-config ckan development.ini

Edit the development.ini file and set the values of ckan.site_id and solr_url. If you followed the Installing Solr on Red Hat instructions above then you should set solr_url to:

solr_url = http://127.0.0.1:8080/solr/ckan-schema-1.4
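Before going further it can be worth checking that both settings are really filled in. This is a small Python sketch under stated assumptions: the function name missing_settings is hypothetical, and it assumes the two settings live in the [app:main] section of development.ini. It only reads the file and never rewrites it.

```python
try:
    from configparser import RawConfigParser  # Python 3
except ImportError:
    from ConfigParser import RawConfigParser  # Python 2

# Settings this guide asks you to edit by hand.
REQUIRED = ('ckan.site_id', 'solr_url')


def missing_settings(ini_path, section='app:main', required=REQUIRED):
    """Return the names in `required` that are absent or empty in `section`."""
    parser = RawConfigParser()  # Raw: no %(here)s interpolation is attempted
    parser.read(ini_path)
    missing = []
    for name in required:
        if not parser.has_option(section, name):
            missing.append(name)
        elif not parser.get(section, name).strip():
            missing.append(name)
    return missing
```

Calling missing_settings('development.ini') from the pyenv/src/ckan directory should return an empty list once both values are set.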

Now create CKAN's database tables:

(pyenv)$ paster --plugin=ckan db init
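The db init command connects to the database named by the sqlalchemy.url setting in development.ini, so that setting has to match the user, password and database created in the PostgreSQL section. As a hedged illustration, this sketch builds a URL in the standard SQLAlchemy form; the helper name sqlalchemy_url is hypothetical, and you should compare its result with the value already in your config file rather than treat it as authoritative.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2


def sqlalchemy_url(user, password, database, host='localhost'):
    """Build a postgresql:// URL, percent-encoding the user and password."""
    return 'postgresql://%s:%s@%s/%s' % (
        quote(user, safe=''),
        quote(password, safe=''),
        host,
        database,
    )


# Values used earlier in this guide.
print(sqlalchemy_url('ckanuser', 'pass', 'ckantest'))
```

With the names used in this guide the result is postgresql://ckanuser:pass@localhost/ckantest.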

Create CKAN's cache and session dirs:

(pyenv)$ mkdir data sstore

Run CKAN's Tests

You can now verify your CKAN install by running the tests:

(pyenv)$ pip install --ignore-installed -r pip-requirements-test.txt
(pyenv)$ deactivate
$ . ~/pyenv/bin/activate
(pyenv)$ nosetests ckan/tests --ckan --with-pylons=test-core.ini

Enable the synchronous_search Plugin

New datasets added via the web UI won't show up in search results or on the datasets page until you enable the synchronous search plugin in your CKAN config file (e.g. development.ini). Find the plugins line and set it to something like this:

 ckan.plugins = stats synchronous_search
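The value of ckan.plugins is a space-separated list of plugin names, so enabling a plugin means appending its name if it is not already there. A tiny Python sketch of that rule follows; add_plugin is a hypothetical helper name, not a CKAN function.

```python
def add_plugin(plugins_value, plugin):
    """Return a ckan.plugins value that includes `plugin` exactly once."""
    names = plugins_value.split()
    if plugin not in names:
        names.append(plugin)
    return ' '.join(names)


# Starting from a config that only enables the stats plugin.
print('ckan.plugins = ' + add_plugin('stats', 'synchronous_search'))
```

Running the helper a second time on its own output leaves the value unchanged, so it is safe to apply to a config that already lists the plugin.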

Disable Firewall

By default CKAN running on port 5000 on CentOS will not be visible to the outside world because of the firewall. One way around this is to just turn the firewall off:

$ sudo service iptables stop
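To confirm that the port is actually reachable, you can try a plain TCP connection from another machine. This is a minimal Python sketch using only the standard library; port_is_open is a hypothetical helper name, and the address you pass in is whatever your CentOS machine is reachable on.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, filtered by a firewall, or unreachable.
        return False
    conn.close()
    return True
```

For example, port_is_open('192.0.2.10', 5000) checks CKAN's development server, where 192.0.2.10 stands in for your server's address. A result of False while paster serve is running suggests that the firewall is still blocking the port.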

Deploying CKAN on a CentOS Server using Apache and mod_wsgi

Also see: Deploying CKAN on an Ubuntu Server using Apache and modwsgi

If you've followed the instructions above to install CKAN from source on CentOS, then you can test your CKAN web instance by running it with Paste. Activate your virtual environment, cd into the pyenv/src/ckan directory, and run:

 paster serve development.ini

But to deploy your CKAN site for production you should use a real web server such as Apache. Doing this on CentOS is slightly different to the Ubuntu instructions above:

1. Install Apache web server and mod_wsgi:

 $ sudo yum install httpd mod_wsgi

2. Make a directory for your CKAN instance. For example, let's create a directory at /usr/local/demo.ckan.net.

3. Create a virtualenv at /usr/local/demo.ckan.net/pyenv and install CKAN in it. To install CKAN into the virtualenv, follow the CKAN installation instructions above. If you haven't already done so, you should also set up Tomcat, Solr and PostgreSQL as described above.

4. Make a wsgi script file at /usr/local/demo.ckan.net/pyenv/bin/demo.ckan.net.py with the following contents:

 import os

 # Root directory of this CKAN instance and the path to its config file.
 instance_dir = '/usr/local/demo.ckan.net/'
 config_filepath = os.path.join(instance_dir, 'pyenv', 'src', 'ckan', 'development.ini')

 # Activate the virtualenv so that CKAN and its dependencies can be imported.
 pyenv_bin_dir = os.path.join(instance_dir, 'pyenv', 'bin')
 activate_this = os.path.join(pyenv_bin_dir, 'activate_this.py')
 execfile(activate_this, dict(__file__=activate_this))

 # Configure logging from the config file, then load the WSGI application.
 from paste.deploy import loadapp
 from paste.script.util.logging_config import fileConfig
 fileConfig(config_filepath)
 application = loadapp('config:%s' % config_filepath)

5. Create the following three files in the /etc/httpd/conf.d directory:

0wsgi.conf:

 LoadModule wsgi_module modules/mod_wsgi.so

demo.ckan.net.conf:

 <VirtualHost *:80>
     # WARNING: Do not manually edit this file, it is designed to be 
     #          overwritten at any time by the postinst script of 
     #          dependent packages
     Include /etc/httpd/conf.d/demo.ckan.net.common
 </VirtualHost>

demo.ckan.net.common:

    # These are common settings used for both the normal and maintenance modes
    
    ServerName demo.ckan.net
    ServerAlias demo.ckan.net
    WSGIScriptAlias / /usr/local/demo.ckan.net/pyenv/bin/demo.ckan.net.py
    
    # pass authorization info on (needed for rest api)
    WSGIPassAuthorization On
    
    # Deploy as a daemon (avoids conflicts between CKAN instances)
    # WSGIDaemonProcess std display-name=std processes=4 threads=15 maximum-requests=10000
    # WSGIProcessGroup std
    
    ErrorLog /var/log/httpd/demo.ckan.net.error.log
    CustomLog /var/log/httpd/demo.ckan.net.custom.log combined

6. Set the group and permissions of CKAN's data and sstore directories. On Ubuntu the Apache web server runs with the user name "www-data", but on CentOS it runs with the username and group "apache":

 $ cd /usr/local/demo.ckan.net/pyenv/src/ckan/
 $ mkdir -p data sstore
 $ chmod g+w -R data sstore
 $ sudo chgrp -R apache data sstore

7. Edit your CKAN configuration file (e.g. development.ini, std.ini or demo.ckan.net.ini), find the line that looks like this:

 args = ("ckan.log", "a", 20000000, 9)

and change it to something like this:

 args = ("/var/log/ckan/demo.ckan.net/ckan.log", "a", 20000000, 9)

8. Create the logs directory (which you specified in your config file in the previous step) and give it the right permissions:

 $ sudo mkdir -p /var/log/ckan/demo.ckan.net
 $ sudo chown apache /var/log/ckan/demo.ckan.net

9. Disable SELinux enforcement (this switches SELinux to permissive mode until the next reboot):

 $ sudo su
 (root) $ echo 0 > /selinux/enforce
 (root) $ <ctrl-d>

If you don't do this, you'll get misleading permissions errors when you try to start Apache.

TODO: Make it work without disabling selinux.

10. At this point in the Ubuntu deployment instructions you would use a2ensite to enable your site. On CentOS Apache automatically loads all /etc/httpd/conf.d/*.conf files when it starts up, so there's nothing to do. Just restart Apache:

 $ sudo service httpd restart

You should now be able to visit 127.0.0.1 and see your CKAN website being served by the Apache web server.
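If Apache starts but CKAN reports errors when writing its cache or sessions, the group and permissions from step 6 are worth double-checking. The following Python sketch reports the owning group and whether the group-write bit is set; group_access is a hypothetical helper name, and it assumes it is run on the server itself.

```python
import grp
import os
import stat


def group_access(path):
    """Return (group_name, group_can_write) for path."""
    info = os.stat(path)
    try:
        group_name = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        # The gid has no entry in the group database; report the number.
        group_name = str(info.st_gid)
    return group_name, bool(info.st_mode & stat.S_IWGRP)
```

Run from /usr/local/demo.ckan.net/pyenv/src/ckan, both group_access('data') and group_access('sstore') would be expected to return ('apache', True) after step 6.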

Troubleshooting

If it's not working, try stopping iptables, disabling SELinux enforcement, starting Tomcat, starting PostgreSQL, and restarting Apache:

 $ sudo service iptables stop
 $ sudo su
 (root) $ echo 0 > /selinux/enforce
 (root) $ <ctrl-d>
 $ sudo /etc/init.d/tomcat6 start
 $ sudo service postgresql start
 $ sudo service httpd restart

You'll have to do this after rebooting the machine, for example.

TODO: Make everything start up automatically on boot.
