How to Install CKAN 2.9 on Amazon Linux 2

Install Dependencies First

1. Install Redis (installed on separate node: Ubuntu 20)

$ sudo apt-get install redis-server    [for AL2: sudo amazon-linux-extras install redis4.0]
$ sudo vim /etc/redis/redis.conf

- Comment out the bind line (#bind 127.0.0.1 ::1) so that Redis listens on all interfaces
- Disable protected mode by setting "protected-mode no"

$ sudo /etc/init.d/redis-server restart

- Test the connection from the CKAN instance

$ telnet <REDIS-IP> 6379
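
The two redis.conf edits above can be checked before testing with telnet. The following is a minimal sketch, assuming the configuration lives at /etc/redis/redis.conf as on the Ubuntu node; `check_redis_conf` is a hypothetical helper name, not part of Redis.

```shell
#!/bin/sh
# Sketch: confirm redis.conf carries the two edits described above.
# check_redis_conf is a hypothetical helper, not a Redis command.
check_redis_conf() {
    conf="${1:-/etc/redis/redis.conf}"
    # An uncommented "bind" line means Redis only listens on the listed addresses.
    if grep -Eq '^[[:space:]]*bind[[:space:]]' "$conf"; then
        echo "bind is still active in $conf"
        return 1
    fi
    # protected-mode must be set to "no" explicitly.
    if ! grep -Eq '^[[:space:]]*protected-mode[[:space:]]+no[[:space:]]*$' "$conf"; then
        echo "protected-mode is not set to no in $conf"
        return 1
    fi
    echo "redis.conf looks ready for remote connections"
}
```

Run it as a user that can read the file, for example `check_redis_conf /etc/redis/redis.conf` from a root shell, and restart Redis once it passes.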

2. Install SOLR

 - Install & Configure Solr  [On the same Ubuntu 20 node]

$ sudo apt install -y solr-tomcat

 - Change the default port Tomcat runs on (8080) to the one expected by CKAN. To do so change the following line in the /etc/tomcat9/server.xml file (tomcat8 in older Ubuntu versions):

From:
<Connector port="8080" protocol="HTTP/1.1"

To:
<Connector port="8983" protocol="HTTP/1.1"

  • Replace the default schema.xml file with the CKAN schema file included in the sources. First move the default file out of the way [on the Solr machine]:

    $ sudo mv /etc/solr/conf/schema.xml /etc/solr/conf/schema.xml.bak

  • Copy the schema.xml file from the CKAN machine to this Solr machine. This can only be done once CKAN is installed (see step 4):

      from /usr/lib/ckan/default/src/ckan/ckan/config/solr/schema.xml  [on the CKAN machine]
      to   /etc/solr/conf/schema.xml  [on the Solr machine]
    
  • Restart Solr

    sudo service tomcat9 restart

  • Check that Solr is running: http://IP-Address:8983/solr/
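
The Tomcat port change described above can also be scripted instead of edited by hand. This is a minimal sketch, assuming GNU sed and that the connector line in server.xml matches the "From:" form shown earlier; `set_tomcat_port` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: switch the Tomcat HTTP connector from 8080 to 8983.
# set_tomcat_port is a hypothetical helper; it needs write access to server.xml.
set_tomcat_port() {
    server_xml="$1"
    old_port="${2:-8080}"
    new_port="${3:-8983}"
    # Keep a backup next to the original before editing in place.
    cp "$server_xml" "$server_xml.bak"
    sed -i "s|<Connector port=\"${old_port}\" protocol=\"HTTP/1.1\"|<Connector port=\"${new_port}\" protocol=\"HTTP/1.1\"|" "$server_xml"
    # Succeed only if the new connector line is now present.
    grep -q "<Connector port=\"${new_port}\" protocol=\"HTTP/1.1\"" "$server_xml"
}
```

From a root shell on the Solr machine this would be called as `set_tomcat_port /etc/tomcat9/server.xml`, followed by the Tomcat restart shown above.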

3. Install PostgreSQL

  • Install and configure PostgreSQL (on the same Ubuntu 20.04 node)

    $ sudo apt update
    $ sudo apt install -y postgresql net-tools
    $ sudo service postgresql start

    • Verify that PostgreSQL is working, and check that the encoding of the databases is UTF8:

      $ sudo -u postgres psql -l

    • Create a new PostgreSQL user called ckan_default and enter a password when prompted (run as the Ubuntu user):

      $ sudo -u postgres createuser -S -D -R -P ckan_default

    • Create a new PostgreSQL database for CKAN, called ckan_default and owned by the database user you just created (run as the Ubuntu user):

      $ sudo -u postgres createdb -O ckan_default ckan_default -E utf-8

    • Edit postgresql.conf and pg_hba.conf. On Ubuntu, these files are located in /etc/postgresql/{Postgres version}/main.

      • cd /etc/postgresql/12/main/
      • sudo vim postgresql.conf

          listen_addresses = '*'

    • Add a line similar to the line below to the bottom of pg_hba.conf to allow the machine running the web server to connect to PostgreSQL. Please change the IP address as desired according to your network settings.

      • vim pg_hba.conf

          host    all             all             <CKAN-IP>/32            md5		   
        
    • Verify that PostgreSQL is now listening on all interfaces

      $ sudo service postgresql restart
      $ netstat -tulpn | grep 5432
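
The postgresql.conf and pg_hba.conf edits above can be applied in one repeatable step. The sketch below assumes GNU sed and the Ubuntu layout described earlier (both files in one directory); `enable_remote_postgres` is a hypothetical helper name, and the IP address passed to it is whatever your CKAN machine uses.

```shell
#!/bin/sh
# Sketch: allow the CKAN machine to reach PostgreSQL over the network.
# enable_remote_postgres is a hypothetical helper; run it with write access
# to the configuration directory, then restart PostgreSQL.
enable_remote_postgres() {
    conf_dir="$1"
    ckan_ip="$2"
    pg_conf="$conf_dir/postgresql.conf"
    hba="$conf_dir/pg_hba.conf"
    hba_line="host    all             all             $ckan_ip/32            md5"

    # Replace the (possibly commented-out) listen_addresses setting,
    # or append it if the file does not mention it at all.
    if grep -Eq '^[#[:space:]]*listen_addresses[[:space:]]*=' "$pg_conf"; then
        sed -i -E "s|^[#[:space:]]*listen_addresses[[:space:]]*=.*|listen_addresses = '*'|" "$pg_conf"
    else
        echo "listen_addresses = '*'" >> "$pg_conf"
    fi

    # Append the pg_hba.conf rule only if it is not already there.
    grep -qxF "$hba_line" "$hba" || echo "$hba_line" >> "$hba"
}
```

For the layout used in this guide the call would look like `enable_remote_postgres /etc/postgresql/12/main <CKAN-IP>` from a root shell. Running it twice leaves a single pg_hba.conf entry.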

    • Set-up Datastore database & user

    • Create a database user called datastore_default (run as the Ubuntu user):

        $ sudo -u postgres createuser -S -D -R -P -l ckan_default  ## only if not already created above for the CKAN database

        $ sudo -u postgres createuser -S -D -R -P -l datastore_default

    • Create the database (owned by ckan_default), which we’ll call datastore_default:

        $ sudo -u postgres createdb -O ckan_default datastore_default -E utf-8
      
        $ sudo -u postgres psql -l
      
    • Set permissions [Important]

    ############ This step is performed once CKAN is set up with the DataStore and the database has been initialized from the CKAN machine ############

    Once the DataStore database and the users are created, the permissions on the DataStore and CKAN databases have to be set. CKAN provides a ckan command to help you set these permissions correctly. Since CKAN and PostgreSQL are running on separate machines, we need to generate the permissions script on the CKAN instance and apply it on the PostgreSQL machine:

    • Generate the datastore permissions on the CKAN machine (ckan -c /etc/ckan/default/ckan.ini datastore set-permissions) and paste the output here:

    $ sudo -u postgres psql

    postgres=#

4. Install & Configure CKAN (on Amazon Linux 2)

  • Installing the required packages

    • Install dependencies

      $ sudo yum install python37 postgresql-devel python3-devel -y
      $ sudo yum install wget policycoreutils-python python3-pip git-core java-1.8.0-openjdk maven lsof gcc gcc-c++ cmake automake gmp-devel boost -y

    • Create Virtual environment

      mkdir -p ~/ckan/lib
      sudo ln -s ~/ckan/lib /usr/lib/ckan
      mkdir -p ~/ckan/etc
      sudo ln -s ~/ckan/etc /etc/ckan

      sudo mkdir -p /usr/lib/ckan/default
      sudo chown $(whoami) /usr/lib/ckan/default
      python3 -m venv /usr/lib/ckan/default
      . /usr/lib/ckan/default/bin/activate

      ####### EVERYTHING FROM NOW ON will be executed in the virtual env #######

    • Create folder for FileStore and file uploads.

      • Create it and hand ownership to ec2-user, so that ec2-user has access to this folder.

        (default) sudo mkdir -p /var/lib/ckan/default
        (default) sudo chown $(whoami) /var/lib/ckan/default

    • Generate & Configure CKAN.ini

      (default) sudo mkdir -p /etc/ckan/default
      (default) sudo chown -R $(whoami) /etc/ckan/

      (default) ckan generate config /etc/ckan/default/ckan.ini

      ################ ENSURE YOU HAVE SET UP REDIS, SOLR & PostgreSQL before you configure them here, initialise the DB and set the Datastore permissions. ################

      CKAN.ini

        ## Database Settings
        sqlalchemy.url = postgresql://ckan_default:<password>@<POSTGRES-IP>/ckan_default
        ckan.datastore.write_url = postgresql://ckan_default:<password>@<POSTGRES-IP>/datastore_default
        ckan.datastore.read_url = postgresql://datastore_default:<password>@<POSTGRES-IP>/datastore_default
      
        ## Site Settings
        ckan.site_url = http://ec2-1x-2xx-2xx-1xx.ap-southeast-2.compute.amazonaws.com
      
        ## Search Settings
        # ckan.site_id has to be unique
        ckan.site_id = default-me
        solr_url = http://<SOLR-IP>:8983/solr
      
        ## Redis Settings
        # URL to your Redis instance, including the database to be used.
        ckan.redis.url = redis://<REDIS-IP>:6379/0
      
        # Enable FileStore and file uploads.
        # When enabled, CKAN’s FileStore allows users to upload data files to CKAN resources, and to upload logo images for groups and organizations. Users will see an upload button when creating or updating a resource, group or organization.
        ckan.storage_path = /var/lib/ckan/default
      
        # Enable the datastore plugin by appending it to the plugin list

        ckan.plugins = stats text_view image_view recline_view datastore
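
Before initialising the database it can help to confirm that none of the settings above were left unset or still contain a placeholder such as `<POSTGRES-IP>`. The following is a minimal sketch; `check_ckan_ini` is a hypothetical helper, and the key list simply mirrors the settings shown in this guide.

```shell
#!/bin/sh
# Sketch: report ckan.ini keys that are missing, commented out,
# or still contain an unreplaced <PLACEHOLDER>.
# check_ckan_ini is a hypothetical helper, not part of the ckan CLI.
check_ckan_ini() {
    ini="${1:-/etc/ckan/default/ckan.ini}"
    missing=0
    for key in sqlalchemy.url ckan.datastore.write_url ckan.datastore.read_url \
               ckan.site_url solr_url ckan.redis.url ckan.storage_path; do
        # First grep: uncommented "key = value" lines.
        # Second grep: succeed only if one of them has no <...> placeholder.
        if ! grep -E "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*[^[:space:]]" "$ini" \
                | grep -qv '<[^>]*>'; then
            echo "missing or placeholder: $key"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_ckan_ini /etc/ckan/default/ckan.ini` prints one line per problem key and returns a non-zero status if any were found.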
      
    • Link to who.ini

      (default) ln -s /usr/lib/ckan/default/src/ckan/who.ini /etc/ckan/default/who.ini

    • DB Initialization and Create tables:

    (default) ckan -c /etc/ckan/default/ckan.ini db init

    2020-12-15 23:24:30,051 INFO [ckan.cli] Using configuration file /etc/ckan/default/ckan.ini
    2020-12-15 23:24:30,051 INFO [ckan.config.environment] Loading static files from public
    2020-12-15 23:24:30,099 INFO [ckan.config.environment] Loading templates from /home/ec2-user/ckan/lib/default/src/ckan/ckan/templates
    2020-12-15 23:24:30,398 INFO [ckan.config.environment] Loading templates from /home/ec2-user/ckan/lib/default/src/ckan/ckan/templates
    2020-12-15 23:24:30,695 INFO [ckan.cli.db] Initialize the Database
    2020-12-15 23:24:30,844 INFO [ckan.model] CKAN database version remains as: ccd38ad5fced (head)
    2020-12-15 23:24:30,844 INFO [ckan.model] Database initialised
    Initialising DB: SUCCESS

    • Test Run your ckan

    (default) cd /usr/lib/ckan/default/src/ckan
    (default) ckan -c /etc/ckan/default/ckan.ini run

    2020-12-16 01:08:11,782 INFO [ckan.cli] Using configuration file /etc/ckan/default/ckan.ini
    2020-12-16 01:08:11,783 INFO [ckan.config.environment] Loading static files from public
    2020-12-16 01:08:11,825 INFO [ckan.config.environment] Loading templates from /home/ec2-user/ckan/lib/default/src/ckan/ckan/templates
    2020-12-16 01:08:12,155 INFO [ckan.config.environment] Loading templates from /home/ec2-user/ckan/lib/default/src/ckan/ckan/templates
    2020-12-16 01:08:12,505 INFO [ckan.cli.server] Running server localhost on port 5000
    2020-12-16 01:08:13,510 INFO [ckan.cli] Using configuration file /etc/ckan/default/ckan.ini
    ....
    ....

Access it on http://localhost:5000
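
On a headless EC2 instance there is no browser to open that address, so the test run can be checked from a second terminal instead. This is a minimal sketch, assuming curl is installed and that the development server is on port 5000 as in the log above; it polls CKAN's `status_show` API action, and `wait_for_ckan` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: poll the CKAN API until it answers or the attempts run out.
# wait_for_ckan is a hypothetical helper; base URL, attempts and delay
# are parameters with assumed defaults.
wait_for_ckan() {
    base_url="${1:-http://localhost:5000}"
    tries="${2:-10}"
    delay="${3:-2}"
    i=1
    while [ "$i" -le "$tries" ]; do
        # -f makes curl return non-zero on HTTP errors such as 500.
        if curl -fsS "$base_url/api/3/action/status_show" >/dev/null 2>&1; then
            echo "CKAN answered after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "CKAN did not answer at $base_url"
    return 1
}
```

With the server from the previous step still running, `wait_for_ckan` with no arguments checks http://localhost:5000.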

  • Set Datastore permissions

    On the CKAN machine, run the following:

    (It ensures that the datastore read-only user will only be able to select from the datastore database but has no create/write/edit permission or any permissions on other databases. You must execute this script as a database superuser on the PostgreSQL server that hosts your datastore database.)

     (default) ckan -c /etc/ckan/default/ckan.ini datastore set-permissions
    
    
      -- Most of the following commands apply to an explicit database or to the whole
      -- 'public' schema, and could be executed anywhere. But ALTER DEFAULT
      -- PERMISSIONS applies to the current database, and so we must be connected to
      -- the datastore DB:
      \connect "datastore_default"
    
      -- revoke permissions for the read-only user
      REVOKE CREATE ON SCHEMA public FROM PUBLIC;
      REVOKE USAGE ON SCHEMA public FROM PUBLIC;
    
      GRANT CREATE ON SCHEMA public TO "ckan_default";
      GRANT USAGE ON SCHEMA public TO "ckan_default";
    
      GRANT CREATE ON SCHEMA public TO "ckan_default";
      GRANT USAGE ON SCHEMA public TO "ckan_default";
    
      -- take connect permissions from main db
      REVOKE CONNECT ON DATABASE "ckan_default" FROM "datastore_default";
    
      -- grant select permissions for read-only user
      GRANT CONNECT ON DATABASE "datastore_default" TO "datastore_default";
      GRANT USAGE ON SCHEMA public TO "datastore_default";
    
      -- grant access to current tables and views to read-only user
      GRANT SELECT ON ALL TABLES IN SCHEMA public TO "datastore_default";
    
      -- grant access to new tables and views by default
      ALTER DEFAULT PRIVILEGES FOR USER "ckan_default" IN SCHEMA public
         GRANT SELECT ON TABLES TO "datastore_default";
    
      -- a view for listing valid table (resource id) and view names
      CREATE OR REPLACE VIEW "_table_metadata" AS
          SELECT DISTINCT
              substr(md5(dependee.relname || COALESCE(dependent.relname, '')), 0, 17) AS "_id",
              dependee.relname AS name,
              dependee.oid AS oid,
              dependent.relname AS alias_of
          FROM
              pg_class AS dependee
              LEFT OUTER JOIN pg_rewrite AS r ON r.ev_class = dependee.oid
              LEFT OUTER JOIN pg_depend AS d ON d.objid = r.oid
              LEFT OUTER JOIN pg_class AS dependent ON d.refobjid = dependent.oid
          WHERE
              (dependee.oid != dependent.oid OR dependent.oid IS NULL) AND
              -- is a table (from pg_tables view definition)
              -- or is a view (from pg_views view definition)
              (dependee.relkind = 'r'::"char" OR dependee.relkind = 'v'::"char")
              AND dependee.relnamespace = (
                  SELECT oid FROM pg_namespace WHERE nspname='public')
          ORDER BY dependee.oid DESC;
      ALTER VIEW "_table_metadata" OWNER TO "ckan_default";
      GRANT SELECT ON "_table_metadata" TO "datastore_default";
    
      -- _full_text fields are now updated by a trigger when set to NULL
      CREATE OR REPLACE FUNCTION populate_full_text_trigger() RETURNS trigger
      AS $body$
          BEGIN
              IF NEW._full_text IS NOT NULL THEN
                  RETURN NEW;
              END IF;
              NEW._full_text := (
                  SELECT to_tsvector(string_agg(value, ' '))
                  FROM json_each_text(row_to_json(NEW.*))
                  WHERE key NOT LIKE '\_%');
              RETURN NEW;
          END;
      $body$ LANGUAGE plpgsql;
      ALTER FUNCTION populate_full_text_trigger() OWNER TO "ckan_default";
    
      -- migrate existing tables that don't have full text trigger applied
      DO $body$
          BEGIN
              EXECUTE coalesce(
                  (SELECT string_agg(
                      'CREATE TRIGGER zfulltext BEFORE INSERT OR UPDATE ON ' ||
                      quote_ident(relname) || ' FOR EACH ROW EXECUTE PROCEDURE ' ||
                      'populate_full_text_trigger();', ' ')
                  FROM pg_class
                  LEFT OUTER JOIN pg_trigger AS t
                      ON t.tgrelid = relname::regclass AND t.tgname = 'zfulltext'
                  WHERE relkind = 'r'::"char" AND t.tgname IS NULL
                      AND relnamespace = (
                          SELECT oid FROM pg_namespace WHERE nspname='public')),
                  'SELECT 1;');
          END;
      $body$;
    

    Now, on the PostgreSQL machine:

    Copy and paste the commands above into the psql prompt (sudo -u postgres psql).
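
As an alternative to copying and pasting by hand, the generated SQL can be piped straight to psql on the PostgreSQL machine. The sketch below assumes SSH access from the CKAN machine to the PostgreSQL machine and a remote user that may run sudo without a password prompt; `apply_datastore_permissions` and the SSH target are hypothetical names.

```shell
#!/bin/sh
# Sketch: generate the permissions script on the CKAN machine and execute it
# as the postgres superuser on the database machine.
# apply_datastore_permissions is a hypothetical helper; run it inside the
# CKAN virtual environment so that the ckan command is available.
apply_datastore_permissions() {
    ssh_target="$1"                          # e.g. ubuntu@<POSTGRES-IP> (assumed)
    ini="${2:-/etc/ckan/default/ckan.ini}"
    # ON_ERROR_STOP makes psql abort at the first failing statement
    # instead of continuing with a half-applied script.
    ckan -c "$ini" datastore set-permissions \
        | ssh "$ssh_target" sudo -u postgres psql --set ON_ERROR_STOP=1
}
```

It would be called as `apply_datastore_permissions ubuntu@<POSTGRES-IP>`, replacing the target with your own SSH user and host.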

5. Public Exposure

Once you’ve installed CKAN from source, you need to deploy your CKAN site behind a production web server, since the development server used above is only meant for testing. Because CKAN uses WSGI, a standard interface between web servers and Python web applications, it can be used with a number of different web server and deployment configurations. The CKAN project has, however, standardized on one: NGINX with uWSGI.

  1. Install Nginx

    Install NGINX (a web server) which will proxy the content from one of the WSGI Servers and add a layer of caching:

    $ sudo amazon-linux-extras install nginx1 -y

  2. Create the WSGI script file

    $ sudo cp /usr/lib/ckan/default/src/ckan/wsgi.py /etc/ckan/default/

  3. Create the WSGI Server (using python virtual env)

    (default) pip install uwsgi
    (default) sudo cp /usr/lib/ckan/default/src/ckan/ckan-uwsgi.ini /etc/ckan/default/

  4. We will not use Supervisor; instead, CKAN will run as a systemd service:

    (default) vim /etc/ckan/default/ckan-uwsgi.ini

     [uwsgi]
    
     http            =  127.0.0.1:8080
     uid             =  ec2-user
     gid             =  ec2-user
     wsgi-file       =  /etc/ckan/default/wsgi.py
     virtualenv      =  /usr/lib/ckan/default
     module          =  wsgi:application
     master          =  true
     pidfile         =  /tmp/%n.pid
     harakiri        =  50
     max-requests    =  5000
     vacuum          =  true
     callable        =  application
     buffer-size     =  32768
    

    (default) sudo vi /etc/systemd/system/ckan.service

     [Unit]
     Description=CKAN 2.9
     After=syslog.target
    
     [Service]
     ExecStart=/usr/lib/ckan/default/bin/uwsgi --ini /etc/ckan/default/ckan-uwsgi.ini
     # Requires systemd version 211 or newer
     RuntimeDirectory=ckan
     Restart=always
     KillSignal=SIGQUIT
     Type=notify
     StandardError=syslog
     NotifyAccess=all
    
     [Install]
     WantedBy=multi-user.target	
    

    sudo systemctl enable ckan
    sudo systemctl start ckan
    sudo systemctl status ckan -l

  5. Create the NGINX config file

    (default) sudo vim /etc/nginx/conf.d/ckan.conf

     proxy_cache_path /tmp/nginx_cache levels=1:2 keys_zone=cache:30m max_size=250m;
     proxy_temp_path /tmp/nginx_proxy 1 2;
    
     server {
         client_max_body_size 100M;
         location / {
             proxy_pass http://127.0.0.1:8080/;
             proxy_set_header X-Forwarded-For $remote_addr;
             proxy_set_header Host $host;
             proxy_cache cache;
             proxy_cache_bypass $cookie_auth_tkt;
             proxy_no_cache $cookie_auth_tkt;
             proxy_cache_valid 30m;
             proxy_cache_key $host$scheme$proxy_host$request_uri;
             # In emergency comment out line to force caching
             # proxy_ignore_headers X-Accel-Expires Expires Cache-Control;
         }
    
     }
    

    sudo systemctl restart nginx
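
A syntax error in ckan.conf would stop NGINX from coming back up after the restart, so it is worth validating the configuration first. This is a minimal sketch using `nginx -t`, which parses the configuration without applying it; `restart_nginx_if_valid` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: restart NGINX only when the configuration parses cleanly.
# restart_nginx_if_valid is a hypothetical helper.
restart_nginx_if_valid() {
    if sudo nginx -t; then
        sudo systemctl restart nginx
    else
        echo "nginx configuration is invalid; not restarting" >&2
        return 1
    fi
}
```

Calling `restart_nginx_if_valid` in place of the plain restart leaves the running NGINX untouched when the new file has a mistake.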
