NewInstallation - ufal/clarin-dspace GitHub Wiki

There are two ways to install DSpace. The first is in Docker, which is easier and preferred. The second consists of downloading all the necessary software in matching versions, then configuring, compiling, installing and running it.

RUNNING DSPACE IN DOCKER

  • Install Docker Desktop
  • Docker Compose v2 is required. On Linux, it is not always installed by default; if necessary, install it following the official guide.

All necessary files are in the frontend repository, so first check out the repository from GitHub:

git clone https://github.com/ufal/dspace-angular
cd dspace-angular

  • To run DSpace in Docker, a .env file with environment variables is necessary in the front-end root folder (dspace-angular/). There are two basic scenarios, localhost and public, that require slightly different configurations; an example .env file for each scenario is given below.

  • After setting up the .env file, run Docker and create users.

  • After the Docker containers are started, don't forget to set up Nginx as described below, so that DSpace can be accessed from remote hosts.


a) Localhost .env set-up

When running on localhost, the frontend MUST run in development mode. Here is an example .env file:

INSTANCE=0
DSPACE_HOST=localhost
DSPACE_VER=dspace-7_x
DSPACE_SSL=false

FE_CMD=yarn start:dev

#please do not edit the following variables unless you know what you are doing
DOCKER_OWNER=ufal
DSPACE_UI_IMAGE=${DOCKER_OWNER}/dspace-angular:$DSPACE_VER
DSPACE_REST_IMAGE=${DOCKER_OWNER}/dspace:$DSPACE_VER

DSPACE_REST_PORT=808${INSTANCE}
UI_PORT=400${INSTANCE}

DSPACE_REST_NAMESPACE=/server
DSPACE_UI_NAMESPACE=/

REST_URL=http://${DSPACE_HOST}:${DSPACE_REST_PORT}${DSPACE_REST_NAMESPACE}
UI_URL=http://${DSPACE_HOST}:${UI_PORT}${DSPACE_UI_NAMESPACE}

b) Public .env set-up

Example .env file for the frontend:

INSTANCE=0
DSPACE_HOST=example.com
DSPACE_VER=dspace-7_x
DSPACE_SSL=true

# If you want to run the front-end in development mode, uncomment the next line
# FE_CMD=yarn start:dev
# NOTE!: The line above is NECESSARY for localhost.

#please do not edit the following variables unless you know what you are doing
DOCKER_OWNER=ufal
DSPACE_UI_IMAGE=${DOCKER_OWNER}/dspace-angular:$DSPACE_VER
DSPACE_REST_IMAGE=${DOCKER_OWNER}/dspace:$DSPACE_VER

DSPACE_REST_PORT=8${INSTANCE}
UI_PORT=8${INSTANCE}

DSPACE_REST_NAMESPACE=/server
DSPACE_UI_NAMESPACE=/

REST_URL=http://${DSPACE_HOST}:${DSPACE_REST_PORT}${DSPACE_REST_NAMESPACE}
UI_URL=http://${DSPACE_HOST}:${UI_PORT}${DSPACE_UI_NAMESPACE}

# If you want to set up JAVA_OPTS
# Server memory limit (4GB)
# JAVA_OPTS=-Xmx4g

You may need to change DSPACE_REST_PORT to something else, e.g. 443. Feel free to leave out the ${INSTANCE} part and just use the port number.

In both versions, you can modify the first section of values. INSTANCE is an arbitrary number that is appended to the port numbers, which lets several DSpace instances run on the same machine. Be sure to use a different project name (the -p parameter of Docker Compose) for each instance! Also check that your machine has sufficient resources (CPU, RAM) for that.
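To illustrate how INSTANCE separates instances, the following sketch reproduces the port and URL lines of the localhost .env in plain sh for INSTANCE=1. It is only a model under stated assumptions: INSTANCE=1 is an example value, and Docker Compose performs its own ${...} interpolation of .env. Because INSTANCE is appended as a digit, the scheme is meant for single-digit values (0 to 9).

```shell
#!/bin/sh
# Sketch: how INSTANCE in the localhost .env turns into per-instance ports.
# INSTANCE=1 is an example value; Docker Compose does the real interpolation.
INSTANCE=1
DSPACE_HOST=localhost
DSPACE_REST_NAMESPACE=/server
DSPACE_UI_NAMESPACE=/

# INSTANCE is appended as a digit: "808" followed by "1" -> 8081, "400" followed by "1" -> 4001.
DSPACE_REST_PORT=808${INSTANCE}
UI_PORT=400${INSTANCE}

REST_URL=http://${DSPACE_HOST}:${DSPACE_REST_PORT}${DSPACE_REST_NAMESPACE}
UI_URL=http://${DSPACE_HOST}:${UI_PORT}${DSPACE_UI_NAMESPACE}

echo "REST_URL=$REST_URL"   # REST_URL=http://localhost:8081/server
echo "UI_URL=$UI_URL"       # UI_URL=http://localhost:4001/
```

Such a second instance would then be started with its own project name, e.g. -p dspace-project-name-1 (the suffix is only an example).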

DSPACE_VER refers to the Docker image tag; most of the available tags are in this list: Docker Tags

If your reverse proxy is on a different machine, add HOST_IP=a.b.c.d to your .env, where a.b.c.d is the IP address of the interface that your reverse proxy can reach.


Run Docker

After setting up the .env file, run the following commands to pull the images and start Docker (you can replace dspace-project-name with a name of your choice):

docker compose --env-file .env -f docker/docker-compose.yml -f docker/docker-compose-rest.yml pull
docker compose --env-file .env -p dspace-project-name -f docker/docker-compose.yml -f docker/docker-compose-rest.yml up -d --no-build
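The same --env-file, -p and -f arguments repeat in every Docker Compose command on this page. A small shell function can keep them in one place. The sketch below is hypothetical: dspace_compose, PROJECT and DRY_RUN are names introduced here, not part of the repository, and the function should be run from the dspace-angular/ root folder.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): wraps the docker compose
# invocation used on this page so the env file, project name and compose
# files are typed only once. Run it from the dspace-angular/ root folder.
PROJECT="${PROJECT:-dspace-project-name}"

dspace_compose() {
  # Prepend the common arguments to whatever the caller passed.
  set -- docker compose --env-file .env -p "$PROJECT" \
    -f docker/docker-compose.yml -f docker/docker-compose-rest.yml "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the command instead of running it.
    echo "$*"
  else
    "$@"
  fi
}

# Example usage (equivalent to the commands above):
#   dspace_compose pull
#   dspace_compose up -d --no-build
# Extra compose files go before the subcommand:
#   dspace_compose -f docker/cli.yml run --rm dspace-cli <command>
```

Passing -p to pull, which the original pull command omits, does no harm; the project name only matters for commands that create or address containers.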

You should now be able to open $UI_URL (http://localhost:4000/ if you haven't changed it) in your browser. It takes a while before everything starts.
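Because start-up takes a while, it can help to poll the UI until it answers instead of reloading the browser. The following is a minimal sketch: wait_until is a hypothetical helper, and the URL in the usage comment assumes the default UI_URL of the localhost .env.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly until it succeeds.
# Usage: wait_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 on the first success, 1 after MAX_TRIES failed attempts.
wait_until() {
  max_tries=$1
  delay=$2
  shift 2
  try=1
  until "$@"; do
    if [ "$try" -ge "$max_tries" ]; then
      return 1
    fi
    try=$((try + 1))
    sleep "$delay"
  done
  return 0
}

# Example: wait up to about 10 minutes (60 tries, 10 s apart) for the front-end.
#   wait_until 60 10 curl -fsS -o /dev/null http://localhost:4000/ && echo "UI is up"
```

curl -f makes curl exit with an error on HTTP error responses, so a connection refused or a 5xx answer both count as "not ready yet".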

To add an administrator and other users, run the following commands with the same .env and Docker Compose files as above, plus docker/cli.yml:

docker compose --env-file .env -p dspace-project-name -f docker/docker-compose.yml -f docker/docker-compose-rest.yml -f docker/cli.yml run --rm dspace-cli create-administrator -e <ADMIN_EMAIL> -f firstname -l lastname -p password -c en -o organization
docker compose --env-file .env -p dspace-project-name -f docker/docker-compose.yml -f docker/docker-compose-rest.yml -f docker/cli.yml run --rm dspace-cli user --add -m <USER_EMAIL> -g givenname -s surname -l en -p password -o organization

You can change the parameter values. For create-administrator, -e is the email, -f the first name, -l the last name, -c the language, -p the password and -o the organization. For user --add, -m is the email, -g the given name, -s the surname, -l the language, -p the password and -o the organization. Use only the arguments shown above for each command and just modify the values as needed.

The folder with the Docker Compose files (docker/ in the commands above) can also contain a config.prod.yml file for the front-end and a local.cfg file for the back-end.


Defining a custom namespace

https://github.com/dataquest-dev/DSpace/wiki/Custom-namespace


Avoiding deleting volumes

The main rule is simply to be careful. When a volume is located on another disk, Docker does not allow the volume to be removed. Instead, an error is displayed: Error response from daemon: remove <volume-name>: Unable to remove a directory outside of the local volume root /var/lib/docker: /<path-to-docker-storage>/volumes/test/_data.

You can use this behavior as an additional layer of protection by placing volumes on another disk (which is sometimes necessary anyway because of the data size). This can be done simply by symlinking /var/lib/docker/volumes to a location on the other disk. Be sure to test it before relying on it.
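The symlink approach can be sketched as follows. This is a minimal sketch under stated assumptions: relocate_with_symlink is a hypothetical helper, /mnt/bigdisk/docker-volumes is an example target, and the Docker daemon must be stopped first and started again afterwards (on systemd hosts, for example with systemctl stop docker; if socket activation is enabled, stop docker.socket as well). The original directory is kept as a .bak copy until you have checked that everything works.

```shell
#!/bin/sh
# Hypothetical helper: move a directory to another disk and leave a symlink
# in its place. Intended for /var/lib/docker/volumes with Docker stopped;
# run as root so cp -a can preserve ownership and permissions.
relocate_with_symlink() {
  src=$1    # e.g. /var/lib/docker/volumes
  dest=$2   # e.g. /mnt/bigdisk/docker-volumes (example path)
  if [ ! -d "$src" ] || [ -L "$src" ]; then
    echo "not a plain directory: $src" >&2
    return 1
  fi
  if [ -e "$src.bak" ]; then
    echo "backup already exists: $src.bak" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  # Copy the contents, including hidden files, preserving attributes.
  cp -a "$src/." "$dest/" || return 1
  # Keep the original as a backup until the new location is verified.
  mv "$src" "$src.bak" || return 1
  ln -s "$dest" "$src"
}

# Example:
#   systemctl stop docker
#   relocate_with_symlink /var/lib/docker/volumes /mnt/bigdisk/docker-volumes
#   systemctl start docker
```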


RUNNING DSPACE WITHOUT DOCKER

The original installation instructions for vanilla DSpace are long and extensive, and some parts of them are not necessary. They also cover several possible versions, so a shortened list is given here. Consult the original instructions if anything is unclear.


Required software

Make sure you know where the installed/extracted software is located and that you can access it.


Installation

  • create a database

    • go to the database installation folder
    • createuser --username=postgres --no-superuser --pwprompt dspace
    • createdb --username=postgres --owner=dspace --encoding=UNICODE dspace
    • psql --username=postgres -c "CREATE EXTENSION pgcrypto;" dspace
  • download DSpace sources (this repo)

  • edit configuration in dspace/config/clarin-dspace.cfg (and other configs)

  • run mvn clean install in the repository root

  • (go to <dspace-repo>/dspace/target/dspace-installer)

  • run ant fresh_install in <dspace-repo>/dspace/target/dspace-installer

    • the command above creates the DSpace installation in a new folder. By default, it is C:/dspace or /dspace.
    • locate it and make sure the command created it.
    • from now on, we will refer to it as <dspace-installation-folder>
  • (go to <dspace-installation-folder>)

  • run bin/dspace database migrate force in <dspace-installation-folder>

  • create an administrator with bin/dspace create-administrator in <dspace-installation-folder>

  • copy everything from <dspace-installation-folder>/webapps/* to <tomcat>/webapps

  • copy the Solr cores: cp -R <dspace-installation-folder>/solr/* <solr>/server/solr/configsets

  • download the frontend sources

  • run yarn install in <frontend-source>


Running DSpace

  • make sure your database is running (it should start automatically)
  • (go to <frontend-source>)
  • run yarn start in <frontend-source>
  • start Solr with solr start
  • start Tomcat with catalina run

Notes

The .env file can contain the following additional variables to configure S3:

S3_STORAGE=1
S3_ENABLED=true

S3_RELATIVE_PATH=false
S3_BUCKET=docker-dummy-bucket
S3_SUBFOLDER=
S3_ACCESS=myaccestoken
S3_SECRET=mysecretpasswordtoken
S3_REGION_NAME=us-east-1

This should work since version 7.5. The first two variables must stay as they are in order to enable S3; the rest can (and should) be modified.
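Before starting the containers, a quick check of the S3 variables can catch typos. The sketch below is hypothetical: env_value and check_s3_env are names introduced here, and it assumes that S3_BUCKET, S3_ACCESS, S3_SECRET and S3_REGION_NAME must be non-empty, while S3_SUBFOLDER may stay empty as in the example above. The file is parsed with sed rather than sourced, because sourcing a line such as FE_CMD=yarn start:dev would run start:dev as a command.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the S3 settings in a .env file.

# Print the value of the last NAME=value line for NAME ($1) in FILE ($2).
env_value() {
  sed -n "s/^$1=//p" "$2" | tail -n 1
}

# Return 0 if FILE ($1) enables S3 and sets the required variables.
check_s3_env() {
  file=$1
  status=0
  if [ "$(env_value S3_STORAGE "$file")" != "1" ]; then
    echo "S3_STORAGE must be 1" >&2
    status=1
  fi
  if [ "$(env_value S3_ENABLED "$file")" != "true" ]; then
    echo "S3_ENABLED must be true" >&2
    status=1
  fi
  # Assumption: these must be non-empty; S3_SUBFOLDER may be empty.
  for name in S3_BUCKET S3_ACCESS S3_SECRET S3_REGION_NAME; do
    if [ -z "$(env_value "$name" "$file")" ]; then
      echo "$name is missing or empty" >&2
      status=1
    fi
  done
  return "$status"
}

# Example: check_s3_env .env && echo "S3 settings look complete"
```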


Nginx

The whole server block should look like this:

server {
        listen 80;
        server_name dspace.url;
        location / {
            proxy_pass http://localhost:4000;
        }
        location /server/ {
            proxy_set_header Host $http_host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

            proxy_pass http://localhost:8080;
        }
}

This assumes the following:

  • DSpace is run in Docker
  • The back-end runs on port 8080
  • The front-end runs on port 4000
  • the settings in .env or config specify the following addresses:
    • DSPACE_UI_URL: dspace.url
    • DSPACE_REST_URL: dspace.url/server/

If some ports are different, change them in the configuration accordingly.

TODO: document necessary headers (such as X-Forwarded-Proto and X-Forwarded-Port) and ref https://github.com/dataquest-dev/DSpace/issues/536


CMDI data for machines

CLARIN installations must be able to return just the CMDI metadata. Add the following to the location / block above:

# placed in location block of DSpace frontend

# redirect .../handle/123456/123456?format=cmdi to .../cmdi/oai-metadata... which returns just XML file with metadata
# ? at the end of the redirect stops nginx from appending original parameters
if ($query_string ~* "format=cmdi"){
    rewrite ^/(.*)handle/(.*)$ http://$http_host/server/cmdi/oai-metadata?metadataPrefix=cmdi&handle=$2? redirect;
}

# if the HTTP request to .../handle/123456/123456 contains the header "Accept: application/x-cmdi+xml" or similar, redirect
# to the same target as above.
# $http_<name_of_header> returns the value of any request header, in this case Accept:
if ($http_accept ~ "(.*cmdi.xml*)"){
    rewrite ^/(.*)handle/(.*)$ http://$http_host/server/cmdi/oai-metadata?metadataPrefix=cmdi&handle=$2? redirect;
}

Assuming the FE and BE are behind the same host (proxy), you can instead use:

    # CMDI content: replace repository-ng with your path prefix, or tweak the regexp as above
    if ($arg_format ~* "cmdi"){
        rewrite ^/repository-ng/handle/(.*)$ /repository-ng/server/cmdi/oai-metadata?metadataPrefix=cmdi&handle=$1? last;
    }

    if ($http_accept = "application/x-cmdi+xml"){
        rewrite ^/repository-ng/handle/(.*)$ /repository-ng/server/cmdi/oai-metadata?metadataPrefix=cmdi&handle=$1? last;
    }
    # /CMDI content

Check

To check the first part, use a command like

curl -k https://dspacehost.com/handle/1234/56789?format=cmdi -L

To check the second part, use a command like

curl -k https://dspacehost.com/handle/1234/56789 -L -H "Accept: application/x-cmdi+xml"
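If the rewrite does not apply, the request reaches the front-end through location / and comes back as an HTML page instead of the CMDI XML. The following sketch saves the response and applies a rough heuristic. looks_like_xml_not_html is a hypothetical helper, and dspacehost.com and the handle are the placeholders used above.

```shell
#!/bin/sh
# Hypothetical helper: rough check that a saved response is XML, not HTML.
looks_like_xml_not_html() {
  file=$1
  # An empty body is a failure.
  [ -s "$file" ] || return 1
  # Any <html tag means we probably got the front-end page instead.
  if grep -qi '<html' "$file"; then
    return 1
  fi
  # The first non-whitespace character of an XML document is "<".
  first=$(tr -d ' \t\r\n' < "$file" | cut -c1)
  [ "$first" = "<" ]
}

# Example (same requests as above, body saved to a file):
#   curl -ksSL -o cmdi.out "https://dspacehost.com/handle/1234/56789?format=cmdi"
#   looks_like_xml_not_html cmdi.out && echo "format=cmdi OK"
#   curl -ksSL -o cmdi.out -H "Accept: application/x-cmdi+xml" https://dspacehost.com/handle/1234/56789
#   looks_like_xml_not_html cmdi.out && echo "Accept header OK"
```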


TODO shibboleth configuration

start from https://github.com/ufal/clarin-dspace/issues/1032#issuecomment-2066469795


CRON jobs

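Create a wrapper script run-cli-command-88.sh with the following content:

#!/bin/sh
sudo docker exec -w /dspace/bin dspace8 ./dspace "$@"

and make it executable:

chmod +x /path/to/run-cli-command-88.sh

Then add the following crontab entries (replace /app with the folder that contains the script):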

0 23 * * * cd /app && ./run-cli-command-88.sh oai import

20 0 * * * cd /app && ./run-cli-command-88.sh index-discovery

1 3 * * * cd /app && ./run-cli-command-88.sh subscription-send -f D

2 3 * * 0 cd /app && ./run-cli-command-88.sh subscription-send -f W

3 3 1 * * cd /app && ./run-cli-command-88.sh subscription-send -f M

0 4 1 * * cd /app && ./run-cli-command-88.sh cleanup

30 0 * * * cd /app && ./run-cli-command-88.sh health-report -e <YOUR_EMAIL>

Alternatively, define the jobs in a file under /etc/cron.d, for example:

cat /etc/cron.d/lindatrepo
MAILTO=root
RUNCMD="docker compose -p lindatrepo exec dspace /dspace/bin/dspace"

0 23 * * * root $RUNCMD oai import

20 0 * * * root $RUNCMD index-discovery

1 3 * * * root $RUNCMD subscription-send -f D

2 3 * * 0 root $RUNCMD subscription-send -f W

3 3 1 * * root $RUNCMD subscription-send -f M

0 4 1 * * root $RUNCMD cleanup -v

30 0 * * * root $RUNCMD health-report -e <YOUR_EMAIL>

NOTE

Avoid running any /dspace/bin/dspace commands around midnight. That is when log rotation happens, and we have seen logs lost (probably due to multiple log rotations).


NOTE

OAI

There is a concern that OAI might be unstable. After adding (or harvesting) many items, check that they are visible in the OAI interface. If something is wrong, you will see either an empty page or an error code with a short description. In that case, check the Tomcat logs in the folders /dspace/log and tomcat/logs (in Docker, usually /usr/local/tomcat/logs).
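To check this from the command line, the OAI response can be saved and inspected. The sketch below assumes that the OAI endpoint is served under the REST namespace at /server/oai/request, as in vanilla DSpace 7; adjust the URL to your installation. oai_check is a hypothetical helper. If the response contains an OAI-PMH error, it reports the error code. Otherwise it counts the record headers on the first page of ListIdentifiers results (further pages are behind a resumptionToken).

```shell
#!/bin/sh
# Hypothetical helper: inspect a saved OAI-PMH response.
# Prints the error code, or the number of <header> elements on this page.
oai_check() {
  file=$1
  if grep -q '<error' "$file"; then
    code=$(sed -n 's/.*<error code="\([^"]*\)".*/\1/p' "$file" | head -n 1)
    echo "OAI-PMH error: $code"
    return 1
  fi
  count=$(grep -o '<header' "$file" | wc -l | tr -d ' ')
  echo "records on this page: $count"
  [ "$count" -gt 0 ]
}

# Example (localhost set-up, oai_dc metadata format):
#   curl -sS -o oai.xml "http://localhost:8080/server/oai/request?verb=ListIdentifiers&metadataPrefix=oai_dc"
#   oai_check oai.xml
```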
