Service: ERDDAP Subsetting Service

How to deploy IFREMER's ERDDAP subsetting service

Basic info

For every user, an ERDDAP container is started by a JupyterHub instance. JupyterHub handles the user authentication, starts the container, and redirects the user to it. The user's NextCloud data is also bind-mounted by JupyterHub.

The service runs on vre3.argo.grnet.gr as plain HTTP. The nginx proxy on port 443 does the SSL termination, so ERDDAP is accessible at https://vre3.argo.grnet.gr/erddap. To log in there, you need to send a POST request containing vre_username, vre_displayname, service_auth_token and vre_URL.

For testing, an HTML login form is available at https://vre3.argo.grnet.gr/erddap/locallogin.
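For a quick manual test of the POST login, a curl call along these lines should work (a sketch only: the values are placeholders, and we assume the form posts to the hub login endpoint that the reverse proxy config below routes to):

curl -i -X POST https://vre3.argo.grnet.gr/erddap/hub/login \
  -d 'vre_username=vre_tomandjerrymarineidorgcgfqwt7j' \
  -d 'vre_displayname=Tom' \
  -d 'service_auth_token=<token>' \
  -d 'vre_URL=https://vre.seadatanet.org'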

Useful info and mounts

What will be mounted into the JupyterHub (defined in docker-compose.yml):

  • NFS-Mounted NextCloud user directories. Mounted from HOST_WHERE_ARE_USERDIRS (GRNET: /mnt/sdc-nfs-data/) to /usr/share/userdirectories/.
  • JupyterHub config file. Mounted from /root/erddap/jupyterhub_config.py to /srv/jupyterhub/jupyterhub_config.py.
  • Docker socket, needed to spawn containers. Mounted from /var/run/docker.sock to the same location inside the container.

What will be mounted into the spawned containers (defined in jupyterhub_config.py, setting c.DockerSpawner.volumes, via the dict volume_mounts):

  • The user's directory. Mounted from HOST_WHERE_ARE_USERDIRS (GRNET: /mnt/sdc-nfs-data/<username>/files) to /nextcloud, where ERDDAP expects it.

We used to have to mount the webapps and the service data, but since 20201023 they are included in the image.
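For orientation, the mount definition in jupyterhub_config.py presumably looks roughly like this (a sketch only; apart from c.DockerSpawner.volumes, volume_mounts and the env variable names used in this guide, everything here is a guess, not the actual file contents):

import os
# Host-side path template, e.g. /mnt/sdc-nfs-data/{raw_username}/files
host_path = os.environ.get('HOST_WHERE_ARE_USERDIRS', '/mnt/sdc-nfs-data') \
            + os.environ.get('USERDIR_TEMPLATE_HOST', '/{raw_username}/files')
volume_mounts = {host_path: '/nextcloud'}   # DockerSpawner fills in {raw_username}
c.DockerSpawner.volumes = volume_mounts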

Some info:

  • JupyterHub image: registry-sdc.argo.grnet.gr/jupyterhub_vre:20201020 (status 20201020)
  • ERDDAP image: registry-sdc.argo.grnet.gr/ifr-sdn-subset-service:20200812 (status 20201020)
  • The service runs as uid 1000 (user isi_exp inside the container), so that it can read/write the synced user data.
    • This is defined inside the service and can only be changed in the service image. By setting RUN_AS_USER=1000, we make sure that JupyterHub also chowns files to 1000 when creating user directories. As we mount existing user directories, this should not really have an effect. However, the mounted data must be readable/writeable by uid 1000 - but it is owned by uid 33?! It is readable/writeable by group gid 1000, yet the service seems to run as group gid 10004 - ??? TODO CHECK (see the check commands after this list)
    • Data inside the service image is owned by gid 10004 (ditiisi), but I am not sure why.
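To verify the effective uid/gid and the write access on the mounted data, a quick check from the host (containername is a placeholder for a spawned container):

# Which uid and gids does the service actually run with?
docker exec -it containername id
# Can it write to the mounted user data?
docker exec -it containername touch /nextcloud/.write_test
docker exec -it containername rm /nextcloud/.write_test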

Before deploying ERDDAP

The user data (NextCloud data) should already be in place before starting the deployment. Currently, we run ERDDAP on a server at GRNET, close to the other VRE servers; that's why we mount the data via NFS.

If you run ERDDAP on a VM with synchronized (rather than NFS-mounted) NextCloud data, you mainly have to adapt USERDIR_TEMPLATE_HOST in the .env file (see further below).

Operating on NFS-mounted data

Make sure you have the NFS-mounted user data ready. On GRNET's VMs it sits in /mnt/sdc-nfs-data/, but you can use any other location - just make sure you specify it in HOST_WHERE_ARE_USERDIRS in the .env file.
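To confirm that the NFS share is actually mounted (path as used on GRNET's VMs):

mount | grep sdc-nfs-data
df -h /mnt/sdc-nfs-data/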

The value of USERDIR_TEMPLATE_HOST should be /{raw_username}/files, so that the subdirectories called files inside directories named <username> are mounted.

This is how it should look:

[root@snf-7990 ~]# ls -lpah /mnt/sdc-nfs-data/
total 6.8M
drwxrwx---  27   33 1000 4.0K Oct 20 15:18 ./
drwxr-xr-x.  7 root root 4.0K Sep 14 09:19 ../
drwxrwxr-x   4   33 1000 4.0K Oct 20 05:08 admin/
drwxrwxr-x  10   33 1000 4.0K Sep 15 10:35 appdata_ocn8npls9fx3/
drwxrwxr-x   6   33 1000 4.0K Sep 15 09:15 appdata_ocz2pnx2ewe6/
drwxrwxr-x   2   33 1000 4.0K Sep 15 09:15 files_external/
-rw-rw-r--   1   33 1000  324 Sep 15 09:15 .htaccess
-rw-rw-r--   1   33 1000    0 Sep 15 09:15 index.html
-rw-r-----   1   33 1000  44K Oct 20 14:53 nextcloud.log
-rw-rw-r--   1   33 1000    0 Sep 14 12:46 .ocdata
-rw-rw-r--   1   33 1000 6.7M Oct 20 15:18 owncloud.db
drwxrwxr-x   4   33 1000 4.0K Sep 17 15:07 vre_tomandjerrymarineidorgcgfqwt7j/ # one user
drwxrwxr-x   4   33 1000 4.0K Sep 25 05:54 vre_bugsbunnymarineidorgy4773w8y/   # another user
(...)

When peeking into a user directory, you should see a files subdirectory, and only inside it the actual contents:

[root@snf-7990 ~]# ls -lpah /mnt/sdc-nfs-data/vre_tomandjerrymarineidorgcgfqwt7j/
total 20K
drwxrwxr-x  5 33 33 4.0K Sep 25 11:23 ./
drwxrwx--- 36 33 33 4.0K Oct 23 13:15 ../
drwxrwxr-x 13 33 33 4.0K Oct 21 17:56 files/
...

[root@snf-7990 ~]# ls -lpah /mnt/sdc-nfs-data/vre_tomandjerrymarineidorgcgfqwt7j/files
total 52K
drwxrwxr-x 13 33 33 4.0K Oct 21 17:56 ./
drwxrwxr-x  5 33 33 4.0K Sep 25 11:23 ../
drwxrwxr-x  2 33 33 4.0K Sep 21 09:59 BioQC_test_data/
drwxrwxr-x  6 33 33 4.0K Sep 21 10:08 ERDDAP_test_data/
drwxrwxr-x  2 33 33 4.0K Sep 21 09:59 Imports/
drwxrwxr-x  2 33 33 4.0K Sep 21 09:59 Results/
drwxrwxr-x  5 33 33 4.0K Sep 21 10:00 webODV_test_data/
drwxrwxr-x  2 33 33 4.0K Sep 28 08:12 Work/
...

Operating on synchronized data

In case you operate on synchronized data, make sure you have the sync in place. On bluewhale the synchronized data sits in /scratch/vre/sync_from_athens/nextcloud_data/, but you can use any other location - just make sure you specify it in HOST_WHERE_ARE_USERDIRS in the .env file.

The value of USERDIR_TEMPLATE_HOST should be /{raw_username}, so that the directories named <username> are mounted (without a /files subdirectory).

This is how it should look:

[alice@bluewhale ~]$ ls -lpah /scratch/vre/sync_from_athens/nextcloud_data
total 4.0K
drwxr-xr-x. 15 vre  vre  4.0K Oct 20 16:26 ./
drwxr-xr-x.  5 root root   92 Sep 14 10:39 ../
drwxrwxr-x. 13 vre  vre   230 Oct 21 01:48 vre_tomandjerrymarineidorgcgfqwt7j/ # one user
drwxr-xr-x.  2 vre  vre     6 Oct  1 10:04 vre_bugsbunnymarineidorgy4773w8y/   # another user
...

Inside the user directories, their content should be directly visible:

[alice@bluewhale ~]$ ls -lpah /scratch/vre/sync_from_athens/nextcloud_data/vre_tomandjerrymarineidorgcgfqwt7j/
total 8.0K
drwxrwxr-x. 13 vre vre  230 Oct 21 01:48 ./
drwxr-xr-x. 15 vre vre 4.0K Oct 20 16:26 ../
drwxrwxr-x.  3 vre vre   86 Oct 20 03:12 BioQC_test_data/
drwxrwxr-x.  6 vre vre   68 Sep 25 13:29 ERDDAP_test_data/
drwxrwxr-x.  2 vre vre    6 Sep 25 13:29 Imports/
drwxrwxr-x.  2 vre vre    6 Sep 25 13:29 Results/
drwxrwxr-x.  5 vre vre  112 Sep 25 13:29 webODV_test_data/
drwxrwxr-x.  2 vre vre   81 Sep 28 10:13 Work/
...

Deployment step-by-step

  • Create a directory called erddap (wherever you keep your service directories, e.g. /root/erddap)
mkdir /root/erddap
  • Download docker-compose.yml, jupyterhub_config.py and env-erddap (to be renamed .env)
cd /root/erddap
wget https://raw.githubusercontent.com/SeaDataCloud/vre-config/master/services/erddap/docker-compose.yml
wget https://raw.githubusercontent.com/SeaDataCloud/vre-config/master/services/erddap/jupyterhub_config.py
wget https://raw.githubusercontent.com/SeaDataCloud/vre-config/master/services/erddap/env-erddap
mv env-erddap .env

Changes to docker-compose.yml:

  • No changes. (But make sure the URL used for authentication is included in the WHITELIST_AUTH env value!).

Changes to jupyterhub_config.py:

  • No changes needed. (There is a section with ERDDAP-specific config which should be enabled via if True: by default. To be sure, check it anyway, in case this config file was copied from some other service where it was set to if False:. A quick check is sketched below.)
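A minimal way to find that section, assuming the guard is literally written as if True::

grep -n "if True" /root/erddap/jupyterhub_config.py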

Changes to .env:

  • Change the value of ADMIN_PW to some value of your choice (replace foo).
  • Change the value of JUPYTERHUB_CRYPT_KEY to the result of running openssl rand -hex 32 (replace foo).
  • Change the value of HOST_NAME to the FQDN of the machine where ERDDAP will be reachable (replace jellyfish.argo.grnet.gr).
  • Other values may have to change in case you use different paths than in this guide, e.g.:
    • HOST_WHERE_ARE_USERDIRS, in case you don't have the NextCloud data in /mnt/sdc-nfs-data/
    • ... (just look :)) ...
vi .env                 # make the changes! (an illustrative excerpt follows below)
openssl rand -hex 32    # result goes into .env as JUPYTERHUB_CRYPT_KEY!
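For orientation, the relevant part of the .env file might look like this (keys taken from this guide; the values are examples only):

ADMIN_PW=choose-a-secret
JUPYTERHUB_CRYPT_KEY=<output of: openssl rand -hex 32>
HOST_NAME=vre3.argo.grnet.gr
HOST_WHERE_ARE_USERDIRS=/mnt/sdc-nfs-data
USERDIR_TEMPLATE_HOST=/{raw_username}/files
DOCKER_JUPYTER_IMAGE=registry-sdc.argo.grnet.gr/ifr-sdn-subset-service:20201022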
  • Pull the ERDDAP image
docker pull registry-sdc.argo.grnet.gr/ifr-sdn-subset-service:20201022
  • Start the service
docker-compose down && docker-compose up -d && docker-compose logs --tail=100 -f
  • Add the locations for this service to the reverse proxy config (see the next section) and restart the proxy
cd /root/revproxy
vi proxy.conf           # add the locations from the "Reverse proxy config" section
docker-compose down && docker-compose up -d && docker-compose logs --tail=100 -f

Reverse proxy config

Add this config to the reverse proxy on your machine:

Upstreams section:

upstream jhub_web {
    server erddap_hub_erddap_1:8000;
}

Somewhere above:

# From:
# https://jupyterhub.readthedocs.io/en/stable/reference/config-proxy.html
# top-level http config for websocket headers
# If Upgrade is defined, Connection = upgrade
# If Upgrade is empty, Connection = close
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

Locations section:

    location = /erddap/hub/login {
        if ($request_method = GET ) {
           return 302 https://vre.seadatanet.org;
        }
        proxy_pass http://jhub_web/erddap/hub/login$is_args$args;
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # According to
        # https://jupyterhub.readthedocs.io/en/stable/reference/config-proxy.html
        # websocket headers
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;

        # JHub does not know about SSL so it might try to redirect to http://, so
        # we need to make sure to add https://:
        proxy_redirect http://vre3.argo.grnet.gr https://vre3.argo.grnet.gr;
    }

    # This allows local login via GET (using login form),
    # but hides this from VRE users by using the /locallogin route.
    location = /erddap/locallogin {
        proxy_pass http://jhub_web/erddap/hub/login;
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_redirect http://vre3.argo.grnet.gr https://vre3.argo.grnet.gr;
    }

    location ~ /erddap/?(.*)$ {
        proxy_pass http://jhub_web/erddap/$1$is_args$args;
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_redirect http://vre3.argo.grnet.gr https://vre3.argo.grnet.gr;
    }
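Once the proxy has been restarted, two quick checks from outside (expected status codes derived from the config above):

# locallogin should serve the login form:
curl -s -o /dev/null -w '%{http_code}\n' https://vre3.argo.grnet.gr/erddap/locallogin   # expect 200
# a plain GET on the hub login is redirected to the dashboard:
curl -s -o /dev/null -w '%{http_code}\n' https://vre3.argo.grnet.gr/erddap/hub/login    # expect 302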

Testing the deployment

  • Hub must be healthy! That takes about a minute.
docker ps -a | grep erddap
  • Can you log in locally at https://vre3.argo.grnet.gr/erddap/locallogin, using any name and the ADMIN_PW? (In this case, no data is available.)
  • Can you log in locally at https://vre3.argo.grnet.gr/erddap/locallogin, using an existing VRE name and the ADMIN_PW? (In this case, data should be available.)
  • Next, try logging in from the dashboard. For this, there must be a form on the dashboard that sends the user to this instance via POST.
  • A container called erddap-vre_xyzxyz should be spawned, and should become healthy soon, too (a quick way to watch this is shown below).
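To watch the hub and the spawned containers become healthy:

watch -n5 'docker ps -a | grep erddap'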

Maintenance

  • How to view the hub's logs: cd /root/erddap and then dolo (shortcut in /root/.bashrc for: docker-compose logs --tail=100 -f)

  • How to change the image name (of the JupyterHub container):

    • Change it in docker-compose.yml and restart the service. The previously spawned containers can probably keep running, unless the changes were substantial.
    • Ideally, remove the old image: docker image rm <old-imagename>, so it does not take up space.
  • How to change the image name (of the spawned containers):

    • Change the value for DOCKER_JUPYTER_IMAGE in /root/erddap/.env. As of 20201020, it is registry-sdc.argo.grnet.gr/ifr-sdn-subset-service:20200812.
    • Then restart the service.
    • Make sure to remove the existing containers (docker rm <containername>), since running containers are of course not affected by an image change.
    • Ideally, remove the old image: docker image rm <old-imagename>, so it does not take up space.
  • How to restart the service:

    • Change into its directory via cd /root/erddap/ and then doco (shortcut in /root/.bashrc for: docker-compose down && docker-compose up -d && docker-compose logs --tail=100 -f); both aliases are sketched below.
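The two shortcuts, roughly as they would appear in /root/.bashrc (dolo is reconstructed under the assumption that it only tails the logs without restarting anything):

alias doco='docker-compose down && docker-compose up -d && docker-compose logs --tail=100 -f'
alias dolo='docker-compose logs --tail=100 -f'   # assumption: view logs only, no restart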

Troubleshooting / previously solved problems

Error 500 : Internal Server Error

For several possible causes and solutions of Internal Server Errors, check out: https://github.com/merretbuurman/jupyterhub-vreauthenticator#troubleshooting--previously-solved-problems

User data

Inside the container, ERDDAP expects the user's data in /nextcloud. This must be the user's data directly: no directory named after the user first, and no files directory in between.

Any path that the file selector passes must be findable under this path.

isi_exp@d7c0bdff8a30:/usr/local/apache2$ ls -lpah /nextcloud/
total 16K
drwxr-xr-x  4 isi_exp users   4.0K Apr 15 11:57 ./
drwxr-xr-x  1 root    root    4.0K Apr 15 16:04 ../
drwxr-xr-x  4 isi_exp isi_exp 4.0K Apr 15 11:57 ERDDAP_test_data/
...

Are the webapps deployed? / Redeploy

The erddap.war should be in /opt/tomcat8/webapps/, along with an erddap directory created by Tomcat:

docker exec -it containername /bin/bash

isi_exp@d7c0bdff8a30:/usr/local/apache2$ ls /opt/tomcat8/webapps/
erddap	erddap.war
isi_exp@d7c0bdff8a30:/usr/local/apache2$ ls /opt/tomcat8/webapps/ -lpah
total 490M
drwxr-xr-x 3 isi_exp isi_exp 4.0K Apr 15 14:39 ./
drwxr-xr-x 1 isi_exp ditiisi 4.0K Mar  6 11:25 ../
drwxr-x--- 7 isi_exp isi_exp 4.0K Apr 15 14:39 erddap/
-rw-r--r-- 1 isi_exp isi_exp 490M Oct 17 10:16 erddap.war

To redeploy, move the erddap.war somewhere else (Tomcat should then remove the erddap directory), then move it back to /opt/tomcat8/webapps/ (Tomcat should then recreate the erddap directory). See the sketch below.
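A minimal redeploy sketch, run as root inside the container (containername is a placeholder):

docker exec -it -u root containername /bin/bash
# inside the container:
mv /opt/tomcat8/webapps/erddap.war /tmp/     # Tomcat should now remove webapps/erddap
ls /opt/tomcat8/webapps/                     # wait until the erddap directory is gone
mv /tmp/erddap.war /opt/tomcat8/webapps/     # Tomcat should then re-extract the webapp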

Enough Java heap space?

If you get a 404 while ERDDAP is loading, memory may be the problem.

In that case, the log contains this text:

While trying to load datasetID=SeaDataNet_inSitu (after 2005 ms)
java.lang.RuntimeException: datasets.xml error on or before line #984: Your query produced too much data.  Try to request less data.  The request needs more memory (284 MB) than is ever safely available in this Java setup (185 MB). (TableWriterAll.cumulativeTable)

To check whether that's in the log:

docker exec -it containername /bin/bash
grep -r "more memory" /opt/tomcat8/content/erddap/erddapDirectory/logs/

Increase memory (Java Heap Space)

Please read this: https://crunchify.com/how-to-change-jvm-heap-setting-xms-xmx-of-tomcat/

Increase heap space for ERDDAP Java JVM in the docker-compose.yml:

   environment:
      ...
      JAVA_OPTS: '-Xms800M -Xmx800M'

This will be picked up in the jupyterhub_config.py:

JAVA_OPTS = os.environ.get('JAVA_OPTS', '-Xms200M -Xmx200M')
...
container_env['JAVA_OPTS'] = JAVA_OPTS

Enough overall memory in the container?

Depending on how much memory you need, also increase overall memory for the containers in the docker-compose.yml:

   environment:
      ...
      MEMORY_LIMIT: '2G'

This will be picked up in the jupyterhub_config.py:

MEMORY_LIMIT = os.environ.get('MEMORY_LIMIT', '2G')
...
# https://github.com/jupyterhub/dockerspawner#memory-limits
c.Spawner.mem_limit = MEMORY_LIMIT
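To see how close a spawned container actually gets to its limit at runtime:

docker stats --no-stream | grep erddap-     # compare the MEM USAGE / LIMIT column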

Testing without the nginx proxy

Check out here: https://github.com/merretbuurman/jupyterhub-vreauthenticator#adding-ssl

Locations in container

# the log:
isi_exp@d7c0bdff8a30:/usr/local/apache2$ ls /opt/tomcat8/content/erddap/erddapDirectory/logs/
emailLog2020-04-15.txt	emailLog2020-04-16.txt	log.txt

# the dataset config XML files:
isi_exp@d7c0bdff8a30:/usr/local/apache2$ ls /opt/tomcat8/content/erddap/
datasets.xml  datasets_profile.xml.bk  datasets_template.xml  datasets_timeserie.xml.bk  datasets_trajectory.xml.bk  erddapDirectory  images  setup.xml

Getting into the container as root

docker exec -it -u root containername /bin/bash

Dev access to container-private logs

Copying the logs and setup.xml to Leo's NextCloud:

# usernames (examples; both outdated)
USER_NAME='vre_buurmanmarineidorgr8255g4x'    # whose container to inspect
LOG_OWNER='vre_lbruvrylmarineidorg9tjzpyb7'   # whose NextCloud receives the copies

# See what's in there:
docker exec erddap-${USER_NAME} ls -lpah /opt/tomcat8/logs/
docker exec erddap-${USER_NAME} ls -lpah /opt/tomcat8/content/erddap/
docker exec erddap-${USER_NAME} ls -lpah /opt/tomcat8/content/erddap/erddapDirectory/
docker exec erddap-${USER_NAME} ls -lpah /opt/tomcat8/content/erddap/erddapDirectory/logs/

# Copy setup.xml and the logs
docker cp erddap-${USER_NAME}:/opt/tomcat8/content/erddap/setup.xml /nfs-import/${LOG_OWNER}/files/service_logs/
docker cp erddap-${USER_NAME}:/opt/tomcat8/content/erddap/erddapDirectory/logs/log.txt /nfs-import/${LOG_OWNER}/files/service_logs/
docker cp erddap-${USER_NAME}:/opt/tomcat8/logs/localhost_access_log.2020-04-28.txt /nfs-import/${LOG_OWNER}/files/service_logs/

# Chown
chown -R 33:1000 /nfs-import/${LOG_OWNER}/files/service_logs

# Check
ls -lpah /nfs-import/${LOG_OWNER}/files/service_logs/
ls -lpah /nfs-import/${LOG_OWNER}/files/service_logs/leos_container
ls -lpah /nfs-import/${LOG_OWNER}/files/service_logs/merrets_container
