Service: ERDDAP Subsetting Service - SeaDataCloud/Documentation GitHub Wiki
For every user, an ERDDAP container is started by a JupyterHub instance. JupyterHub does the user authentication, starts the container, and redirects the user to the container. The user's NextCloud data is also bind-mounted by JupyterHub.

The service is running on vre3.argo.grnet.gr as plain HTTP. We are using the nginx proxy on port 443 for the SSL termination, so ERDDAP is accessible at https://vre3.argo.grnet.gr/erddap. To log in there, you need a POST request sending `vre_username`, `vre_displayname`, `service_auth_token` and `vre_URL`.

For testing, there is an HTML login form available at https://vre3.argo.grnet.gr/erddap/locallogin.
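The login POST can be sketched like this (field names from above; the token, display name and URL values are placeholders, and the request is only built, not sent):

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder values - the real token comes from the VRE dashboard:
fields = {
    "vre_username": "vre_tomandjerrymarineidorgcgfqwt7j",
    "vre_displayname": "Tom And Jerry",
    "service_auth_token": "REPLACE_ME",
    "vre_URL": "https://vre.seadatanet.org",
}

# Build (but do not send) the POST request to the hub's login endpoint:
req = Request(
    "https://vre3.argo.grnet.gr/erddap/hub/login",
    data=urlencode(fields).encode("utf-8"),
    method="POST",
)
print(req.get_method())  # POST
print(req.data.decode("utf-8"))
```

Sending it (e.g. with `urllib.request.urlopen(req)` or curl) only succeeds with a valid `service_auth_token`.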
What will be mounted into the JupyterHub (defined in docker-compose.yml):

- NFS-mounted NextCloud user directories. Mounted from `HOST_WHERE_ARE_USERDIRS` (GRNET: `/mnt/sdc-nfs-data/`) to `/usr/share/userdirectories/`.
- JupyterHub config file. Mounted from `/root/erddap/jupyterhub_config.py` to `/srv/jupyterhub/jupyterhub_config.py`.
- Docker socket, needed to spawn containers. Mounted from `/var/run/docker.sock` to the same location inside the container.
What will be mounted into the spawned containers (defined in jupyterhub_config.py, setting `c.DockerSpawner.volumes`, via the dict `volume_mounts`):

- The user's directory. Mounted from `HOST_WHERE_ARE_USERDIRS` (GRNET: `/mnt/sdc-nfs-data/<username>/files`) to `/nextcloud`, where ERDDAP expects it.
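Conceptually, the `volume_mounts` mapping looks roughly like the following sketch (values normally come from `.env` and are hard-coded here; the spawner substitutes `{raw_username}` at spawn time, shown by hand):

```python
# Values that normally come from .env (hard-coded here for the sketch):
HOST_WHERE_ARE_USERDIRS = "/mnt/sdc-nfs-data"
USERDIR_TEMPLATE_HOST = "/{raw_username}/files"

# Host path template -> container path, as used for c.DockerSpawner.volumes:
volume_mounts = {HOST_WHERE_ARE_USERDIRS + USERDIR_TEMPLATE_HOST: "/nextcloud"}

# The spawner fills in {raw_username} per user; done here by hand:
resolved = {k.format(raw_username="vre_tomandjerrymarineidorgcgfqwt7j"): v
            for k, v in volume_mounts.items()}
print(resolved)
# {'/mnt/sdc-nfs-data/vre_tomandjerrymarineidorgcgfqwt7j/files': '/nextcloud'}
```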
We used to have to mount the webapps and the service data, but since 20201023 they are included in the image.
Some info:

- JupyterHub image: `registry-sdc.argo.grnet.gr/jupyterhub_vre:20201020` (status 20201020)
- ERDDAP image: `registry-sdc.argo.grnet.gr/ifr-sdn-subset-service:20200812` (status 20201020)
- Service runs as uid 1000, called `isi_exp` in the container, to be able to read/write the synced user data. This is defined inside the service, and can only be changed in the service image. By setting `RUN_AS_USER=1000` we make sure that the JupyterHub also chowns files to 1000 when creating user directories. But as we mount existing user directories, this should not really have an effect. However, the mounted data must be readable/writeable by uid 1000. But it is owned by uid 33?! Readable/writeable by group gid 1000, but the service seems to run as group gid 10004 - ??? TODO CHECK
- Data inside the service image is owned by gid 10004 (`ditiisi`), but I am not sure why.
The user data (NextCloud data) should already be in place before starting the deployment. Currently, we run ERDDAP on a server at GRNET, located close to the other VRE servers at GRNET. That's why we are mounting the data via NFS.

If you run the service on a VM with synchronized (rather than NFS-mounted) NextCloud data, you mainly have to adapt `USERDIR_TEMPLATE_HOST` in the `.env` file (further below).
Make sure you have the NFS-mounted user data ready. On GRNET's VMs it sits in `/mnt/sdc-nfs-data/`, but you can use any other location - just make sure you specify it in `HOST_WHERE_ARE_USERDIRS` in the `.env` file.

The value of `USERDIR_TEMPLATE_HOST` should be `/{raw_username}/files`, so that the subdirectories called `files` inside directories named `<username>` are mounted.
This is how it should look:
[root@snf-7990 ~]# ls -lpah /mnt/sdc-nfs-data/
total 6.8M
drwxrwx--- 27 33 1000 4.0K Oct 20 15:18 ./
drwxr-xr-x. 7 root root 4.0K Sep 14 09:19 ../
drwxrwxr-x 4 33 1000 4.0K Oct 20 05:08 admin/
drwxrwxr-x 10 33 1000 4.0K Sep 15 10:35 appdata_ocn8npls9fx3/
drwxrwxr-x 6 33 1000 4.0K Sep 15 09:15 appdata_ocz2pnx2ewe6/
drwxrwxr-x 2 33 1000 4.0K Sep 15 09:15 files_external/
-rw-rw-r-- 1 33 1000 324 Sep 15 09:15 .htaccess
-rw-rw-r-- 1 33 1000 0 Sep 15 09:15 index.html
-rw-r----- 1 33 1000 44K Oct 20 14:53 nextcloud.log
-rw-rw-r-- 1 33 1000 0 Sep 14 12:46 .ocdata
-rw-rw-r-- 1 33 1000 6.7M Oct 20 15:18 owncloud.db
drwxrwxr-x 4 33 1000 4.0K Sep 17 15:07 vre_tomandjerrymarineidorgcgfqwt7j/ # one user
drwxrwxr-x 4 33 1000 4.0K Sep 25 05:54 vre_bugsbunnymarineidorgy4773w8y/ # another user
(...)
When peeking into a user directory, you should see a `files` subdirectory, and only in there the actual contents:
[root@snf-7990 ~]# ls -lpah /mnt/sdc-nfs-data/vre_tomandjerrymarineidorgcgfqwt7j/
total 20K
drwxrwxr-x 5 33 33 4.0K Sep 25 11:23 ./
drwxrwx--- 36 33 33 4.0K Oct 23 13:15 ../
drwxrwxr-x 13 33 33 4.0K Oct 21 17:56 files/
...
[root@snf-7990 ~]# ls -lpah /mnt/sdc-nfs-data/vre_tomandjerrymarineidorgcgfqwt7j/files
total 52K
drwxrwxr-x 13 33 33 4.0K Oct 21 17:56 ./
drwxrwxr-x 5 33 33 4.0K Sep 25 11:23 ../
drwxrwxr-x 2 33 33 4.0K Sep 21 09:59 BioQC_test_data/
drwxrwxr-x 6 33 33 4.0K Sep 21 10:08 ERDDAP_test_data/
drwxrwxr-x 2 33 33 4.0K Sep 21 09:59 Imports/
drwxrwxr-x 2 33 33 4.0K Sep 21 09:59 Results/
drwxrwxr-x 5 33 33 4.0K Sep 21 10:00 webODV_test_data/
drwxrwxr-x 2 33 33 4.0K Sep 28 08:12 Work/
...
In case you operate on synchronized data, make sure you have the sync in place. On bluewhale the synchronized data sits in `/scratch/vre/sync_from_athens/nextcloud_data/`, but you can use any other location - just make sure you specify it in `HOST_WHERE_ARE_USERDIRS` in the `.env` file.

The value of `USERDIR_TEMPLATE_HOST` should be `/{raw_username}`, so that the directories named `<username>` are mounted directly (without a `/files` subdirectory).
This is how it should look:
[alice@bluewhale ~]$ ls -lpah /scratch/vre/sync_from_athens/nextcloud_data
total 4.0K
drwxr-xr-x. 15 vre vre 4.0K Oct 20 16:26 ./
drwxr-xr-x. 5 root root 92 Sep 14 10:39 ../
drwxrwxr-x. 13 vre vre 230 Oct 21 01:48 vre_tomandjerrymarineidorgcgfqwt7j/ # one user
drwxr-xr-x. 2 vre vre 6 Oct 1 10:04 vre_bugsbunnymarineidorgy4773w8y/ # another user
...
Inside the user directories, their content should be directly visible:
[alice@bluewhale ~]$ ls -lpah /scratch/vre/sync_from_athens/nextcloud_data/vre_tomandjerrymarineidorgcgfqwt7j/
total 8.0K
drwxrwxr-x. 13 vre vre 230 Oct 21 01:48 ./
drwxr-xr-x. 15 vre vre 4.0K Oct 20 16:26 ../
drwxrwxr-x. 3 vre vre 86 Oct 20 03:12 BioQC_test_data/
drwxrwxr-x. 6 vre vre 68 Sep 25 13:29 ERDDAP_test_data/
drwxrwxr-x. 2 vre vre 6 Sep 25 13:29 Imports/
drwxrwxr-x. 2 vre vre 6 Sep 25 13:29 Results/
drwxrwxr-x. 5 vre vre 112 Sep 25 13:29 webODV_test_data/
drwxrwxr-x. 2 vre vre 81 Sep 28 10:13 Work/
...
- Create a home dir called `erddap` (wherever you have your service directories, e.g. `/root/erddap`):

mkdir /root/erddap

- Download docker-compose.yml, .env and the config:

cd /root/erddap
wget https://raw.githubusercontent.com/SeaDataCloud/vre-config/master/services/erddap/docker-compose.yml
wget https://raw.githubusercontent.com/SeaDataCloud/vre-config/master/services/erddap/jupyterhub_config.py
wget https://raw.githubusercontent.com/SeaDataCloud/vre-config/master/services/erddap/env-erddap
mv env-erddap .env
Changes to docker-compose.yml:

- No changes. (But make sure the URL used for authentication is included in the `WHITELIST_AUTH` env value!)

Changes to jupyterhub_config.py:

- No changes. (There is a section with ERDDAP-specific config, which should be included via `if True:` by default, but to be sure you can check again, just in case this config file was copied from some other service where it was set to `if False:`...)

Changes to .env:

- Change the value of `ADMIN_PW` to some value of your choice (replace `foo`).
- Change the value of `JUPYTERHUB_CRYPT_KEY` to the result of running `openssl rand -hex 32` (replace `foo`).
- Change the value of `HOST_NAME` to the FQDN of the machine where ERDDAP will be reachable (replace `jellyfish.argo.grnet.gr`).
- Other values may have to change in case you use different paths than in this guide, e.g. `HOST_WHERE_ARE_USERDIRS` in case you don't have the NextCloud data in `/mnt/sdc-nfs-data/`.

vi .env               # do changes!
openssl rand -hex 32  # result goes into .env, for JUPYTERHUB_CRYPT_KEY!
- Pull the ERDDAP image
docker pull registry-sdc.argo.grnet.gr/ifr-sdn-subset-service:20201022
- Start the service
docker-compose down && docker-compose up -d && docker-compose logs --tail=100 -f
- Now deploy or update the reverse proxy, without which the service cannot be reached from outside! See: https://github.com/SeaDataCloud/Documentation/wiki/Reverse-Proxy
- After every restart of erddap, make sure to restart the revproxy too!
cd /root/revproxy
vi proxy.conf
# add locations for this service to config
docker-compose down && docker-compose up -d && docker-compose logs --tail=100 -f
Add this config to the reverse proxy on your machine:
Upstreams section:
upstream jhub_web {
server erddap_hub_erddap_1:8000;
}
Somewhere above:
# From:
# https://jupyterhub.readthedocs.io/en/stable/reference/config-proxy.html
# top-level http config for websocket headers
# If Upgrade is defined, Connection = upgrade
# If Upgrade is empty, Connection = close
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
Locations section:
location = /erddap/hub/login {
if ($request_method = GET ) {
return 302 https://vre.seadatanet.org;
}
proxy_pass http://jhub_web/erddap/hub/login$is_args$args;
proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# According to
# https://jupyterhub.readthedocs.io/en/stable/reference/config-proxy.html
# websocket headers
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
# JHub does not know about SSL so it might try to redirect to http://, so
# we need to make sure to add https://:
proxy_redirect http://vre3.argo.grnet.gr https://vre3.argo.grnet.gr;
}
# This allows local login via GET (using login form),
# but hides this from VRE users by using the /locallogin route.
location = /erddap/locallogin {
proxy_pass http://jhub_web/erddap/hub/login;
proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
proxy_redirect http://vre3.argo.grnet.gr https://vre3.argo.grnet.gr;
}
location ~ /erddap/?(.*)$ {
proxy_pass http://jhub_web/erddap/$1$is_args$args;
proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
proxy_redirect http://vre3.argo.grnet.gr https://vre3.argo.grnet.gr;
}
- The hub must be healthy! That takes about a minute:

docker ps -a | grep erddap

- Can you log in locally at https://vre3.argo.grnet.gr/erddap/locallogin, using any name and the `ADMIN_PW`? (In this case, no data is available.)
- Can you log in locally at https://vre3.argo.grnet.gr/erddap/locallogin, using an existing VRE name and the `ADMIN_PW`? (In this case, data should be available.)
- Next, try logging in from the dashboard. For this, there must be a form on the dashboard that sends the user to this instance via POST.
- A container called `erddap-vre_xyzxyz` should be spawned, and should also become healthy soon.
How to view the hub's logs:

- `cd /root/erddap` and then `dolo` (shortcut in /root/.bashrc for: `docker-compose down && docker-compose up -d && docker-compose logs --tail=100 -f`)

How to change the image name (of the JupyterHub container):

- Change it in docker-compose.yml and restart the service. The previously spawned containers can probably keep running, unless the changes were too big.
- Ideally, remove the old image (`docker image rm <old-imagename>`), so it does not take up space.
How to change the image name (of the spawned containers):

- Change the value for `DOCKER_JUPYTER_IMAGE` in `/root/erddap/.env`. It is `registry-sdc.argo.grnet.gr/ifr-sdn-subset-service:20200812` as of 20201020.
- Then restart the service.
- Make sure to remove the existing containers (`docker rm <containername>`), as running containers are of course not changed.
- Ideally, remove the old image (`docker image rm <old-imagename>`), so it does not take up space.
How to restart the service:

- Change into its directory via `cd /root/erddap/` and then `doco` (shortcut in /root/.bashrc for: `docker-compose down && docker-compose up -d && docker-compose logs --tail=100 -f`)
For several possible causes and solutions of Internal Server Errors, check out: https://github.com/merretbuurman/jupyterhub-vreauthenticator#troubleshooting--previously-solved-problems
Inside the container, ERDDAP expects the user's data in `/nextcloud`. It should be directly the user's data, so no directory with the user's name first, and no `files` directory. The same path that is passed by the file selector must be findable on this path.
isi_exp@d7c0bdff8a30:/usr/local/apache2$ ls -lpah /nextcloud/
total 16K
drwxr-xr-x 4 isi_exp users 4.0K Apr 15 11:57 ./
drwxr-xr-x 1 root root 4.0K Apr 15 16:04 ../
drwxr-xr-x 4 isi_exp isi_exp 4.0K Apr 15 11:57 ERDDAP_test_data/
...
The `erddap.war` should be in `/opt/tomcat8/webapps/`, alongside an `erddap` directory created by Tomcat:
docker exec -it containername /bin/bash
isi_exp@d7c0bdff8a30:/usr/local/apache2$ ls /opt/tomcat8/webapps/
erddap erddap.war
isi_exp@d7c0bdff8a30:/usr/local/apache2$ ls /opt/tomcat8/webapps/ -lpah
total 490M
drwxr-xr-x 3 isi_exp isi_exp 4.0K Apr 15 14:39 ./
drwxr-xr-x 1 isi_exp ditiisi 4.0K Mar 6 11:25 ../
drwxr-x--- 7 isi_exp isi_exp 4.0K Apr 15 14:39 erddap/
-rw-r--r-- 1 isi_exp isi_exp 490M Oct 17 10:16 erddap.war
To redeploy, move the `erddap.war` to a different place (Tomcat should now remove the `erddap` directory), then move it back to `/opt/tomcat8/webapps/` (Tomcat should then recreate the `erddap` directory).
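The redeploy dance can be sketched like this (a simulation with a dummy .war in a temp dir; inside the real container you would move `/opt/tomcat8/webapps/erddap.war` instead, and Tomcat, not this script, removes and recreates the `erddap` directory):

```python
import shutil
import tempfile
from pathlib import Path

# Simulate the webapps directory with a dummy erddap.war and erddap/ dir:
webapps = Path(tempfile.mkdtemp()) / "webapps"
parked = webapps.parent / "parked"
webapps.mkdir()
parked.mkdir()
(webapps / "erddap.war").write_bytes(b"dummy war")
(webapps / "erddap").mkdir()

# 1. Move the .war out of webapps (on a real Tomcat this triggers undeploy,
#    i.e. Tomcat removes the erddap/ directory; here we do it by hand):
shutil.move(str(webapps / "erddap.war"), str(parked / "erddap.war"))
shutil.rmtree(webapps / "erddap")  # stand-in for Tomcat's undeploy

# 2. Move it back - Tomcat would now re-extract erddap/:
shutil.move(str(parked / "erddap.war"), str(webapps / "erddap.war"))
print(sorted(p.name for p in webapps.iterdir()))  # ['erddap.war']
```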
If you get a `404` when ERDDAP is loading, memory may be the problem. In that case, the log contains this text:
While trying to load datasetID=SeaDataNet_inSitu (after 2005 ms)
java.lang.RuntimeException: datasets.xml error on or before line #984: Your query produced too much data. Try to request less data. The request needs more memory (284 MB) than is ever safely available in this Java setup (185 MB). (TableWriterAll.cumulativeTable)
To check whether that's in the log:
docker exec -it containername /bin/bash
grep -r "more memory" /opt/tomcat8/content/erddap/erddapDirectory/logs/
Increase memory (Java Heap Space)
Please read this: https://crunchify.com/how-to-change-jvm-heap-setting-xms-xmx-of-tomcat/
Increase heap space for ERDDAP Java JVM in the docker-compose.yml:
environment:
...
JAVA_OPTS: '-Xms800M -Xmx800M'
This will be picked up in the jupyterhub_config.py:
JAVA_OPTS= os.environ.get('JAVA_OPTS', '-Xms200M -Xmx200M')
...
container_env['JAVA_OPTS'] = JAVA_OPTS
Depending on how much memory you need, also increase overall memory for the containers in the docker-compose.yml:
environment:
...
MEMORY_LIMIT: '2G'
This will be picked up in the jupyterhub_config.py:
MEMORY_LIMIT = os.environ.get('MEMORY_LIMIT', '2G')
...
# https://github.com/jupyterhub/dockerspawner#memory-limits
c.Spawner.mem_limit = MEMORY_LIMIT
Check out here: https://github.com/merretbuurman/jupyterhub-vreauthenticator#adding-ssl
# the log:
isi_exp@d7c0bdff8a30:/usr/local/apache2$ ls /opt/tomcat8/content/erddap/erddapDirectory/logs/
emailLog2020-04-15.txt emailLog2020-04-16.txt log.txt
# the dataset config XML files:
isi_exp@d7c0bdff8a30:/usr/local/apache2$ ls /opt/tomcat8/content/erddap/
datasets.xml datasets_profile.xml.bk datasets_template.xml datasets_timeserie.xml.bk datasets_trajectory.xml.bk erddapDirectory images setup.xml
docker exec -it -u root containername /bin/bash
Copying the logs and setup.xml to Leo's NextCloud:
# setup.xml
USER_NAME='vre_buurmanmarineidorgr8255g4x' # outdated username
LOG_OWNER='vre_lbruvrylmarineidorg9tjzpyb7' # outdated username
# See what's in there:
docker exec erddap-${USER_NAME} ls -lpah /opt/tomcat8/logs/
docker exec erddap-${USER_NAME} ls -lpah /opt/tomcat8/content/erddap/
docker exec erddap-${USER_NAME} ls -lpah /opt/tomcat8/content/erddap/erddapDirectory/
docker exec erddap-${USER_NAME} ls -lpah /opt/tomcat8/content/erddap/erddapDirectory/logs/
# Copy setup.xml and the logs
docker cp erddap-${USER_NAME}:/opt/tomcat8/content/erddap/setup.xml /nfs-import/${LOG_OWNER}/files/service_logs/
docker cp erddap-${USER_NAME}:/opt/tomcat8/content/erddap/erddapDirectory/logs/log.txt /nfs-import/${LOG_OWNER}/files/service_logs/
docker cp erddap-${USER_NAME}:/opt/tomcat8/logs/localhost_access_log.2020-04-28.txt /nfs-import/${LOG_OWNER}/files/service_logs/
# Chown
chown -R 33:1000 /nfs-import/${LOG_OWNER}/files/service_logs
# Check
ls -lpah /nfs-import/${LOG_OWNER}/files/service_logs/
ls -lpah /nfs-import/${LOG_OWNER}/files/service_logs/leos_container
ls -lpah /nfs-import/${LOG_OWNER}/files/service_logs/merrets_container