Synchronization

Install synchronization service

The synchronization service copies user data from NextCloud to the processing machines and back, using Unison.

Currently, all three GRNET machines share the user data via an NFS mount, so no synchronization is needed between them. Synchronization is only needed with the DKRZ server that runs DIVA, DIVA-VIP and the Python and R Jupyter Notebooks.

Steps on "master server"

This part is to be installed on the "master server" (where NextCloud runs). A later section covers what has to be installed on the remote server(s).

Home directory for syncer service

We assume that you deploy the services in /opt/vre. In Athens, we use /root.

  • Create a home directory for this service, called syncer.
  • Download the docker-compose, the config file and the environment file
# We assume the services run in /opt
mkdir /opt/vre/syncer  # at STFC
#mkdir /root/syncer    # at GRNET

cd /opt/vre/syncer 
wget https://raw.githubusercontent.com/SeaDataCloud/vre-config/master/services/synchronizer/docker-compose.yml
wget https://raw.githubusercontent.com/SeaDataCloud/vre-config/master/services/synchronizer/remotehosts.json

# STFC:
wget https://raw.githubusercontent.com/SeaDataCloud/vre-config/master/services/synchronizer/env-stfc 
mv env-stfc .env

# GRNET:
wget https://raw.githubusercontent.com/SeaDataCloud/vre-config/master/services/synchronizer/env-grnet
mv env-grnet .env

The docker-compose file needs no changes (just make sure that WHITELIST_SERVERS contains all the servers to which data must be synced, as a comma-separated list without spaces; see the example below). The other two files need some changes - see below.
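
For illustration, at the time of writing the only sync target is bluewhale, so the line might look like this (the hostname list is an example, adapt it to your setup):

WHITELIST_SERVERS=bluewhale.dkrz.de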

Unison directory

This directory will contain the unison archive files. They must survive container recreation, which is why we create them on the host and bind-mount them into the container.

  • Create a directory .unison.
  • Change its owner to uid/gid 33 (the www-data user inside the container).
#mkdir /root/syncer/.unison         # at GRNET
#chown 33:33 /root/syncer/.unison   # at GRNET
mkdir /opt/vre/syncer/.unison       # at STFC
chown 33:33 /opt/vre/syncer/.unison # at STFC
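
In the downloaded docker-compose.yml, this host directory should be bind-mounted to /var/www/.unison inside the container (the path where the syncer keeps its archive files, see "How to reset the sync" below). A sketch of what that volume entry looks like, assuming the service is named syncer:

services:
  syncer:
    volumes:
      - /opt/vre/syncer/.unison:/var/www/.unison  # /root/syncer/.unison at GRNET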

[TODO] Add a section on this whole permissions thingy!!

SSH key for Unison

The Unison tool needs to log in to the remote servers via SSH to copy files to/from there.

  • Create an OpenSSH key pair (to be used by Unison to log in via SSH to the remote nodes).
  • Make sure the private key is readable only by its owner (permissions 0400 or 0600), otherwise SSH will complain. This seems to be the case by default.
  • Copy the public key to the remote nodes.
  • Check that you can log in to those nodes via SSH using that key. (The service node operators should tell you which username to use, probably vre.)
cd /opt/vre/syncer/    # at STFC
# cd /root/syncer/     # at GRNET
ssh-keygen -t rsa -b 2048 -f vre-sync-key
ssh-copy-id -i ./vre-sync-key vre@<remotehost>

To test this:

ssh -i ./vre-sync-key vre@<remotehost>
exit

Configure syncer

Adapt the remotehosts.json file. It must contain an entry for each service node that needs to receive user data (currently only bluewhale):

  • url: The FQDN of the remote server (e.g. jellyfish.argo.grnet.gr, bluewhale.dkrz.de).
  • user: The name of the user that runs the services on the remote server to which you want to sync. This should be vre (but ask the admin of the remote server to be sure).
  • dir: Directory where the synced data should be stored on the remote server.
  • servername: Arbitrary string. This is a name/label for the server to which the data is being synced. [TODO] What for?
  • site: Arbitrary string. This is a name/label for the site where the server runs, e.g. grnet or stfc. This can be any string, but will be part of the URL to be called. [TODO CHECK]
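
For illustration, one entry might look like this (the exact top-level structure may differ, and all values are examples - ask the remote admin for the real ones):

[
    {
        "url": "bluewhale.dkrz.de",
        "user": "vre",
        "dir": "/srv/seadata/vre/sync",
        "servername": "bluewhale",
        "site": "dkrz"
    }
]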

Changes to .env:

  • DATA_PATH should be the absolute path to the data to be synced, on the machine where the syncer runs.
  • PUBLIC_KEY_SSH_RSYNC: The SSH private key created above, with which you can ssh-login to the remote machine. Probably /opt/vre/syncer/vre-sync-key.
  • THIS_SITE= [TODO]
  • THIS_HOST= [TODO]
  • PATH_TO_HEALTHCHECKS: Where the healthcheck_nginx.sh can be found.
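
Putting this together, a sketch of a resulting .env (values are illustrative; the healthcheck path is a guess, and THIS_SITE and THIS_HOST are left open, as their meaning is still marked TODO above):

DATA_PATH=/mnt/sdc-nfs-data
PUBLIC_KEY_SSH_RSYNC=/opt/vre/syncer/vre-sync-key
THIS_SITE=    # TODO
THIS_HOST=    # TODO
PATH_TO_HEALTHCHECKS=/opt/vre/syncer/healthcheck_nginx.sh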

Start the service:

docker-compose up -d && docker-compose logs --tail=100 -f

Add to revproxy

Now add this config to the reverse proxy and restart the reverse proxy:

vi /root/revproxy/nginx.conf     # at GRNET
vi /opt/vre/revproxy/nginx.conf  # at STFC

Add this upstream to the upstream section on top:

upstream up_sync {
    server syncer_proxy:80;
}

Add this location to the locations section below:

    # exact match for efficiency - this should handle all requests.
    location = /sync/bidirectional/ {
        proxy_pass http://up_sync;
        limit_except POST {
            deny all;
        }
        add_header match sync-exact;
        proxy_set_header Host  $host;
        proxy_set_header X-Real-IP $remote_addr;
        # From: https://www.nginx.com/blog/websocket-nginx/
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
    }

    # fallback just in case (should not be necessary):
    location ^~ /sync/ {
        proxy_pass http://up_sync;
        limit_except POST {
            deny all;
        }
        add_header match sync;
        proxy_set_header Host  $host;
        proxy_set_header X-Real-IP $remote_addr;
        # From: https://www.nginx.com/blog/websocket-nginx/
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
    }

The following alternative also works, but it is less efficient, because it matches the route via regex matching (~), which takes nginx longer to evaluate. The config above uses exact matching (=) and prefix matching (^~), which are resolved faster.

    location  ~ /sync/(.*)$ {
        proxy_pass http://up_sync/sync/$1$is_args$args;
        limit_except POST {
            deny all;
        }
        add_header match sync;
        proxy_set_header Host  $host;
        proxy_set_header X-Real-IP $remote_addr;
        # From: https://www.nginx.com/blog/websocket-nginx/
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
    }

Now restart the revproxy. It has to be restarted every time a service behind it is recreated, otherwise nginx does not find the upstream.

cd /root/revproxy      # at GRNET
cd /opt/vre/revproxy   # at STFC
docker-compose down && docker-compose up -d && docker-compose logs --tail=100 -f

Steps on "remote server(s)"

This part is to be installed on the "remote server(s)" (where the services run and do not have access to the NextCloud data, directly or via NFS). Currently, all three GRNET machines share the user data via an NFS mount, so no synchronization is needed between them. Synchronization is only needed with the DKRZ server that runs DIVA, DIVA-VIP and the Python and R Jupyter Notebooks.

On the remote server, the synchronization runs as a dedicated Linux user. Ideally, use uid 1000, as Jupyter Notebooks and many other existing docker images run as uid 1000 by default. (If you do not use uid 1000, you'll need some additional settings in the DIVA setup, described below.) We suggest the name vre.

Unison linux user

  • Create a linux user vre.
  • Allow this user to login via SSH on this host.
  • Probably need to open port 22, too.
# Suggestion (TODO: TO BE TESTED):
# Note: gid 100 is often already taken by the standard 'users' group;
# if groupadd complains, pick a free gid or reuse the existing group.
sudo groupadd -g 100 vre
sudo useradd -u 1000 -m vre -g vre

# Set a password (needed for ssh login later):
sudo passwd vre # type password for the user, twice!

# Possibly need to enable ssh for this user here:
vi /etc/security/access.conf

What to change in /etc/security/access.conf: add the line + : vre : ALL before (!) the line - : ALL : ALL.
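
The relevant part of /etc/security/access.conf then reads as follows (order matters, since the first matching rule wins):

+ : vre : ALL
- : ALL : ALL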

  • Notify the operator of the master node (running the dashboard) which username they should use, and make sure the master node's public key is allowed to log in as that user (i.e. add the master node's public key to that user's authorized_keys).
  • Ask the operator of the master node to test whether they can log in via ssh as this user from the master node (i.e. that the ssh port is open for the master node).

Install unison

  • Install unison version 2.40 (!). It must be 2.40 to match the CentOS machines - if the versions differ, the sync will not work (see the version check after the code block below). If anyone volunteers to build a fresher version for CentOS 7, you're welcome!
  • Create a directory on the (shared) file system (readable and writable by this user) where the synchronized data will be written. We will refer to this directory as the local SYNC_TARGET.
mkdir "/srv/seadata/vre/sync"
# TODO: Maybe /var/seadata/vre might be better?
# TODO: If a mounted NFS is used, this will be /mnt/something?
# TODO: Any chowning, chmoding needed?
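
To verify the version requirement from the first bullet above, compare the output on the master and on the remote server:

unison -version
# must print the same 2.40.x on both ends, e.g. "unison version 2.40.102"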

Note:

If during your tests you accidentally delete the data on the "target", you will get the error "The root of one of the replicas has been completely emptied. Unison may delete everything in the other replica." when you try the next synchronization.

You can repair this easily:

  • Run a one-directional synchronization (rsync) to restore the data.
  • Then retry the bi-directional synchronization (unison) to check that it works again, and to make sure the unison archive files are up to date.

See this example session:
# Try syncing and run into conflict/error:
wget -O- http://orca.dkrz.de:5000/sync/bidirectional/dkrz/bluewhale/fritz/conflict/prefer_newer
# HTTP request sent, awaiting response... 500 INTERNAL SERVER ERROR

# Log output:
syncer_1  | 2019-12-11 11:42:06,816 - synchronizer - WARNING - FAIL: a fatal error occurred, or the execution was interrupted.
syncer_1  | 2019-12-11 11:42:06,817 - synchronizer - ERROR - Syncing failed: Contacting server... Connected [//xxx/fritz/files/Work -> //bluewhale.dkrz.de//xxx/sync_target/fritz_sync] Looking for changes   Waiting for changes from server Reconciling changes The root of one of the replicas has been completely emptied. Unison may delete everything in the other replica.  (Set the  'confirmbigdel' preference to false to disable this check.)  

# Run one-directional sync to bring data back:
wget -O- http://orca.dkrz.de:5000/sync/to_remote/dkrz/bluewhale/fritz/execute
# HTTP request sent, awaiting response... 200 OK

# See that data is back:
# on target server:
sudo ls -lpah /xxx/sync_target/fritz_sync
total 20K
drwxr-xr-x. 2 vre vre 123 Dec 11 12:44 ./
drwxr-xr-x. 3 vre vre  24 Dec 11 12:43 ../
-rw-r--r--. 1 vre vre  27 Dec 10 18:09 MYTESTFILE_src_51.txt
-rw-r--r--. 1 vre vre  27 Dec 10 18:09 MYTESTFILE_tgt_92.txt

# Run bidirectional sync to check if it worked (and to make sure archive files are ok):
wget -O- http://orca.dkrz.de:5000/sync/bidirectional/dkrz/bluewhale/fritz/conflict/prefer_newer
# HTTP request sent, awaiting response... 200 OK

# Yeah!

Testing the syncer

Prepare a test file on the main host (near the NextCloud data):

# at GRNET:
mkdir /mnt/sdc-nfs-data/vre_simonebeauvoir
mkdir /mnt/sdc-nfs-data/vre_simonebeauvoir/files
echo "blabla" /mnt/sdc-nfs-data/vre_simonebeauvoir/files/created_on_original.txt

Then run:

docker exec -it syncer_syncer_1 /bin/bash
curl -X POST --data "site=athina&host=sdc-test&username=vre_kroesus" http://localhost:5000/sync/bidirectional/
# curl is not available inside the syncer container, and it cannot be installed:
apt-get install curl
# Cannot install curl

# Trying from the dashboard container instead:
docker exec -it vrehome_dashboard_1 /bin/bash
curl -X POST --data "site=athina&host=sdc-test&username=vre_kroesus" http://syncer_proxy:5000/sync/bidirectional/
# Connection refused (the proxy listens on port 80, not 5000 - see the API section below)
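
A variant that does work (see the API section at the bottom of this page): target the nginx proxy on port 80, and use wget, which can be installed in the dashboard container:

docker exec -it vrehome_dashboard_1 /bin/bash
apt-get install wget
wget --post-data 'site=athina&host=sdc-test&username=vre_kroesus' http://syncer_proxy:80/sync/bidirectional/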

Troubleshooting / Maintenance

Check the logs

docker logs syncer_syncer_1

Possible errors

  • Often, root-owned test files cause an HTTP 500 Internal Server Error.
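
To spot such files, search the synced data for root-owned entries before re-running the sync (the path is the GRNET example from the test section above; use your own DATA_PATH):

find /mnt/sdc-nfs-data -user root -ls
# chown any hits to the user the services expect, then retry the sync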

How to redeploy the syncer

# development on local laptop
cd /home/merret/work/githubrepos/VRE/vre-data-synchronization-tool
# do stuff
# commit

Make sure to adapt the FROM statement in Dockerfile_2_nginx_uwsgi, so the correct base image gets used:

# on local laptop
# adapt FROM statement to use today's image tag
vi Dockerfile_2_nginx_uwsgi 

# ideally also adapt the docker image tag in build instructions at the
# bottom of these two:
vi Dockerfile_1_flask_only
vi Dockerfile_2_nginx_uwsgi

git push origin

I build my images on the bluewhale server:

# on bluewhale
cd /home/dkrz/k204208/STACKS/BUILD_IMAGES/SYNCER/vre-data-synchronization-tool
git pull

These are the build instructions:

TAG='xxx-as33' # e.g. 20200428-as33
docker build --file Dockerfile_1_flask_only -t syncer_flask:${TAG} .

# When you have built a new image of syncer_flask, adapt the
# FROM statement in Dockerfile_2_nginx_uwsgi and build a new one:
vi Dockerfile_2_nginx_uwsgi
docker build --file Dockerfile_2_nginx_uwsgi -t registry-sdc.argo.grnet.gr/syncer_wsgi:${TAG} .
docker push registry-sdc.argo.grnet.gr/syncer_wsgi:${TAG}

Now pull on the server where it runs (sdc-test.argo.grnet.gr):

# on sdc-test
docker pull registry-sdc.argo.grnet.gr/syncer_wsgi:XXXX
cd /root/syncer
vi docker-compose.yml # adapt tag

Now remove the unison archive files on the remote:

# on bluewhale
sudo su - vre
ls -lpah /home/vre/.unison
rm -rf /home/vre/.unison/*
ls -lpah /home/vre/.unison

Then restart it:

# on sdc-test
docker-compose down && docker-compose up -d && docker-compose logs --tail=100 -f

How to reset the sync

"It is safe to “brainwash” Unison by deleting its archive files on both replicas. The next time it runs, it will assume that all the files it sees in the replicas are new." [https://www.cis.upenn.edu/~bcpierce/unison/download/releases/stable/unison-manual.html]

  • Delete .unison files on remote (in the HOME of the user who runs unison there)
  • Delete .unison files on syncer (inside its container - this step can be omitted if you remove and redeploy the syncer) - note that we now bind-mount these, so no need to go into the container to delete them!

When to do this?

  • WIP

What is the behaviour in next sync?

  • I assume it will send all files from A to B, and all from B to A, and for those that exist in both places, it will take the newer one or whichever preference is specified.
# on bluewhale
sudo su - vre
ls -lpah /home/vre/.unison
rm -rf /home/vre/.unison/*
ls -lpah /home/vre/.unison

Then on the syncer:

docker exec -it syncer_syncer_1 /bin/bash
ls -lpah /var/www/.unison
rm -rf /var/www/.unison/*
ls -lpah /var/www/.unison

How to test the sync

# on sdc-test
echo "blabla" > /nfs-export/vre_xxxmarineidxxxx/files/TESTFILE_SYNC_CENTRAL.txt
# on bluewhale
echo "blabla" > /.../sync_from_athens/nextcloud_data/vre_xxxmarineidxxxx/files/TESTFILE_SYNC_REMOTE.txt

Now in the GUI

  • Go to the private workspace
  • Maybe also create a test file here
  • Click sync
  • Go out and back in

How to check?

Do you see all the files...

  • In the GUI?
  • On bluewhale? ls -lpah /.../sync_from_athens/nextcloud_data/vre_xxxmarineidxxxx/files/
  • On sdc-test? ls -lpah /nfs-export/vre_xxxmarineidxxxx/files/

How to run a test

docker exec -it syncer_syncer_1 /bin/bash
pip install requests --user
python
import requests
username='vre_pillepalle' #  username='all_users'
resp = requests.post('http://syncer_proxy/sync/bidirectional/', data=dict(site='hamburg', host='bluewhale', username=username))
resp
resp.content
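
If all goes well, resp prints as <Response [200]>, and resp.content contains a success message like the one shown in the API section below ("No conflicts occurred. Successful synchronization; everything is up-to-date now.").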

API

Merret, 20201027

  • Only works via POST
  • Returns HTTP 200, 409 or 500!
  • Apparently, only the nginx (syncer_proxy) accepts requests.
  • The syncer (flask application) exposes port 5000 inside the docker network vredash, but this might not be needed at all, as the proxy communicates with it via a wsgi socket. Also, I'm not sure there's actually an http server inside the syncer container...
  • Syncer_proxy (nginx server) exposes port 80 (unless we switch on SSL termination, but this is done by the main revproxy).
# inside syncer_proxy container:
docker exec -it syncer_syncer_proxy_1 /bin/bash
curl -X POST --data "site=athina&host=sdc-test&username=vre_xxx" http://localhost:80/sync/bidirectional/
# inside vre dashboard container:
docker exec -it vrehome_dashboard_1 /bin/bash
apt-get install wget
wget --post-data 'site=athina&host=sdc-test&username=vre_xxx' http://syncer_proxy:80/sync/bidirectional/
cat index.html 
# Hello! Sync between here and host vre_kroesus (user sdc-test) at site athina No conflicts occurred. Successful synchronization; everything is up-to-date now.

From outside (now with https!):

  • This goes through the revproxy on port 443
  • This is proxy_passed to port 80 on the upstream, which is syncer_proxy
curl -X POST --data "site=athina&host=sdc-test&username=vre_xxx" https://vre.seadatanet.org/sync/bidirectional/

Bidirectional (unison):

curl -X POST --data "site=athina&host=sdc-test&username=vre_xxx" http://syncer_proxy:80/sync/bidirectional/
curl -X POST --data "site=athina&host=sdc-test&username=vre_xxx" http://syncer_proxy:80/sync/bidirectional/conflict/prefer_newer
curl -X POST --data "site=athina&host=sdc-test&username=vre_xxx" http://syncer_proxy:80/sync/bidirectional/conflict/prefer_central
curl -X POST --data "site=athina&host=sdc-test&username=vre_xxx" http://syncer_proxy:80/sync/bidirectional/conflict/prefer_replica

One direction (rsync):

curl -X POST --data "site=athina&host=sdc-test&username=vre_kroesus" http://syncer_proxy:80/sync/to_remote/
curl -X POST --data "site=athina&host=sdc-test&username=vre_kroesus" http://syncer_proxy:80/sync/from_remote/