WMCore central services deployment - dmwm/WMCore GitHub Wiki

This wiki describes how you can create a CERN Openstack VM and deploy WMCore central services on it. Full - and slightly outdated - documentation can be found in the HTTP group documentation webpage: Request vm and set cmsweb env

VM node creation and initial setup

Once your CC7 VM has been created (usually through your personal Openstack project), you need to install some extra RPMs required to run some of our central services:

[user@your-dev-vm ~]$ sudo yum install libXcursor libXrandr libXi libXinerama unzip

and given that we no longer ship a specific OpenSSL library with our services, we need to also ensure openssl packages are available in our VM. So install these if not yet available:

[user@your-dev-vm ~]$ sudo yum install openssl openssl-devel openssl-libs
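
Since these packages only need installing if they are not yet available, a quick check can avoid a needless yum run. This is a minimal sketch, assuming an RPM-based node such as CC7; the helper name `missing_packages` and the `PKG_QUERY` override are hypothetical and not part of any WMCore tooling:

```shell
# Hypothetical helper: print every package from the argument list that is
# not installed. PKG_QUERY is an assumed override (handy for dry runs);
# it defaults to "rpm -q", which exits non-zero for a missing package.
missing_packages() {
    local pkg
    for pkg in "$@"; do
        ${PKG_QUERY:-rpm -q} "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}
```

For example, `missing_packages openssl openssl-devel openssl-libs` prints nothing when all three are present; otherwise the printed names can be passed to `sudo yum install`.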

In order to properly run MSUnmerged, which depends on the GFAL2 library, we need an extra setup step to bring in up-to-date packages. We need to install and enable the DMC EL7 repository, which provides a newer and more stable gfal2 library and plugins. It can be done like this:

sudo yum-config-manager --add-repo https://dmc-repo.web.cern.ch/dmc-repo/dmc-el7.repo
sudo yum-config-manager --enable dmc-el7/x86_64

with the new repository configured, we can now install and/or upgrade the gfal2 libraries (glib2 is likely already up-to-date, but I keep it here for completeness):

sudo yum install glib2 glib2-devel gfal2 gfal2-devel

if everything goes fine, you should see the GFAL2 libraries at version 2.20.0-1.el7.cern. In case you want to have all the gfal2 protocol plugins available, you also need to install these:

sudo yum install gfal2-plugin-gridftp gfal2-plugin-file gfal2-plugin-http gfal2-plugin-srm gfal2-plugin-xrootd
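
To confirm which gfal2 version actually got installed, a small version comparison can help. This is a sketch under two assumptions: that 2.20.0 can be treated as a minimum (the text above only states the version that was observed), and that GNU `sort -V` is available, as it is on CC7. The helper name `version_ge` is hypothetical:

```shell
# Hypothetical helper: succeed when version $1 is greater than or equal
# to version $2. Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

It could be used as `version_ge "$(rpm -q --qf '%{VERSION}' gfal2)" 2.20.0 || echo "gfal2 looks older than expected"`.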

Perform basic cmsweb development vm install

[user@your-dev-vm ~]$ mkdir /tmp/foo && cd /tmp/foo
[user@your-dev-vm foo]$ git clone https://github.com/dmwm/deployment.git cfg
[user@your-dev-vm foo]$ cfg/Deploy -t dummy -s post $PWD system/devvm
<snip>
INFO: installation completed sucessfully
[user@your-dev-vm ~]$ cd && rm -rf /tmp/foo

This step adds several users and groups and also will add your user to several groups.

Make sure to log out and log back in before proceeding!
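
After logging back in, it is easy to confirm that the new group membership is active. A minimal sketch follows; `_sw` is used as the example group only because the auth area set up later on this page is assigned to it, and the helper name `user_in_group` is hypothetical:

```shell
# Hypothetical helper: succeed when group $1 appears in a space-separated
# group list. The list defaults to the current session's groups (id -nG).
user_in_group() {
    local list="${2:-$(id -nG)}"
    case " $list " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Running `user_in_group _sw || echo "log out and back in first"` in a fresh shell shows whether the session has picked up the change.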

Set up wmcore auth area

[user@your-dev-vm ~]$ mkdir -p /data/auth/wmcore

Copy the service certificate and key (/data/certs/service{cert,key}.pem) from a WMAgent node and place them under /data/auth, e.g.:

[user@your-dev-vm ~]$ mv servicecert.pem dmwm-service-cert.pem
[user@your-dev-vm ~]$ mv servicekey.pem dmwm-service-key.pem
[user@your-dev-vm ~]$ chgrp -R _sw /data/auth
[user@your-dev-vm ~]$ chmod ug=r,o-rwx $(find /data/auth -type f)
[user@your-dev-vm ~]$ chmod u=rwx,g=rx,o-rwx $(find /data/auth -type d) 

Using your proxy here will not work with Rucio, so copying a service certificate from an agent is now required.
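
The two chmod commands above leave files at mode 440 and directories at mode 750. A small sketch to verify this after the fact; the helper name `list_bad_auth_perms` is hypothetical:

```shell
# Hypothetical helper: print every entry under $1 whose mode differs from
# what the chmod commands above produce (440 for files, 750 for directories).
# No output means the auth area has the expected permissions.
list_bad_auth_perms() {
    find "$1" -type f ! -perm 440 -print
    find "$1" -type d ! -perm 750 -print
}
```

For this page's layout the call would be `list_bad_auth_perms /data/auth`.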

Automated deployment of WMCore CMSWEB-services into your own VM

In order to automate the above process, one can use the script proposed in this PR (https://github.com/dmwm/WMCore/pull/10014). It basically repeats all the steps explained in this document and can take, at the very beginning of the execution, the set of patches to be applied (both WMCore and deployment patches). One advantage of using this script is that it is a unified tool: all of us would follow the same set of steps in the deployment process, which guarantees that problems originating in the deployment process can be reproduced (even during a docker image creation). Here follows the help output of the script, with 3 examples of its usage:

[user@your-dev-vm ~]$ deploy/deploy-centralvm.sh -h 

 Usage: deploy-centralvm.sh -d <deployment_tag> -c <central_services_url> [-s reqmgr2ms -r comp.tivanov] [-r  <repository>] [-p  <patches>] [-s  <service_names>] [-l  <component_list>]

   -d  <deployment_tag>   CMSWEB deployment tag used for this deployment
   -b  <deployment_patch> List of PR numbers to be applied to the deployment scripts
                          (in double quotes and space separated e.g. "967 968")
   -r  <repository>       Comp repository to look for the RPMs (defaults to -r comp)
   -p  <patches>          List of PR numbers
                          (in double quotes and space separated e.g. "5906 5934 5922")
   -s  <service_names>    List of service names to be patched
                          (in double quotes and space separated (e.g. "rqmgr2 reqmgr2ms")
   -c  <central_services> Url to central services (e.g. tivanov-unit01.cern.ch)
   -l  <component_list>   List of components to be deployed
                          (in double quotes and space separated e.g. "frontend couchdb reqmgr2")
   -h <help>              Provides help to the current script

 Example: ./deploy-centralvm.sh -d HG2011a -c tivanov-unit01.cern.ch
 Example: ./deploy-centralvm.sh -d HG2011a -b "967" -p "10003" -s reqmgr2ms -r comp.tivanov -c tivanov-unit01.cern.ch
 Example: yes | ./deploy-centralvm.sh -d HG2011a -b "967" -p "10003" -s reqmgr2ms -r comp.tivanov -c tivanov-unit01.cern.ch

Manual deployment of WMCore CMSWEB-services into your own VM

The whole procedure described in this section follows the same guidelines recommended by CMSWEB package deployment.

To get started, we need to know which deployment tag is to be used and download it (replace HG2103a by whatever you need):

cd /data
(cd /data; git clone https://github.com/dmwm/deployment.git cfg && cd cfg && git reset --hard HG2103a)

NOTE: by default, not all CherryPy threads are enabled, so you also need to enable your hostname in the config files, e.g. (replace alancc7-cloud by your short hostname in the line below):

sed -i 's/HOST.startswith("vocms0117"):/HOST.startswith("vocms0117") or HOST.startswith("alancc7-cloud"):/g' cfg/{reqmgr2,reqmon,workqueue}/config.py
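
The same substitution can be parameterised so that the hostname does not have to be edited by hand. This is a sketch assuming GNU sed (for `-i`); the function name `enable_host_threads` is hypothetical:

```shell
# Hypothetical helper: apply the substitution shown above for an arbitrary
# short hostname. $1 = short hostname, remaining arguments = config files
# to edit in place.
enable_host_threads() {
    local host="$1"
    shift
    sed -i "s/HOST.startswith(\"vocms0117\"):/HOST.startswith(\"vocms0117\") or HOST.startswith(\"${host}\"):/g" "$@"
}
```

From /data it could be called as `enable_host_threads "$(hostname -s)" cfg/{reqmgr2,reqmon,workqueue}/config.py`, where `hostname -s` returns the short hostname.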

And in case you need to patch the deployment configurations, this is how you can proceed (replace 972 by your patch number):

curl https://patch-diff.githubusercontent.com/raw/dmwm/deployment/pull/972.patch | patch -d cfg/ -p 1

Deploy the most common WMCore services with the line below (you might want to remove t0_reqmon ...):

(VER=HG2103a REPO="-r comp=comp" A=/data/cfg/admin; ARCH=slc7_amd64_gcc630;
 cd /data;
 $A/InstallDev -R comp@$VER -A $ARCH -s image -v $VER -a $PWD/auth $REPO -p "admin frontend couchdb reqmgr2 reqmgr2ms workqueue reqmon t0_reqmon acdcserver")

note that, if you have RPMs in your private repo, the first line has to be adapted, e.g.:

(VER=HG2103a-comp REPO="-r comp=comp.amaltaro" A=/data/cfg/admin; ARCH=slc7_amd64_gcc630;
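
Dropping a component such as t0_reqmon from the -p list, as suggested above, can be done without retyping the list. A minimal sketch; the helper name `drop_component` is hypothetical:

```shell
# Hypothetical helper: print the space-separated component list $2 with
# every occurrence of component $1 removed.
drop_component() {
    local out="" c
    for c in $2; do
        [ "$c" = "$1" ] || out="${out:+$out }$c"
    done
    echo "$out"
}
```

The result can then be passed to InstallDev as `-p "$(drop_component t0_reqmon "admin frontend couchdb reqmgr2 reqmgr2ms workqueue reqmon t0_reqmon acdcserver")"`.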

Once services are deployed, there is a cronjob that queries CRIC and populates a list of DNs - and their CMS roles - that are allowed to access your services through HTTP requests. If you want to change this list and allow only a few specific persons (and WMAgent DNs), then you need to:

  1. comment out a crontab entry (the mkauthmap one, running every 4min)
  2. then you need to edit /data/srv/state/frontend/etc/authmap.json and leave only the DNs that you want

Update the fake service certificate files placed under each service area, e.g.:

sudo chmod 660 /data/srv/current/auth/{reqmgr2,workqueue,acdcserver,reqmon,t0_reqmon,reqmgr2ms}/dmwm-service-{cert,key}.pem
sudo cp /data/auth/dmwm-service-cert.pem /data/srv/current/auth/reqmgr2/dmwm-service-cert.pem 
sudo cp /data/auth/dmwm-service-cert.pem /data/srv/current/auth/workqueue/dmwm-service-cert.pem 
sudo cp /data/auth/dmwm-service-cert.pem /data/srv/current/auth/acdcserver/dmwm-service-cert.pem 
sudo cp /data/auth/dmwm-service-cert.pem /data/srv/current/auth/reqmon/dmwm-service-cert.pem 
sudo cp /data/auth/dmwm-service-cert.pem /data/srv/current/auth/t0_reqmon/dmwm-service-cert.pem 
sudo cp /data/auth/dmwm-service-cert.pem /data/srv/current/auth/reqmgr2ms/dmwm-service-cert.pem 
sudo cp /data/auth/dmwm-service-key.pem /data/srv/current/auth/reqmgr2/dmwm-service-key.pem
sudo cp /data/auth/dmwm-service-key.pem /data/srv/current/auth/workqueue/dmwm-service-key.pem
sudo cp /data/auth/dmwm-service-key.pem /data/srv/current/auth/acdcserver/dmwm-service-key.pem
sudo cp /data/auth/dmwm-service-key.pem /data/srv/current/auth/reqmon/dmwm-service-key.pem
sudo cp /data/auth/dmwm-service-key.pem /data/srv/current/auth/t0_reqmon/dmwm-service-key.pem
sudo cp /data/auth/dmwm-service-key.pem /data/srv/current/auth/reqmgr2ms/dmwm-service-key.pem
sudo chmod 440 /data/srv/current/auth/{reqmgr2,workqueue,acdcserver,reqmon,t0_reqmon,reqmgr2ms}/dmwm-service-{cert,key}.pem
sudo chmod 400 /data/srv/current/auth/{workqueue,reqmgr2ms}/dmwm-service-key.pem
sudo chown _reqmgr2ms:_config /data/srv/current/auth/reqmgr2ms/dmwm-service-key.pem
sudo chown _workqueue:_config /data/srv/current/auth/workqueue/dmwm-service-key.pem
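
The repeated cp lines above can also be written as a loop. This sketch covers only the copy and the chmod 660/440 steps; the chmod 400 and chown lines for workqueue and reqmgr2ms would still have to be run afterwards. The function name `copy_service_creds` and the `SUDO` override are hypothetical:

```shell
# Hypothetical helper: copy dmwm-service-{cert,key}.pem from directory $1
# into $2/<service>/ for every service named in the remaining arguments.
# SUDO is an assumed override; it defaults to "sudo" and can be set to an
# empty string when the target area is writable without it.
copy_service_creds() {
    local src="$1" dest="$2" svc f target
    shift 2
    for svc in "$@"; do
        for f in dmwm-service-cert.pem dmwm-service-key.pem; do
            target="$dest/$svc/$f"
            # Make an existing placeholder writable before overwriting it.
            if [ -e "$target" ]; then
                ${SUDO-sudo} chmod 660 "$target" || return 1
            fi
            ${SUDO-sudo} cp "$src/$f" "$target" || return 1
            ${SUDO-sudo} chmod 440 "$target" || return 1
        done
    done
}
```

With the paths used on this page, the call would be `copy_service_creds /data/auth /data/srv/current/auth reqmgr2 workqueue acdcserver reqmon t0_reqmon reqmgr2ms`.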

If patches need to be applied, this is how you can proceed (mind the service you need patching - in this example we are only patching reqmgr2 - replace $PR by the pull request number):

cd /data/srv/current
wget -nv https://patch-diff.githubusercontent.com/raw/dmwm/WMCore/pull/$PR.patch -O - | patch -d apps/reqmgr2/lib/python2.7/site-packages/ -p 3

Make sure there are no services running yet:

(A=/data/cfg/admin; cd /data; $A/InstallDev -s status)

Injecting campaign configurations into your central CouchDB

This step is required to allow workflows to move on in the system; otherwise they won't get past the assigned status. A description of how to inject those JSON documents into your VM can be found in: Injecting campaigns into CouchDB

Certificate authentication

If you see SSL authorization issues between your Agent and your VM (as shown below), you need to follow this step in order to add an unencrypted version of your host certificate to all your VM services. You can follow this link for instructions on how to create such service certificates. It is probably better (and safer) to just create a proxy and copy it to the location.

WARNING:root:Http request failed, retrying once again..
[07/Dec/2018:23:55:04]  SERVER OTHER ERROR ssl.SSLError 241537094a833a156b6bb10c78b9e82b ([SSL] PEM lib (_ssl.c:2709))
INFO:cherrypy.error:[07/Dec/2018:23:55:04]  SERVER OTHER ERROR ssl.SSLError 241537094a833a156b6bb10c78b9e82b ([SSL] PEM lib (_ssl.c:2709))
[07/Dec/2018:23:55:04]    Traceback (most recent call last):
INFO:cherrypy.error:[07/Dec/2018:23:55:04]    Traceback (most recent call last):
INFO:cherrypy.error:[07/Dec/2018:23:55:35]        context.load_cert_chain(cert_file, key_file)
[07/Dec/2018:23:55:35]    SSLError: [SSL] PEM lib (_ssl.c:2709)
INFO:cherrypy.error:[07/Dec/2018:23:55:35]    SSLError: [SSL] PEM lib (_ssl.c:2709)
[07/Dec/2018:23:55:35] kwmcore.cern.ch 127.0.0.1 "GET /reqmgr2/data/about HTTP/1.1" 500 Internal Server Error [data: 338 in 743 out 29575 us ] [auth: OK "" "" ] [ref: "" "ServerMonitor/2.0" ]
INFO:cherrypy.access:[07/Dec/2018:23:55:35] kwmcore.cern.ch 127.0.0.1 "GET /reqmgr2/data/about HTTP/1.1" 500 Internal Server Error [data: 338 in 743 out 29575 us ] [auth: OK "" "" ] [ref: "" "ServerMonitor/2.0" ]

You need to copy a valid proxy to these required certificate locations (here's a simple script for that).

$ cat ~/cert_copy.sh 
export X509_USER_PROXY=/data/user/myproxy.pem
voms-proxy-init -voms cms -hours 96
sudo chmod 660 /data/srv/current/auth/{reqmgr2,workqueue,acdcserver,reqmon,reqmgr2ms}/dmwm-service-{cert,key}.pem
sudo cp /data/auth/dmwm-service-cert.pem /data/srv/current/auth/reqmgr2/dmwm-service-cert.pem 
sudo cp /data/auth/dmwm-service-cert.pem /data/srv/current/auth/workqueue/dmwm-service-cert.pem 
sudo cp /data/auth/dmwm-service-cert.pem /data/srv/current/auth/acdcserver/dmwm-service-cert.pem 
sudo cp /data/auth/dmwm-service-cert.pem /data/srv/current/auth/reqmon/dmwm-service-cert.pem 
sudo cp /data/auth/dmwm-service-cert.pem /data/srv/current/auth/reqmgr2ms/dmwm-service-cert.pem 
sudo cp /data/auth/dmwm-service-key.pem /data/srv/current/auth/reqmgr2/dmwm-service-key.pem
sudo cp /data/auth/dmwm-service-key.pem /data/srv/current/auth/workqueue/dmwm-service-key.pem
sudo cp /data/auth/dmwm-service-key.pem /data/srv/current/auth/acdcserver/dmwm-service-key.pem
sudo cp /data/auth/dmwm-service-key.pem /data/srv/current/auth/reqmon/dmwm-service-key.pem
sudo cp /data/auth/dmwm-service-key.pem /data/srv/current/auth/reqmgr2ms/dmwm-service-key.pem
sudo chmod 440 /data/srv/current/auth/{reqmgr2,workqueue,acdcserver,reqmon,reqmgr2ms}/dmwm-service-{cert,key}.pem
sudo chmod 400 /data/srv/current/auth/{workqueue,reqmgr2ms}/dmwm-service-key.pem
sudo chown _reqmgr2ms:_config /data/srv/current/auth/reqmgr2ms/dmwm-service-key.pem
sudo chown _workqueue:_config /data/srv/current/auth/workqueue/dmwm-service-key.pem
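
The `PEM lib` error in the log above is raised while loading the certificate chain, so one cheap sanity check is whether each installed file still looks like PEM. This assumes the cause is a placeholder or truncated file, which this page does not confirm; the helper name `looks_like_pem` is hypothetical:

```shell
# Hypothetical helper: succeed when file $1 contains a PEM header line
# such as "-----BEGIN CERTIFICATE-----".
looks_like_pem() {
    grep -q -e '^-----BEGIN [A-Z0-9 ]*-----$' "$1"
}
```

Because the installed files are readable only by their owner and group, the check may need to run with sudo. For a certificate that passes, `openssl x509 -noout -subject -enddate -in <file>` prints its subject and expiry date.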

Manage your services via manage script

Here are a few commands that you can use to manage your central services:

(A=/data/cfg/admin; cd /data; $A/InstallDev -s status)
(A=/data/cfg/admin; cd /data; $A/InstallDev -s start)
(A=/data/cfg/admin; cd /data; $A/InstallDev -s stop)

or, if you want to perform an action on only one of them, you can execute something like (replace reqmgr2 with the service you want to act on):

(A=/data/cfg/admin; cd /data; $A/InstallDev -s stop:reqmgr2)
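
The `action` and `action:service` forms shown above can be built by a small helper, which also catches typos before InstallDev is invoked. This is a sketch limited to the three actions listed on this page; the helper name `installdev_step` is hypothetical:

```shell
# Hypothetical helper: print the value for InstallDev's -s option.
# $1 = action (status, start or stop), $2 = optional service name.
installdev_step() {
    case "$1" in
        status|start|stop) ;;
        *) echo "unknown action: $1" >&2; return 1 ;;
    esac
    if [ -n "${2:-}" ]; then
        echo "$1:$2"
    else
        echo "$1"
    fi
}
```

For instance: `(A=/data/cfg/admin; cd /data; $A/InstallDev -s "$(installdev_step stop reqmgr2)")`.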

Remove central services from VM

This step stops all CMS services running on your VM, kills any python application that is left, and removes all CMSWEB related data (make sure to update your dev vm node name in the first line):

([ "$(hostname -f)" = "your-dev-vm.cern.ch" ] || exit;
 echo "Deleting...";
 cd /data/;
 $PWD/cfg/admin/InstallDev -s stop;
 crontab -r;
 killall python;
 cd srv;
 sudo rm -fr current state auth logs HG* enabled .??*)

and this to make sure ServerMonitor is completely shut down as well:

ps auxww | grep '[S]erverMonitor' | awk '{print $2}' | xargs -r sudo kill -9

Permissions to create requests and create CouchDB documents

In order to be able to create documents in CouchDB, which means:

the same applies to ReqMgr2, which has a slightly wider list though: https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/ReqMgr/DataStructs/DefaultConfig/PERMISSION_BY_REQUEST_TYPE.py#L9
