Production Server Info - PanDAWMS/panda-harvester GitHub Wiki

Intro

This page records information about the production Harvester servers at CERN.

Twiki

The Twiki page about the central Harvester servers is here.


Specifications of node

  • CERN OpenStack VM managed by CSOps
  • OS: CERN CentOS 7 (CC7)

Special setups

High Performance setup

Production nodes use MySQL (MariaDB) as the database backend and uWSGI to run Python (click the links for details).

UPS (Unified Pilot Streaming)

Requirements for UPS:

In harvester.cfg, the data option of the [cacher] section must contain the resource_types.json entry:

[cacher]
data =
 ...
 resource_types.json||panda_server:get_resource_types
 ...
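As a quick sanity check, the [cacher] entry can be verified with Python's standard configparser. This is a minimal sketch. It assumes each line of data has the form <file name>||<source>, as in the example above. SAMPLE_CFG is a hypothetical excerpt, not a full harvester.cfg.

```python
import configparser

# Hypothetical excerpt of harvester.cfg; a real file has many more sections and entries.
SAMPLE_CFG = """
[cacher]
data =
 resource_types.json||panda_server:get_resource_types
"""


def cacher_entries(cfg_text):
    """Return {file_name: source} parsed from the multi-line [cacher] data option."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    entries = {}
    for line in parser.get("cacher", "data").splitlines():
        line = line.strip()
        if not line:
            continue  # the value starts with an empty first line after "data ="
        # Assumed entry format: <file name>||<source>
        name, _, source = line.partition("||")
        entries[name] = source
    return entries


def has_resource_types(cfg_text):
    """True if resource_types.json is fetched via panda_server:get_resource_types."""
    return cacher_entries(cfg_text).get("resource_types.json") == "panda_server:get_resource_types"


print(has_resource_types(SAMPLE_CFG))  # True
```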

In the Harvester queue configuration file, the queue object must contain the following lines:

                "runMode": "slave",
                "mapType": "NoJob",

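A similar check for the queue configuration can be sketched as follows. It assumes the file is a JSON object keyed by queue name. EXAMPLE_QUEUE is a placeholder, and a real queue object carries many more keys.

```python
import json

# Hypothetical queue configuration with a placeholder queue name.
SAMPLE_QUEUE_CONFIG = """
{
    "EXAMPLE_QUEUE": {
        "runMode": "slave",
        "mapType": "NoJob"
    }
}
"""

# Key/value pairs that the UPS setup requires in each queue object.
REQUIRED_UPS_KEYS = {"runMode": "slave", "mapType": "NoJob"}


def queues_missing_ups_keys(config_text):
    """Return {queue_name: [missing or mismatched keys]} for queues not set up for UPS."""
    queues = json.loads(config_text)
    problems = {}
    for name, queue in queues.items():
        bad = [key for key, value in REQUIRED_UPS_KEYS.items() if queue.get(key) != value]
        if bad:
            problems[name] = bad
    return problems


print(queues_missing_ups_keys(SAMPLE_QUEUE_CONFIG))  # {}
```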
On AGIS, the PanDA queue (PQ) must have at least the following SchedConfig parameters:

Capability: ucore
catchall: ...,Pull,...
pilot manager: Harvester
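The AGIS requirements can be expressed as a small predicate. This sketch assumes the PQ parameters are available as a Python dict. The key names capability, catchall and pilot_manager are hypothetical stand-ins for how the SchedConfig fields are exported. catchall is assumed to be a comma-separated string.

```python
def pq_ready_for_ups(pq):
    """Return True if a PQ's SchedConfig parameters meet the UPS minimum.

    The keys "capability", "catchall" and "pilot_manager" are hypothetical
    names for the Capability, catchall and pilot manager fields.
    """
    catchall = [item.strip() for item in (pq.get("catchall") or "").split(",")]
    return (
        pq.get("capability") == "ucore"
        and "Pull" in catchall
        and pq.get("pilot_manager") == "Harvester"
    )


# Hypothetical PQ; "foo" and "bar" are placeholder catchall items.
example_pq = {"capability": "ucore", "catchall": "foo,Pull,bar", "pilot_manager": "Harvester"}
print(pq_ready_for_ups(example_pq))  # True
```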

Also see here for more info.

FIFO

On production nodes, FIFO is currently used for the monitor agent cycle and for the cache of condor_q.

Global Setup

See here for the setup and configuration of the global FIFO backend of a Harvester node.

On Harvester nodes that share the same DB, we use the MySQL FIFO so that the FIFO is also shared across those nodes. (A Redis FIFO would work too, but CERN already provides a MySQL DB On Demand service.)

On Harvester nodes with a local DB (or a remote DB used by a single node only), we use the SQLite FIFO on a ramdisk for better performance.
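The sketch below illustrates the idea behind the SQLite FIFO: a minimal FIFO built on an SQLite table with the standard sqlite3 module. It is not Harvester's actual FIFO schema or API, and the class name SqliteFifo is hypothetical. On a production node, the database path would point to a ramdisk (tmpfs) location rather than :memory:. The MySQL FIFO on nodes that share a DB plays the same role, but it lives in the shared database, which is how several nodes see one queue.

```python
import sqlite3


class SqliteFifo:
    """Minimal FIFO on an SQLite table (illustrative; not Harvester's schema)."""

    def __init__(self, path=":memory:"):
        # On a production node, path would sit on a ramdisk such as a tmpfs mount (assumption).
        self.conn = sqlite3.connect(path)
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS fifo ("
                " id INTEGER PRIMARY KEY AUTOINCREMENT,"
                " item BLOB NOT NULL)"
            )

    def put(self, item):
        """Append an item (bytes) to the tail of the queue."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute("INSERT INTO fifo (item) VALUES (?)", (item,))

    def get(self):
        """Remove and return the oldest item, or None if the queue is empty.

        Single-consumer sketch: SELECT and DELETE are not made atomic across processes.
        """
        with self.conn:
            row = self.conn.execute(
                "SELECT id, item FROM fifo ORDER BY id LIMIT 1"
            ).fetchone()
            if row is None:
                return None
            self.conn.execute("DELETE FROM fifo WHERE id = ?", (row[0],))
            return row[1]

    def size(self):
        """Number of items currently queued."""
        return self.conn.execute("SELECT COUNT(*) FROM fifo").fetchone()[0]


if __name__ == "__main__":
    fifo = SqliteFifo()
    fifo.put(b"worker-1")
    fifo.put(b"worker-2")
    print(fifo.get())  # b'worker-1'
```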

Monitor FIFO

First, complete the Global Setup described above.

Then see here for the setup and configuration of the monitor FIFO.

Cache of condor_q

The condor_q cache is enabled on production Harvester nodes to reduce the number of condor_q queries and the load on the schedd nodes. The cache is implemented with Harvester FIFO.

To enable the condor_q cache, first complete the Global Setup of FIFO.

Also make sure the HTCondor system is running properly.

Then, set up the monitor plugin cache.

That is all: the HTCondor monitor plugin will now work with the cache in Harvester FIFO.
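The caching idea can be sketched as a time-to-live (TTL) wrapper around the query. Repeated lookups within ttl seconds reuse one snapshot instead of querying the schedd again. In this sketch, an in-process attribute stands in for Harvester FIFO. query_func and the 60-second default are hypothetical, and the actual monitor plugin's cache logic may differ.

```python
import time


class CondorQCache:
    """Reuse one condor_q snapshot for `ttl` seconds (in-memory stand-in for Harvester FIFO)."""

    def __init__(self, query_func, ttl=60.0, clock=time.monotonic):
        self.query_func = query_func  # hypothetical callable that runs the real condor_q query
        self.ttl = ttl
        self.clock = clock
        self._snapshot = None
        self._fetched_at = 0.0

    def get_jobs(self):
        """Return cached job info, querying the schedd only when the snapshot is stale."""
        now = self.clock()
        if self._snapshot is None or now - self._fetched_at >= self.ttl:
            self._snapshot = self.query_func()
            self._fetched_at = now
        return self._snapshot
```

Injecting the clock keeps the expiry logic testable without sleeping. In real use, the default monotonic clock is appropriate because it is not affected by wall-clock adjustments.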


Side services

HTCondor

Currently, HTCondor 8.6.11 is installed on the CERN production Harvester nodes.

Python Binding

Version 8.9.0 of the HTCondor Python binding is installed with pip on the CERN production Harvester nodes:

# pip install --upgrade htcondor==8.9.0

Note that the Python binding from the condor-all yum package does NOT work properly with Harvester, so the htcondor package from pip is required.
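To confirm that the pip-installed binding is present at the expected version, one can query the package metadata. This is a minimal sketch, assuming Python 3.8+ for importlib.metadata. A binding installed from the yum package may not register pip metadata, in which case this reports None.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED_HTCONDOR = "8.9.0"  # version pinned in the pip command above


def installed_htcondor_version():
    """Return the installed htcondor distribution version, or None if it is not found."""
    try:
        return version("htcondor")
    except PackageNotFoundError:
        return None


def binding_ok(installed, required=REQUIRED_HTCONDOR):
    """True only if the installed version exactly matches the pinned one."""
    return installed == required


if __name__ == "__main__":
    found = installed_htcondor_version()
    print("htcondor binding:", found, "OK" if binding_ok(found) else "MISMATCH")
```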

NGINX

NGINX (OpenResty) runs on the production nodes as an HTTP gateway that provides token authentication for the Harvester Apache messenger.

Installation

  • Install openresty-1.13.6.2-1 or above with yum. The yum repo can be found here.

  • Get the latest release (v1.0.1) of nginx-jwt from GitHub and untar it into a suitable directory (more info):

    wget -P /opt https://github.com/auth0/nginx-jwt/releases/download/v1.0.1/nginx-jwt.tar.gz
    cd /opt/
    mkdir nginx-jwt
    tar -xf nginx-jwt.tar.gz -C nginx-jwt
    
  • Create a secret file for the JWT token signature. It must be the same file that is configured as secretFile in the frontend section of harvester.cfg:

    ls -l /data/atlpan/harvester_jwt.secret
    
  • Put the nginx configuration file in place and make the necessary modifications. The nginx configuration template can be found here.

    mv /usr/local/openresty/nginx/conf/nginx.conf{,.rpmsave}
    vim /usr/local/openresty/nginx/conf/nginx.conf
    
  • Put the script nginx.service in place. An example script can be found here; adjust the variables and paths in the script to fit your environment.

    ls -l /opt/nginx.service
    chmod a+x /opt/nginx.service
    /opt/nginx.service start
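Assuming tokens are signed with HS256 (HMAC-SHA256, RFC 7519) using the shared secret, the sketch below shows how such a token is built and checked with only the Python standard library. create_secret_file, the 32-byte random hex secret and the sub/exp claims are hypothetical. It is also an assumption whether Harvester and nginx-jwt use the file content as raw key bytes or base64-decode it, so check this against the actual configuration.

```python
import base64
import hashlib
import hmac
import json
import os
import secrets
import time


def b64url(data):
    """Base64url-encode bytes without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text):
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def create_secret_file(path, nbytes=32):
    """Hypothetical helper: write a random hex secret readable only by its owner."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(secrets.token_hex(nbytes))


def _sign(secret, signing_input):
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def make_hs256_token(secret, claims):
    """Build header.payload.signature, signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [b64url(json.dumps(obj, separators=(",", ":")).encode()) for obj in (header, claims)]
    signing_input = ".".join(parts)
    return signing_input + "." + b64url(_sign(secret, signing_input))


def verify_hs256_token(secret, token):
    """Return the claims if alg, signature and (optional) exp are valid, else None."""
    if not token.isascii():
        return None
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(b64url_decode(header_b64))
    except ValueError:
        return None
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None
    expected = b64url(_sign(secret, header_b64 + "." + payload_b64))
    if not hmac.compare_digest(expected.encode(), sig_b64.encode()):
        return None
    claims = json.loads(b64url_decode(payload_b64))
    if "exp" in claims and claims["exp"] < time.time():
        return None
    return claims


if __name__ == "__main__":
    import tempfile

    # Stand-in for /data/atlpan/harvester_jwt.secret, written to a temp dir here.
    path = os.path.join(tempfile.mkdtemp(), "harvester_jwt.secret")
    create_secret_file(path)
    with open(path, "rb") as f:
        key = f.read().strip()
    token = make_hs256_token(key, {"sub": "harvester", "exp": int(time.time()) + 3600})
    print(verify_hs256_token(key, token) is not None)  # True
```

hmac.compare_digest keeps the signature comparison constant-time, and the alg check rejects tokens that claim a different algorithm.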
    

Service control

One can start, stop, or reload the nginx service with the following commands, respectively:

  /opt/nginx.service start
  /opt/nginx.service stop
  /opt/nginx.service reload

For CERN Central Harvester Instances Only:

CERN CSOps already has a Puppet module to build an instance as a central production Harvester server.

A Harvester instance from CSOps already has almost all of the installation steps done. After getting the instance, one can skip the nginx installation steps above and only needs to run this script to initialize it:

# /cephfs/atlpan/harvester/scripts/nginx-init.sh

If successful, the instance will run the nginx service bound to port 25443.
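To check that nginx is actually listening after initialization, a plain TCP connection test is enough. This minimal sketch only confirms that the port accepts connections; it does not exercise TLS or token authentication. Running it on the instance itself ("localhost") is an assumption.

```python
import socket


def port_is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    # Port 25443 comes from the text above.
    print("nginx on 25443:", port_is_listening("localhost", 25443))
```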

After that, one can ask CSOps to open the port to connections from outside CERN.