Work with Middleware - PanDAWMS/panda-harvester GitHub Wiki

Introduction

The core part of harvester can run on a local host while the peripheral part runs on a remote host. This is typically useful when a central harvester instance serves multiple resources with restrictive operational policies, e.g., HPCs where outbound network connections are forbidden and only SSH access is allowed. The following diagram shows the normal sequence with an agent, plugin, and resource,

where the agent accesses the resource through the plugin. All of them run locally. Note that the resource in the picture essentially represents the local API to the resource.

The sequence can be split using an RPC middleware as shown in the next picture,

where MW_Herder is the part of the middleware that runs locally and controls MW_Bot. MW_Bot is the part of the middleware that runs remotely and accesses the resource through the plugin. MW_Herder and MW_Bot communicate with each other through SSH tunnels. From the agent's point of view MW_Herder behaves like the plugin, while the plugin receives function calls from MW_Bot. The same agent and plugin can therefore be used in both the normal and the core/peripheral use-cases. Moreover, in the core/peripheral use-case an identical software stack is installed locally and remotely, so no additional development is required for remote operation.

There is one caveat: the local and remote hosts must have the same major version of python. E.g., it is impossible to have python2 on the local host and python3 on the remote host, or vice versa, since the internal data model differs too much between them.
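The caveat above can be checked before any tunnels are set up. The following is a minimal sketch, not part of harvester: it assumes you have captured the output of `python --version` on both hosts as plain strings, and the helper names `major_version` and `same_major` are made up for this illustration.

```python
import re


def major_version(version_output):
    """Extract the major version from text such as 'Python 2.7.15'."""
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no version number found in %r" % version_output)
    return int(match.group(1))


def same_major(local_output, remote_output):
    """True if both version strings report the same major python version."""
    return major_version(local_output) == major_version(remote_output)


if __name__ == "__main__":
    import sys
    # Version of the interpreter running this script on the local host
    local = "Python %d.%d" % sys.version_info[:2]
    # Hypothetical value: paste the output of `python --version` from the remote host
    remote = "Python 2.7.15"
    print(same_major(local, remote))
```

Only the major number is compared, since the text above states the requirement in terms of python2 versus python3.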

Example at NERSC

This section describes how to configure harvester with the RPC middleware using NERSC/Cori as an example.

Local setup

First, install harvester locally and configure the cfg/json files as usual. Then, if your queue in panda_queueconfig.json has something like

                "prodSourceLabel": "managed",
                "submitter": {
                        "name": "DummySingletonSubmitter",
                        "module": "pandaharvester.harvestersubmitter.dummy_submitter"
                },
                "messenger": {
                        "name": "SharedFileMessenger",
                        "module": "pandaharvester.harvestermessenger.shared_file_messenger"
                },

you need to add an element with the key rpc, and add "middleware": "rpc" to each agent that requires remote access.

                "prodSourceLabel": "managed",
                "rpc": {
                        "name": "RpcHerder",
                        "module": "pandaharvester.harvestermiddleware.rpc_herder",
                        "remoteHost": "cori04",
                        "remoteBindPort": 18861,
                        "numTunnels": 3,
                        "sshUserName": "your_username",
                        "sshPassword": "your_password",
                        "jumpHost": "cori.nersc.gov"
                },
                "submitter": {
                        "name": "DummySingletonSubmitter",
                        "module": "pandaharvester.harvestersubmitter.dummy_submitter",
                        "middleware": "rpc"
                },
                "messenger": {
                        "name": "SharedFileMessenger",
                        "module": "pandaharvester.harvestermessenger.shared_file_messenger",
                        "middleware": "rpc"
                },
                "stager":{
                        "name":"DummyStager",
                        "module":"pandaharvester.harvesterstager.dummy_stager",
                        "bareFunctions": ["trigger_stage_out", "check_stage_out_status"],
                        "middleware": "rpc"
                },

A list of function names can be assigned to bareFunctions so that those functions are executed locally using the bare plugin. This is typically useful when only a few functions need to be executed on the remote host.
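The effect of bareFunctions can be pictured with a toy proxy. This is only an illustrative sketch of the behavior described above, not how RpcHerder is implemented: `ToyPlugin`, `ToyHerder`, and `fake_remote` are hypothetical names, and the remote side is simulated by an ordinary function call instead of an SSH tunnel.

```python
class ToyPlugin(object):
    """Stand-in for a plugin such as a stager; not a harvester class."""

    def trigger_stage_out(self, job):
        return "trigger_stage_out:%s" % job

    def zip_output(self, job):
        return "zip_output:%s" % job


class ToyHerder(object):
    """Looks like the plugin to the caller, but forwards most calls remotely."""

    def __init__(self, plugin, remote_call, bare_functions=()):
        self._plugin = plugin
        self._remote_call = remote_call
        self._bare = set(bare_functions)

    def __getattr__(self, name):
        # Called only for attributes not found on the herder itself
        if name in self._bare:
            # Listed in bareFunctions: run locally on the bare plugin
            return getattr(self._plugin, name)

        def forward(*args, **kwargs):
            # Everything else is sent to the remote side
            return self._remote_call(name, args, kwargs)

        return forward


def fake_remote(name, args, kwargs):
    """Simulated remote side: calls the same plugin and tags the result."""
    return "remote:" + getattr(ToyPlugin(), name)(*args, **kwargs)


herder = ToyHerder(ToyPlugin(), fake_remote, ["trigger_stage_out"])
print(herder.trigger_stage_out("job1"))  # executed locally
print(herder.zip_output("job1"))         # forwarded to the simulated remote side
```

The caller uses the herder exactly as it would use the plugin, which mirrors the statement that the same agent and plugin work in both use-cases.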

In this example, cori.nersc.gov is used as a jump host to log in to cori04. cori.nersc.gov is a load balancer and cori04 is the remote host where the peripheral part will run. numTunnels is the number of SSH tunnels to the remote host. At this stage, after setting up the test environment as described in the Testing and running section, you can check whether the configuration is good enough to establish SSH tunnels:

$ python lib/python*/site-packages/pandaharvester/harvestertest/sshTunnelTest.py
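Before running the tunnel test, a quick static check of the queue configuration can catch simple mistakes such as a plugin that refers to a middleware key that is not defined. The sketch below is an assumption-laden helper, not a harvester tool: `check_rpc_config` is a hypothetical name, and the only rules it enforces are the ones visible in the example above (the referenced key exists, it has name and module, and bareFunctions is a list).

```python
import json


def check_rpc_config(queue_config):
    """Return a sorted list of problems found in one queue's configuration dict."""
    problems = []
    for key, value in queue_config.items():
        # Only plugin sections that request a middleware are of interest
        if not isinstance(value, dict) or "middleware" not in value:
            continue
        mw_key = value["middleware"]
        middleware = queue_config.get(mw_key)
        if not isinstance(middleware, dict):
            problems.append("%s: middleware %r is not defined" % (key, mw_key))
            continue
        for required in ("name", "module"):
            if required not in middleware:
                problems.append("%s: %r is missing in %r" % (key, required, mw_key))
        if not isinstance(value.get("bareFunctions", []), list):
            problems.append("%s: bareFunctions must be a list" % key)
    return sorted(problems)


# Shortened version of the queue configuration shown above
sample = json.loads("""
{
    "prodSourceLabel": "managed",
    "rpc": {
        "name": "RpcHerder",
        "module": "pandaharvester.harvestermiddleware.rpc_herder"
    },
    "submitter": {
        "name": "DummySingletonSubmitter",
        "module": "pandaharvester.harvestersubmitter.dummy_submitter",
        "middleware": "rpc"
    },
    "stager": {
        "name": "DummyStager",
        "module": "pandaharvester.harvesterstager.dummy_stager",
        "bareFunctions": ["trigger_stage_out", "check_stage_out_status"],
        "middleware": "rpc"
    }
}
""")

print(check_rpc_config(sample))
```

An empty list means none of these checks failed; it says nothing about whether the SSH credentials or host names are correct, which is what the tunnel test is for.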

Remote setup

Next, you need to install harvester on the remote host. A typical situation at HPCs is that only SSH access to the remote host is allowed and no outbound network connection is allowed from the remote node, so pip cannot be used for harvester installation out of the box. In such a situation, harvester can be installed offline. First, manually set up a virtual environment on the remote host if necessary. Here is an example with conda at NERSC/Cori.

$ ssh remote_host
$ module load python/2.7-anaconda-4.4
$ conda create -p ~/harvester python=2

Note that python2.7 is loaded since python2 is used on the local host in this example. Then you can install harvester to the remote host by running the following command on the local host.

$ python lib/python*/site-packages/pandaharvester/harvesterscripts/remote_install.py --queueName=your_queue_name --remotePythonSetup "module load python/2.7-anaconda-4.4;source activate ~/harvester"

where --remotePythonSetup specifies how to set up python on the remote host. In this example, it loads python2.7 and activates a virtual environment.
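If the installation is driven from a script, building the command as an argument list avoids shell quoting problems with the semicolon inside the --remotePythonSetup value. The following is a minimal sketch under that assumption: `find_script` and `build_remote_install_cmd` are hypothetical helper names, and only the two options shown above are used.

```python
import glob
import os

SCRIPT_PATTERN = os.path.join(
    "lib", "python*", "site-packages", "pandaharvester",
    "harvesterscripts", "remote_install.py")


def find_script(root="."):
    """Expand the lib/python*/... pattern the same way the shell would."""
    matches = sorted(glob.glob(os.path.join(root, SCRIPT_PATTERN)))
    if not matches:
        raise IOError("remote_install.py not found under %s" % root)
    return matches[0]


def build_remote_install_cmd(script, queue_name, python_setup):
    """Return the command as a list; each element is passed as one argument."""
    return [
        "python", script,
        "--queueName=" + queue_name,
        "--remotePythonSetup", python_setup,
    ]


cmd = build_remote_install_cmd(
    "remote_install.py",  # in practice, the result of find_script()
    "your_queue_name",
    "module load python/2.7-anaconda-4.4;source activate ~/harvester")
print(cmd)
# To actually run it: import subprocess; subprocess.check_call(cmd)
```

Because the list is handed to the process without going through a shell, the setup string reaches remote_install.py as a single argument.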

Once harvester is installed on the remote host, you need to create the panda_harvester sysconfig file and panda_common.cfg from their templates on the remote host, and then edit panda_common.cfg.

$ ssh remote_host
$ cd ~/harvester
$ mv etc/sysconfig/panda_harvester.rpmnew.template etc/sysconfig/panda_harvester
$ mv etc/panda/panda_common.cfg.rpmnew.template etc/panda/panda_common.cfg

Then change logdir in etc/panda/panda_common.cfg and create that directory with mkdir if it does not exist.

Now you can launch a bot on the remote host. Technically it should be possible to launch bots on demand from the local host, but for now you have to do it manually.

$ ssh remote_host
$ module load python/2.7-anaconda-4.4
$ source activate ~/harvester
$ python lib/python*/site-packages/pandaharvester/harvestermiddleware/rpc_bot.py

To stop the bot:

$ kill `cat /var/tmp/harvester_rpc.pid`
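The same stop step can be scripted with a check that the process still exists, which avoids an error when the PID file is stale. This is a minimal sketch, assuming the PID file contains a single integer process ID (as the `kill` command above implies); `read_pid`, `is_running`, and `stop_bot` are hypothetical helper names.

```python
import errno
import os
import signal

PID_FILE = "/var/tmp/harvester_rpc.pid"


def read_pid(path=PID_FILE):
    """Read the process ID stored in the PID file."""
    with open(path) as f:
        return int(f.read().strip())


def is_running(pid):
    """Signal 0 performs error checking only; no signal is delivered."""
    try:
        os.kill(pid, 0)
    except OSError as e:
        # EPERM means the process exists but belongs to another user
        return e.errno == errno.EPERM
    return True


def stop_bot(path=PID_FILE):
    """Send SIGTERM (the default signal of kill) if the bot is still alive."""
    pid = read_pid(path)
    if not is_running(pid):
        return False
    os.kill(pid, signal.SIGTERM)
    return True
```

Calling `stop_bot()` returns False when the recorded process no longer exists, and raises an error if the PID file is missing or the process belongs to another user.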

Testing and running

There is nothing special to test or run. Just follow the instructions in this section.
