
Shared suite control

This description assumes a suite owner, user1, who wishes to give user2 control of a suite. user1 commands are shown in red and user2 commands in blue; the prompt on each command also shows which user runs it.

The cylc documentation describes how the server authenticates connections in https://cylc.github.io/doc/built-sphinx-single/index.html#remote-control

The suite itself can define several levels of access:

  • identity - only suite and owner names revealed
  • description - identity plus suite title and description
  • state-totals - identity, description, and task state totals
  • full-read - full read-only access for monitor and GUI
  • shutdown - full read access plus shutdown, but no other control.

The default is state-totals. This means that anyone can run cylc scan and see information about your suite, for example:

user2@accessdev% cylc scan -o user1 -f

u-aj458 user1@localhost:43076
   Title:
      (no title)
   Description:
      (no description)
   Task state totals:
      held:5 succeeded:2
      19880901T0000Z held:4 succeeded:2
      19881001T0000Z held:1

This facility cannot grant different permissions to different users.

Every suite has a passphrase in $HOME/cylc-run/SUITE/.service, normally readable only by the suite owner. Possession of this passphrase gives full read and control access to the suite. However, file access control lists (ACLs) can be used to share it with selected users only.

E.g., to give user2 full control over the suite au-aa398, user1 starts the suite (or just installs with rose suite-run -i), then gives read access to the passphrase with

user1@accessdev% setfacl -m u:user2:r ~/cylc-run/au-aa398/.service/passphrase

Then user2 creates a directory and copies the passphrase

user2@accessdev% mkdir -p $HOME/.cylc/auth/[email protected]/au-aa398
user2@accessdev% cp ~user1/cylc-run/au-aa398/.service/passphrase $HOME/.cylc/auth/[email protected]/au-aa398

UPDATE: Now that cylc on accessdev uses https communications, do the same for the ssl.cert file in .service.
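The two copies (passphrase and ssl.cert) can be wrapped in one small function. This is only a sketch: the name copy_suite_credentials is hypothetical and not part of cylc or rose, and both directories are passed in by the caller rather than assumed.

```shell
# Hypothetical helper (not part of cylc or rose): copy the suite credentials
# into a client auth directory.
# $1 = owner's .service directory, $2 = destination auth directory.
copy_suite_credentials() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for f in passphrase ssl.cert; do
        # Only copy files that exist and are readable (e.g. via the ACL).
        if [ -r "$src/$f" ]; then
            cp "$src/$f" "$dest/" || return 1
        else
            echo "cannot read $src/$f" >&2
        fi
    done
}
```

user2 would call it with ~user1/cylc-run/au-aa398/.service as the first argument and the auth directory created above as the second.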

Interacting with the suite also requires knowing the port number which can be obtained from the cylc scan command, e.g.

user2@accessdev% cylc scan -o user1 -n au-aa398
au-aa398 user1@localhost:43051
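In a script the port can be taken from the end of that output line. A minimal sketch, assuming the line has the SUITE OWNER@HOST:PORT form shown above; the function name scan_port is hypothetical.

```shell
# Hypothetical helper: print the port from one line of cylc scan output.
# Assumes the line looks like "au-aa398 user1@localhost:43051".
scan_port() {
    # Remove everything up to and including the last colon.
    printf '%s\n' "${1##*:}"
}

# Example using the line shown above (not a live cylc call).
scan_port "au-aa398 user1@localhost:43051"
```

The result can then be passed to the --port option of the cylc commands.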

Then user2 can run all the usual cylc commands on this suite by specifying user and port, e.g.

user2@accessdev% cylc monitor --user=user1 --port=43051 au-aa398
user2@accessdev% gcylc --user=user1 --port=43051 au-aa398
user2@accessdev% cylc stop --user=user1 --port=43051 au-aa398

This only gives control access to the cylc server. Jobs are still submitted to gadi by the original owner, log files still go to the original directory, and so on.

If the suite is stopped, the port is closed and only the original owner can restart it.

Note that the passphrase changes when the suite is reprocessed by rose (or reregistered by cylc). Stopping and restarting the suite doesn't change it.
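Because the passphrase can change, user2 may want to check that their copy still matches the owner's before assuming the suite is unreachable. A minimal sketch; the function name passphrase_is_current is hypothetical, and it only compares the two files byte for byte.

```shell
# Hypothetical helper: succeed if the local copy still matches the owner's file.
# $1 = owner's passphrase file, $2 = user2's copy under $HOME/.cylc/auth.
passphrase_is_current() {
    # cmp -s is silent and returns 0 only when the files are identical.
    cmp -s "$1" "$2"
}
```

If the check fails, copy the passphrase again as described above.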

Even more control

For some long running suites it may be necessary to allow several users to be able to stop the suite, modify the configuration and restart.

This is possible by making the suite's work and share directories shared. For example, in suite u-bz479, rose-suite.conf sets these to fixed directories rather than the default /scratch/$PROJECT/$USER. Assuming both user1 and user2 are members of the project a99:

root-dir{share}=gadi*=/scratch/a99/user1
root-dir{share/cycle}=gadi*=/scratch/a99/user1
root-dir{work}=gadi*=/scratch/a99/user1

Before running anything, create these directories on gadi and set the ACLs

user1@gadi% mkdir -p /scratch/a99/user1/cylc-run/u-bz479
user1@gadi% setfacl -R -m u:user2:rwx -m d:user2:rwx /scratch/a99/user1/cylc-run/u-bz479
user1@gadi% setfacl -R -m u:user1:rwx -m d:user1:rwx /scratch/a99/user1/cylc-run/u-bz479

Note that it's also necessary to set permissions for the owner in order for them to be able to run again after user2 has made changes.

For the owner the suite runs as normal. After the suite has started to run, give read permission to the passphrase, ssl.cert and db in the suite .service directory

user1@accessdev% setfacl -m u:user2:r ~/cylc-run/u-bz479/.service/passphrase
user1@accessdev% setfacl -m u:user2:r ~/cylc-run/u-bz479/.service/ssl.cert
user1@accessdev% setfacl -m u:user2:r ~/cylc-run/u-bz479/.service/db

Note that the db is only created at runtime, so this has to be done after the suite has started. Note also that doing

user1@accessdev% setfacl -m u:user2:r ~/cylc-run/u-bz479/.service/*

messes the permissions up completely (perhaps due to the source link to the parent directory?), so you must do it for the individual files.
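One way to avoid the wildcard is to name the three files explicitly and skip anything that is not a regular file. The sketch below is a dry run that only prints the setfacl commands so they can be checked before being run by hand; the function name acl_commands is hypothetical.

```shell
# Hypothetical helper: print one setfacl command per credential file.
# $1 = user to be given read access, $2 = suite .service directory.
acl_commands() {
    for f in passphrase ssl.cert db; do
        # -f is false for directories and for files that do not exist yet
        # (the db only appears once the suite is running).
        if [ -f "$2/$f" ]; then
            printf 'setfacl -m u:%s:r %s\n' "$1" "$2/$f"
        fi
    done
}
```

For the suite above this would be called as acl_commands user2 ~/cylc-run/u-bz479/.service.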

If the suite needs to be modified, user2 can copy the passphrase and ssl.cert files and stop the suite as in the first section. Then check out the suite and change it as required. After installing the suite and before running, copy the private db file so that the new run knows the current state.

user2@accessdev% rose suite-run -i
user2@accessdev% cp ~user1/cylc-run/u-bz479/.service/db ~/cylc-run/u-bz479/.service
user2@accessdev% cylc restart u-bz479 (or other command)

Note that user2 is now running their own copy of the suite so it's not necessary to use the user and port options with the cylc commands.

After the suite starts again,

user2@accessdev% setfacl -m u:user1:r ~/cylc-run/u-bz479/.service/passphrase
user2@accessdev% setfacl -m u:user1:r ~/cylc-run/u-bz479/.service/ssl.cert
user2@accessdev% setfacl -m u:user1:r ~/cylc-run/u-bz479/.service/db

so that the original owner can also take control back.

This can also be done as a modification to an already running suite. The rose-suite.conf file should be modified to explicitly use the paths to the share and work directories that are already being used. Then apply the setfacl commands to the cylc-run directory on gadi and continue as above.