
Updating rose, cylc and fcm

On raijin

cylc

% newgrp access.admin
% cd ~access/apps/cylc

Update the git repository:

% cd git
% git pull

Check which tags are available:

% git tag

Clone the local repository for the new version (e.g. if the latest tag is '6.7.0'):

% cd ..
% git clone git 6.7.0

Check out the matching tag in the new clone:

% cd 6.7.0
% git checkout 6.7.0
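The clone and checkout steps above could be wrapped in a small helper so that a mistyped tag is caught before anything is created. This is only a sketch: install_tagged_version is a hypothetical name, and it assumes the layout described above, where the mirror lives in a subdirectory called git and has already been updated with git pull.

```shell
# Sketch: clone a tagged release from the local "git" mirror into a
# directory named after the tag. install_tagged_version is a hypothetical
# helper name, not part of cylc or rose.
install_tagged_version() {
    local app_dir="$1"   # e.g. ~access/apps/cylc
    local tag="$2"       # e.g. 6.7.0

    # Refuse to continue if the tag is missing from the mirror.
    if ! git -C "$app_dir/git" rev-parse -q --verify "refs/tags/$tag" >/dev/null; then
        echo "tag $tag not found in $app_dir/git" >&2
        return 1
    fi

    # Do not overwrite an existing installation.
    if [ -e "$app_dir/$tag" ]; then
        echo "$app_dir/$tag already exists" >&2
        return 1
    fi

    # Clone the mirror, then check out the tag in the new clone.
    git clone -q "$app_dir/git" "$app_dir/$tag" &&
        git -C "$app_dir/$tag" checkout -q "$tag"
}
```

It would be called as, for example, install_tagged_version ~access/apps/cylc 6.7.0 after running newgrp access.admin.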

Create the file etc/job-init-env.sh as follows, with the rose version set to match this cylc version (see #189).

. /etc/profile
. ~/.profile
module use ~access/modules
module load cylc/wrapper
# Load matching rose module for this version of cylc
module load rose/2015.08.0
module load fcm
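Because only the rose version changes between installations, the file above could be generated rather than written by hand. The following is a minimal sketch under that assumption; write_job_init_env is a hypothetical helper name and both arguments are illustrative.

```shell
# Sketch: generate etc/job-init-env.sh for a cylc installation.
# write_job_init_env is a hypothetical helper name.
write_job_init_env() {
    local cylc_dir="$1"       # e.g. ~access/apps/cylc/6.7.0
    local rose_version="$2"   # e.g. 2015.08.0

    mkdir -p "$cylc_dir/etc" || return 1

    # Only ${rose_version} is expanded inside the here-document;
    # "~" is not expanded there, so it is written literally.
    cat > "$cylc_dir/etc/job-init-env.sh" <<EOF
. /etc/profile
. ~/.profile
module use ~access/modules
module load cylc/wrapper
# Load matching rose module for this version of cylc
module load rose/${rose_version}
module load fcm
EOF
}
```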

Note that no version-specific cylc module is required.

See https://accessdev.nci.org.au/trac/ticket/199#comment:7 for some other minor changes required with cylc versions 6.7.0 and later.

On accessdev, in lib/cylc/scheduler.py set

    INTERVAL_STOP_KILL = 60.0

as a workaround for problems reported in https://track.nci.org.au/browse/CWSHELP-149.
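The edit to scheduler.py could be scripted along the following lines. This is a sketch only: set_interval_stop_kill is a hypothetical helper, and it assumes the file contains an assignment of the form INTERVAL_STOP_KILL = <value>, possibly indented, which should be confirmed by inspecting the file for the version being installed.

```shell
# Sketch: set INTERVAL_STOP_KILL in scheduler.py, keeping a backup.
# Assumes an assignment of the form "INTERVAL_STOP_KILL = <value>",
# possibly indented; set_interval_stop_kill is a hypothetical helper.
set_interval_stop_kill() {
    local file="$1"    # e.g. lib/cylc/scheduler.py
    local value="$2"   # e.g. 60.0

    # Fail early if there is no such assignment to edit.
    grep -q '^[[:space:]]*INTERVAL_STOP_KILL[[:space:]]*=' "$file" || return 1

    # Keep a backup, then rewrite the value while preserving the indentation.
    cp -p "$file" "$file.orig" &&
        sed "s/^\([[:space:]]*INTERVAL_STOP_KILL[[:space:]]*=\).*/\1 $value/" \
            "$file.orig" > "$file"
}
```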

rose

The procedure for rose is essentially the same:

% newgrp access.admin
% cd ~access/apps/rose

Update the git repository:

% cd git
% git pull

Check which tags are available:

% git tag

Clone the local repository for the new version (e.g. if the latest tag is '2015.08.0'):

% cd ..
% git clone git 2015.08.0

Check out the matching tag in the new clone:

% cd 2015.08.0
% git checkout 2015.08.0

Create an appropriate module in ~access/modules/rose. Note that new versions may be installed on raijin independently of accessdev, as long as the default rose module isn't changed.

Create VERSION/etc/rose.conf with

[rose-mpi-launch]
launcher-list=mpirun
launcher-preopts.mpirun=-n $NPROC

Note that the UK run scripts set NPROC, whereas PBS sets NCPUS. With OpenMP, NPROC may differ from NCPUS.
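To illustrate how the two variables can differ, the sketch below derives an MPI rank count from the PBS CPU count. It assumes one OpenMP thread per CPU, so that the rank count is NCPUS divided by the thread count. That relationship and the helper name nproc_from_ncpus are assumptions made for illustration, not something the UK run scripts are known to do.

```shell
# Sketch: derive the number of MPI ranks from the PBS CPU count.
# Assumes one OpenMP thread per CPU; nproc_from_ncpus is a hypothetical helper.
nproc_from_ncpus() {
    local ncpus="$1"
    local threads="${2:-1}"   # OMP_NUM_THREADS, defaulting to 1
    echo $(( ncpus / threads ))
}

# In a job script this might be used as:
#   export NPROC=$(nproc_from_ncpus "$NCPUS" "${OMP_NUM_THREADS:-1}")
```

With 16 CPUs and 2 threads per rank this gives NPROC=8, which is the value that launcher-preopts.mpirun would then pass to mpirun.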

Update ~access/apps/rose/bin/rose with a check for the matching rose and cylc versions.
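The contents of the wrapper are not shown on this page, so the following is only a sketch of what a version-pairing check might look like. The function names are hypothetical, and the pairs listed are just the version combinations used as examples on this page.

```shell
# Sketch of a version-pairing check. The real wrapper's logic is not shown
# on this page; the pairs below are only the examples used on this page.
matching_rose_version() {
    case "$1" in
        6.7.0) echo 2015.08.0 ;;
        6.4.1) echo 2015.04.1 ;;
        *)     return 1 ;;
    esac
}

# Succeeds only if the given rose version is the one paired with the
# given cylc version; otherwise prints a message and fails.
check_versions() {
    local cylc_version="$1" rose_version="$2" expected
    expected=$(matching_rose_version "$cylc_version") || {
        echo "no rose version recorded for cylc $cylc_version" >&2
        return 1
    }
    [ "$rose_version" = "$expected" ] || {
        echo "cylc $cylc_version expects rose $expected, got $rose_version" >&2
        return 1
    }
}
```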

fcm

Clone and check out the new version with git as above.

On raijin, create VERSION/etc/fcm/keyword.cfg by copying the previous version of this file. Note that some values in it differ from the file on accessdev because of the on-disk mirrors.

The accessdev keyword.cfg is created automatically by the puppet installation.

Create an appropriate module in ~access/modules/fcm.

Puppet configuration for accessdev

hierdata/project.yaml has

# Rose version
rose::install::default: 2015.04.1
rose::install::versions:
  - 2014-05
  - 2015.02.0
  - 2015.04.1

and

# Cylc version
cylc::default_version: 6.4.1
cylc::install_versions: 
  - 5.4.14
  - 6.3.0
  - 6.4.1

Modify this to add the new versions and update defaults. Older versions should not be removed as there may be running suites depending on them.
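Since older versions must stay in the lists, it may be worth comparing the edited file against a saved copy of the previous one before committing. The sketch below is one rough way to do that; removed_versions and list_items are hypothetical helpers, and every YAML list item in the file is treated as a version entry without distinguishing the rose and cylc lists.

```shell
# Sketch: print version entries ("  - X" lines) that appear in the old
# project.yaml but not in the new one. Both helper names are hypothetical.
list_items() {
    # Extract the first token after a leading "- " on each list line.
    sed -n 's/^[[:space:]]*-[[:space:]]\{1,\}\([^[:space:]]*\).*$/\1/p' "$1" | sort -u
}

removed_versions() {
    local old="$1" new="$2"
    list_items "$old" | while read -r item; do
        # -x and -F: match the whole line as a fixed string.
        list_items "$new" | grep -qxF -- "$item" || echo "$item"
    done
}
```

An empty result means no list entry was dropped. Any version printed should be restored unless it is certain that no running suite depends on it.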

For fcm, just update the version (only one version is installed on accessdev because it's backwards compatible). For example:

fcm::version: 2015.09.0

Testing

New puppet configuration should be added on a branch and tested on accessdev-test. A ticket should be created to record the testing.

The suite au-aa398 is a simple test cycling suite that runs background and PBS jobs on raijin and background jobs on accessdev. On accessdev-test, rosie doesn't see the accessdev database by default, so to check out the suite, run the command

svn checkout -q svn+ssh://accessdev.nci.org.au/home/access-svn/roses_au_svn/a/a/3/9/8/trunk@HEAD $HOME/roses/au-aa398

Then run the suite:

cd $HOME/roses/au-aa398
rose suite-run

After successful completion, check that CYLC_VERSION is set correctly in ~/cylc-run/au-aa398/log/job/20000601T0000Z/model/01/job and that rose --version has returned the correct version in ~/cylc-run/au-aa398/log/job/20000601T0000Z/model/01/job.out.
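These two checks could be scripted roughly as follows. This is a sketch: check_job_versions is a hypothetical helper, and because the exact line formats in job and job.out are not assumed, it only looks for the expected version strings.

```shell
# Sketch: confirm that a test job recorded the expected versions.
# check_job_versions is a hypothetical helper; it searches for the version
# strings without assuming the exact format of the surrounding lines.
check_job_versions() {
    local job_dir="$1"        # e.g. ~/cylc-run/au-aa398/log/job/20000601T0000Z/model/01
    local cylc_version="$2"   # e.g. 6.7.0
    local rose_version="$3"   # e.g. 2015.08.0

    # The job script should mention CYLC_VERSION together with the new version.
    grep 'CYLC_VERSION' "$job_dir/job" | grep -qF -- "$cylc_version" || {
        echo "CYLC_VERSION $cylc_version not found in $job_dir/job" >&2
        return 1
    }

    # The job output should contain the version printed by "rose --version".
    grep -qF -- "$rose_version" "$job_dir/job.out" || {
        echo "rose version $rose_version not found in $job_dir/job.out" >&2
        return 1
    }
}
```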

Suite au-aa123 builds GCOM and runs a simple test. This tests whether the rose mpi-launch configuration is set correctly. Check that mpl_test has actually used the requested 2 processors.

The UM rose-stem tests provide a more complete test. To run them:

fcm co fcm:um.xm/trunk
cd trunk
rose stem --group=developer

Note that any failures in the rose-ana comparison tasks don't indicate a problem with the rose/cylc installation but rather that the model KGOs (known good outputs) haven't been updated.

Also check rose-bush at https://accessdev-test.nci.org.au/rose-bush/. Check that you can see job.out from a suite.

Also check that the previous versions of rose and cylc still work by running a suite with something like

CYLC_VERSION=6.4.1 ROSE_VERSION=2015.04.1 rose suite-run

A few days before installing and updating the default modules on raijin, send a message to access_users informing them of the upcoming change, including a pointer to https://accessdev.nci.org.au/trac/wiki/access/RoseCylcVersions. Also update the version table on that page.