Rose Stem UM

NCI implementation of the UM rose stem tests

Since vn10.0 there have been NCI versions of some of the UM rose stem tests, e.g. https://code.metoffice.gov.uk/trac/um/wiki/StandardJobs. These tests all use configurations identical to the Met Office tests, except that in some of the N48 tests small processor decompositions such as 2x2 have been converted to use a full node.

All tests use Met Office input files mirrored from JASMIN to /g/data/access/TIDS/UM/inputs. Results are compared against locally generated known good output (KGO) in /g/data/access/KGO/standard_jobs/rose/. Because the KGO comparison requires exact matches, we can't use Met Office results. The process for updating these still needs to be streamlined (#181).
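As an illustration only (the suite's own comparison tasks perform the real check), a manual bitwise comparison of one output file against its KGO copy might look like the following; the run output path, file name and KGO version directory here are hypothetical:

# Hypothetical paths: verify one output file is bitwise identical to its KGO copy
cmp my_run_output/nci_n48_eg_noomp/atmos.astart \
    /g/data/access/KGO/standard_jobs/rose/nci_n48_eg_noomp/vn10.5/atmos.astart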

At vn10.5 the tests include the following (see rose-stem/site/nci/graph-standard.rc):

  • nci_n96_amip_eg
  • nci_n96_amip_eg_drhook
  • nci_ukca_eg_strattrop
  • nci_n48_eg_omp_noios
  • nci_n48_eg_omp_ios
  • nci_n48_eg_noomp
  • nci_n48_ga7_amip_2day
  • nci_n512_eg (uses IOS)
  • nci_scm_togacoare_ga6
  • nci_scm_gabls3_ga6
  • nci_global_to_lam_eg (runs SEUKV from N512)

Running the tests

Check out the UM trunk or a branch and cd to the top-level directory, then run

rose stem --group=developer
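A full sequence might look like the following, assuming the standard fcm:um.x_tr keyword for the UM trunk (substitute your branch URL as needed):

# Check out the trunk (or a branch) and run the developer group from the working copy
fcm checkout fcm:um.x_tr um_trunk
cd um_trunk
rose stem --group=developer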

There is also an optional runtime variable NCI_EXPRESS_LIMIT (default 16). Jobs using this many cores or fewer run on the express queue. For example, use

rose stem --group=nightly -S NCI_EXPRESS_LIMIT=32

if you want this group, including the 32-core ga7_amip jobs, to run entirely in the express queue. The available groups are defined in rose-stem/site/nci/graph-group.rc. A more appropriate nci_developer group of tests is being set up (https://code.metoffice.gov.uk/trac/um/ticket/1889).

Compiler and MPI versions

The test configuration in the trunk uses intel-fc/15.0.1.133 and openmpi/1.8.5. The tests use a prebuilt gcom library, so the appropriate modules must be loaded (see rose-stem/site/nci/family.rc).
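As a sketch of the environment the suite expects (the authoritative module list is in rose-stem/site/nci/family.rc; the gcom module name below is a guess):

# Modules matching the trunk test configuration
module load intel-fc/15.0.1.133
module load openmpi/1.8.5
module load gcom            # hypothetical name; use the prebuilt gcom module referenced in family.rc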

However, tests with intel-mpi also work and give identical results.

Automated testing

Scott has set up automatic nightly and weekly runs of the appropriate task groups using Jenkins.
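A minimal sketch of what such a scheduled job might run (the checkout location and suite name are assumptions; the real Jenkins configuration may differ):

# Nightly test run, e.g. from a Jenkins shell step or cron entry
cd /path/to/um_trunk_working_copy            # hypothetical checkout location
rose stem --group=nightly --name=um-nightly-$(date +%Y%m%d)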

Results are also available via Rose Bush at https://accessdev.nci.org.au/rose-bush/suites/accesstester.

Management

Since the model runs are machine-dependent we need to generate our own known good output (KGO) rather than using Met Office output. The main weakness in our UM rose stem setup is handling code changes that are intended to change the results. At the Met Office, the person committing such a change to the trunk is responsible for updating the KGOs and the rose-stem variables to point to the new directories, e.g. https://code.metoffice.gov.uk/trac/um/browser/main/trunk/rose-stem/site/meto/variables.rc.

The NCI version of the variables.rc file (https://code.metoffice.gov.uk/trac/um/browser/main/trunk/rose-stem/site/nci/variables.rc) tracks the Met Office variables, so when these change our comparisons fail because the new KGO directory does not yet exist.

There’s a script, admin/rose-stem/kgo_update.py, in the UM source tree that automates this at the Met Office by copying fields from the latest run to the KGO directories. It doesn’t work here because its checking fails when the expected KGO directory isn’t present. I’ve been intending to come up with a better way to do it here, but haven’t got around to it yet. At the moment I copy the new output manually. Changes aren’t that frequent, so this hasn’t been too annoying yet, though it did get a bit hectic just before the vn10.6 release. Ideally we’d add a check that the size of the difference in the results here is similar to the difference at the Met Office, but I’ve only done this in a couple of special cases.
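As a rough sketch of the manual update (the run output location, test name and KGO version label below are illustrative; the authoritative settings live in rose-stem/site/nci/variables.rc):

# Install new KGO from the latest passing run into a new version directory (illustrative names)
KGO_ROOT=/g/data/access/KGO/standard_jobs/rose
RUN_OUTPUT=$HOME/cylc-run/um_rose_stem/work   # assumed location of the new results
TEST=nci_n96_amip_eg
NEW_VER=vn10.6_t1234                          # hypothetical ticket-based version label
mkdir -p $KGO_ROOT/$TEST/$NEW_VER
cp $RUN_OUTPUT/$TEST/* $KGO_ROOT/$TEST/$NEW_VER/
# then update the matching KGO variable in rose-stem/site/nci/variables.rc to point at $NEW_VER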

There is some discussion in https://code.metoffice.gov.uk/trac/um/ticket/1828.

Cost estimates for the nightly and weekly tests are at https://code.metoffice.gov.uk/trac/um/wiki/NCI_StandardJobs10.6/Resources.