ngm_suites_PortingGuide - ACCESS-NRI/accessdev-Trac-archive GitHub Wiki


Suite Porting Guide

These tips are for porting existing suites to run on NCI. You should also check the Rose & Cylc Workflow Design Guidelines.

To run on Gadi, a suite should contain either a file sites/nci-gadi.rc or a directory sites/nci-gadi that contains Cylc settings appropriate for Gadi. Settings in this file override those in the main suite.rc file, which normally ends with {% include "sites/"~SITE~".rc" %} or similar to pull in the site information.
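As a rough sketch of that layout, the two files might fit together as shown below. It assumes SITE is a Jinja2 variable set to 'nci-gadi' (for instance from rose-suite.conf) and that the suite defines a HOST_HPC family; the file contents are illustrative and not taken from a real suite.

```ini
#!jinja2
# --- suite.rc ---
# ... scheduling and generic [runtime] settings go here ...

# Last line of suite.rc: pull in the settings for the selected site.
# With SITE = 'nci-gadi' this includes sites/nci-gadi.rc.
{% include "sites/" ~ SITE ~ ".rc" %}

# --- sites/nci-gadi.rc ---
# Read after the main file, so values set here take precedence.
[runtime]
    [[HOST_HPC]]
        [[[remote]]]
            host = gadi
```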

In general:

  • Be consistent with other site setups in the same suite
  • Prefer simple, explicit settings over Jinja wherever possible
  • The NCI site has the standard name 'nci-gadi'

Standard Variables

To be consistent across jobs try to re-use these variable names when possible:

  • NCI_QUEUE: Queue to submit jobs to
  • NCI_PROJECT: Project to submit jobs to (allow blank to pick up the default project set in ~/.config/gadi-login.conf)
  • NCI_MEM_GB_PER_CPU: Memory request per CPU; should normally be 4 when running on the normal queue
  • NCI_STORAGE: Extra storage flags the user wants to add; the suite should be able to run without this
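One way to supply these values is sketched below. It assumes the suite takes its Jinja2 variables from the [jinja2:suite.rc] section of rose-suite.conf; the values are placeholders, not recommendations.

```ini
# rose-suite.conf (illustrative values)
[jinja2:suite.rc]
SITE='nci-gadi'
NCI_QUEUE='normal'
# Blank: fall back to the user's default project
NCI_PROJECT=''
NCI_MEM_GB_PER_CPU=4
# Blank: no extra storage flags requested
NCI_STORAGE=''
```

A user who needs another project's data could set NCI_STORAGE='gdata/ab12', where ab12 is a hypothetical project code. The suite should still run with the blank default.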

HPC Resources

Tasks requiring HPC resources should look like:

[runtime]
    [[HOST_HPC]]
        [[[remote]]]
            host = gadi
        [[[job]]]
            batch system = pbs
        [[[directives]]]
            -q = {{NCI_QUEUE}}
            {{ "-P =" ~ NCI_PROJECT if NCI_PROJECT != "" else ""}}
            -l ncpus = {{1}}
            -l mem = {{1 * NCI_MEM_GB_PER_CPU}}gb
            -l jobfs = 10gb
            -l storage = "gdata/access{{ "+" ~ NCI_STORAGE if NCI_STORAGE != "" else ""}}"
            -W umask = 0022
  • [remote]
    • host: Can simply be set to gadi; it's not necessary to use rose host-select
  • [job]
    • batch system: Either "pbs" to use the queue or "background" to run on the login node (only for small tasks that take a couple of minutes)
  • [directives]
    • -P: Set only if NCI_PROJECT is defined
    • -l mem: Should normally be 4 GB per CPU
    • -l jobfs: Used for TMPDIR. A few GB is normally fine
    • -l storage: Must be set to access anything other than the current project's /scratch space. The suite should be runnable without needing NCI_STORAGE set
    • -W umask: Makes files created by the PBS job group-readable. By default, files such as logs are readable only by the person who ran the job
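The template above can be extended to individual tasks by inheriting from HOST_HPC and overriding only what differs. The sketch below assumes HOST_HPC is defined as shown earlier; the task names run_model and tidy_logs and the NCPUS variable are hypothetical.

```ini
{# Hypothetical tasks: 'run_model', 'tidy_logs' and NCPUS are illustrative names #}
{% set NCPUS = 4 %}
[runtime]
    [[run_model]]
        inherit = HOST_HPC
        [[[directives]]]
            # Scale memory with the CPU count instead of hard-coding it
            -l ncpus = {{NCPUS}}
            -l mem = {{NCPUS * NCI_MEM_GB_PER_CPU}}gb

    # Small housekeeping task that runs on the login node, not in the queue
    [[tidy_logs]]
        inherit = HOST_HPC
        [[[job]]]
            batch system = background
```

With NCI_MEM_GB_PER_CPU set to 4, the memory directive for run_model renders as -l mem = 16gb.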

Modules

Environment modules should be set in the task's init-script. For model builds, consider using the meta-modules in /g/data/access/ngm/modules; these load all the modules required to build that specific model and version.

It's good practice to do a module purge before loading any modules, to disable anything added by the user's ~/.bashrc.

    [[HOST_HPC]]
        init-script = """
            module purge
            module use /g/data/access/ngm/modules
            module load build-nemo/4.0.4
            ulimit -s unlimited
            """

opt Files

If values in a rose-app.conf file need to be changed to run at NCI, and will not normally need to be altered by users, they can be overridden in an 'opt' file.

Keep overrides done this way to a minimum. They are hard to notice in the GUI, and they can confuse a user who adjusts a value there only to have the opt file override it again when the app runs.

Opt files are enabled in a Rose app by setting the environment variable ROSE_APP_OPT_CONF_KEYS to a space-separated list of options to enable, or by adding the flag --opt-conf-key=KEY to rose task-run. When running an app Rose will read the opt file from any selected app/APP/opt/rose-app-KEY.conf, and use the values set there instead of those set in rose-app.conf or the GUI.
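A minimal sketch of an app opt file follows. The app name example_app, the variable EXAMPLE_INPUT_DIR and both paths are hypothetical placeholders, chosen only to show how the two files relate.

```ini
# --- app/example_app/rose-app.conf (default value, shown in the GUI) ---
[env]
EXAMPLE_INPUT_DIR=/path/at/original/site

# --- app/example_app/opt/rose-app-nci-gadi.conf ---
# Only the settings that differ on Gadi; everything else still comes
# from rose-app.conf.
[env]
EXAMPLE_INPUT_DIR=/path/on/gadi
```

When the nci-gadi key is selected, the app would see EXAMPLE_INPUT_DIR=/path/on/gadi even though the GUI still displays the default.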

Suite opt files similarly use environment variable ROSE_SUITE_OPT_CONF_KEYS or flag --opt-conf-key=KEY.
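For the suite-level case, the sketch below assumes the usual Rose layout, with suite opt files placed at opt/rose-suite-KEY.conf in the suite directory. The setting shown is illustrative only.

```ini
# opt/rose-suite-nci-gadi.conf (illustrative)
# Applied on top of rose-suite.conf when the 'nci-gadi' key is selected,
# for example with: rose suite-run --opt-conf-key=nci-gadi
[jinja2:suite.rc]
SITE='nci-gadi'
```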

Apps may set ROSE_APP_OPT_CONF_KEYS for their own purposes, e.g. to select model resolution. To avoid conflicts, you can override the script of the root task and add the flag there. Putting the key in parentheses means it won't cause an error if the file isn't present.

[runtime]
    [[root]]
        script = "rose task-run --verbose --opt-conf-key=(nci-gadi)"