access_FujitsuUMMigration - ACCESS-NRI/accessdev-Trac-archive GitHub Wiki


Migrating UM codes to the new NCI Fujitsu system, Raijin, is presently being tested. The following notes describe the adjustments needed for the new system.

Tested jobs are available under the UMUI folder uakn.

Items marked TODO below are features yet to be implemented.

Raijin access is currently only available through a temporary UMUI: run ~saw562/umui/umui2.0/bin/umui_saw on accesscollab. This UMUI is not guaranteed to be stable and only supports UM7.3.

ACCESS 1.3 / UM7.3

Job uakb contains tests for migrating a UM job to the Fujitsu system.

Utility Libraries

Libraries required to run the UM are available as modules on Raijin. These are automatically added by the UMUI when running jobs on Raijin.

To use the modules manually, run:

module use ~access/modules

You can see which modules are available by running 'module avail'. Modules we have already installed include:

  • fcm
  • gcom
  • rose
  • cylc
  • oasis3 TODO
  • oasis3-mct TODO

Please let the help team know via [email protected] if you would like other libraries set up.

There is also a special 'UM' module that sets up $DATAOUTPUT and $UMDIR automatically for you according to your current NCI project.
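The manual setup can be wrapped in a small bash function. This is a minimal sketch, assuming the Environment Modules `module` command is available in your shell and that the UM module is loaded as `um` (the name used in the version list below); `setup_access_modules` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: make the ~access modules visible, load the 'um'
# module and report the variables it is described as setting.
setup_access_modules() {
  # 'module' is normally a shell function set up by Environment Modules
  if ! type module >/dev/null 2>&1; then
    echo "module command not found; are you on Raijin?" >&2
    return 1
  fi
  module use ~access/modules
  module load um
  # The 'um' module should set these according to your current NCI project
  echo "UMDIR=${UMDIR:-<unset>}"
  echo "DATAOUTPUT=${DATAOUTPUT:-<unset>}"
}
```

If either variable prints as `<unset>`, the module did not load as expected.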

Module versions loaded by the UMUI for 7.3 jobs are:

intel-cc/13.2.146 (currently testing the Intel 13 compilers, including ifort 13, which may have problems)
intel-fc/13.2.146
openmpi/1.6.3
netcdf/4.2.1.1
gcom/3.3
oasis3/?? (included for completeness: AMIP runs don't call OASIS, but there are references to it in the code)
fcm
um
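To reproduce this environment by hand, for example when debugging a build outside the UMUI, the list can be loaded in a loop. A sketch only: `load_um73_modules` is a hypothetical helper, and oasis3 is left out because its version isn't recorded here.

```bash
#!/usr/bin/env bash
# Module versions listed above for UM7.3 jobs (oasis3 omitted: version unknown)
UM73_MODULES=(
  intel-cc/13.2.146
  intel-fc/13.2.146
  openmpi/1.6.3
  netcdf/4.2.1.1
  gcom/3.3
  fcm
  um
)

# Hypothetical helper: load the modules in order, stopping at the first failure
load_um73_modules() {
  module use ~access/modules
  local m
  for m in "${UM73_MODULES[@]}"; do
    module load "$m" || { echo "failed to load $m" >&2; return 1; }
  done
}
```

Afterwards, `module list` shows what actually got loaded.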

Branches

The branch settings to build the UM are found in the UMUI panel 'FCM Configuration->FCM Options for Atmosphere'.

You will need to tell the build system which compiler settings to use on Raijin, as follows:

  • Set UM_SVN_BIND to fcm:um_dev/Share/VN7.3_local_changes/src/configs/bindings
  • Set UM_CONTAINER to $UM_SVN_BIND/container.cfg@HEAD

Once this is done, set up the branches according to whether you are using an ACCESS model or plain UM. Also make sure you have set up a prebuild (see Prebuilds below); prebuilds are available for ACCESS 1.0 and ACCESS 1.3, as well as a generic vn7.3 prebuild.

UM7.3 (excluding ACCESS)

In order for the code changes needed on Raijin to be picked up by your jobs the following branch structure must be used:

  • Base Branch: fcm:um_tr, using default revision (vn7.3)
  • Branches:
      • fcm:um_dev/Share/VN7.3_local_changes revision HEAD
      • Any user branches

ACCESS

ACCESS-based models can't use this structure. The ACCESS code was reformatted at some point in its history, and because of the way FCM merges work, combining it with local_changes and the trunk produces merge conflicts. Instead, these models should use the following structure:

  • Base Branch: fcm:um_br/pkg/Rel/ACCESS1.3, using revision 4896
  • Branches:
      • fcm:um_dev/Share/VN7.3/access1.3_local_changes/src
      • Any ACCESS user branches
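Since the right local_changes branch depends on the base branch you start from, the pairing can be summarised as a small lookup. A sketch: `local_changes_for_base` is a hypothetical helper, and its patterns cover only the two base branches described in this section.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given the base branch, print the Raijin
# local_changes branch that should be added on top of it.
local_changes_for_base() {
  case $1 in
    fcm:um_tr*)
      # Plain UM7.3 on the trunk
      echo 'fcm:um_dev/Share/VN7.3_local_changes'
      ;;
    fcm:um_br/pkg/Rel/ACCESS1.3*)
      # ACCESS 1.3 release branch (revision 4896)
      echo 'fcm:um_dev/Share/VN7.3/access1.3_local_changes/src'
      ;;
    *)
      echo "no Raijin local_changes branch known for: $1" >&2
      return 1
      ;;
  esac
}
```

Mixing the pairs up (for example the trunk local_changes branch on the ACCESS base) is what leads to the merge conflicts described above.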

Other UMUI modifications

You will of course have to set the Target Machine to 'raijin'. You should also set the username and project of the job to the exact strings '$USER' and '$PROJECT'. The correct values are then picked up automatically from the environment, so anyone else who wants to use your job doesn't have to change these settings.
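The idea is that the literal strings are only turned into real values when the job runs. A minimal bash sketch of that behaviour, assuming a setting of the form '$NAME' is replaced by the environment variable NAME (how the UMUI does this internally isn't described on this page); `resolve_setting` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: a setting stored as the literal string '$NAME' is
# replaced by the value of the variable NAME; other values pass through.
resolve_setting() {
  local value=$1
  case $value in
    '$'?*)
      local name=${value#\$}     # strip the leading '$'
      printf '%s\n' "${!name}"   # bash indirect expansion
      ;;
    *)
      printf '%s\n' "$value"
      ;;
  esac
}
```

Because the job stores '$USER' rather than your own username, the same job resolves to whoever runs it.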

You should also check the hand edit and override files. Some of these introduce platform-specific settings that will cause errors when you try to run on Raijin. Things to watch out for are explicit paths (e.g. /data/projects/access doesn't exist on Raijin) and library versions that may not have been installed on the new machine.

If you get build errors, check these files first. You can also ask for help by emailing the helpdesk [email protected] (climate_help for CoE users).
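A quick way to find explicit paths is to grep the hand edit and override files your job uses. A sketch: only /data/projects/access comes from this page, so add any other Vayu/Solar-specific paths you know about; library versions still need checking by eye. `check_raijin_paths` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report lines in the given files that refer to
# paths known not to exist on Raijin.
check_raijin_paths() {
  local status=0
  # -H: always print the file name, -n: print line numbers
  grep -nH -e '/data/projects/access' -- "$@" || status=$?
  case $status in
    0) return 1 ;;  # matches found: these lines need fixing
    1) return 0 ;;  # no matches
    *) return 2 ;;  # grep error, e.g. a file couldn't be read
  esac
}
```

Pass it the paths of the hand edit and override files listed in your job; any lines it prints are candidates for the build errors mentioned above.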

A major difference between Vayu/Solar and Raijin is where ancillary data files are stored. Previously they were on a separate filesystem from the ~access directory; now the two filesystems have been combined. Ancillary and data files now reside in /projects/access/data (instead of /data/projects/access). You can easily convert a job to the new filesystem by adding the hand edit '~access/raijin/data-paths.sh', but if possible you should change the paths in the UMUI, since this makes it easier to find the files you're using.
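To see what the path change amounts to for one of your own files, the old prefix can be swapped for the new one with sed. This is a hypothetical equivalent for illustration, not the contents of '~access/raijin/data-paths.sh', and it assumes the directory layout below the prefix is unchanged.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rewrite Vayu/Solar data paths to the Raijin location.
# Reads stdin and writes stdout, so the original file is left untouched.
to_raijin_paths() {
  sed -e 's#/data/projects/access#/projects/access/data#g'
}
```

For example, `to_raijin_paths < old_hand_edit > new_hand_edit` (file names are placeholders).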

Other

  • CABLE has a hard-coded path to one of its ancillary files in the ACCESS1.3 source; this is fixed by the 'local_changes' branch
  • Point TMPDIR to jobfs so that it's cleared automatically. This will require synchronising the nodes before running the UM, since the UM expects a shared TMPDIR

UM8.2

UM8.2 will be the second supported UM version on Raijin; it will be set up once ACCESS is functional.

Prebuilds

Prebuilds allow UM build jobs to use the results of a previous build, meaning that only files that have been changed or are affected by science section changes need to be rebuilt.

For all jobs on Raijin you should use the settings:

  • UM_PREBUILD: ~access/prebuilds
  • UM_REM_PREBUILD: $UMDIR/prebuilds

In many cases you can just use a generic prebuild. These are named after the model and the build optimisation level, e.g. vn7.3_safe, vn8.2_debug. The optimisation level in the prebuild name should match that of your own job.

The following prebuilds are currently available on Raijin:

  • vn7.3_safe
  • vn7.3_access1.3_safe

Please let the help team know if other configurations would be useful.
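The naming rule can be checked mechanically before you fill in the prebuild name. A sketch: `prebuild_name` and `prebuild_available` are hypothetical helpers, and the list only contains the prebuilds recorded in this section, so it will go out of date as new ones are added.

```bash
#!/usr/bin/env bash
# Prebuilds listed in this section as available on Raijin
AVAILABLE_PREBUILDS=(vn7.3_safe vn7.3_access1.3_safe)

# Hypothetical helper: generic prebuild name = <model>_<optimisation level>
prebuild_name() {
  # $1: model, e.g. vn7.3 or vn7.3_access1.3; $2: level, e.g. safe or debug
  printf '%s_%s\n' "$1" "$2"
}

# Hypothetical helper: succeed if that prebuild is in the list above
prebuild_available() {
  local want p
  want=$(prebuild_name "$1" "$2")
  for p in "${AVAILABLE_PREBUILDS[@]}"; do
    [ "$p" = "$want" ] && return 0
  done
  return 1
}
```

Matching the level matters because a prebuild at a different optimisation level would not reflect how your own job compiles the code.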

Reproducibility

Check the final absolute norm values by running the following on the job's output file (replace OUTPUT_FILE with its path):

grep -i 'final abs' OUTPUT_FILE | head

If the run is properly reproducible, all values should match the results given here.

ACCESS 1.3 N96 AMIP

Raijin

  Final Absolute Norm :   7.276954701768109E-003
  Final Absolute Norm :   4.715066763113245E-003
  Final Absolute Norm :   9.434324436544766E-003
  Final Absolute Norm :   8.119724668540855E-003
  Final Absolute Norm :   8.754816579023663E-003
  Final Absolute Norm :   9.709622368623301E-003
  Final Absolute Norm :   8.431032766056844E-003
  Final Absolute Norm :   8.862476886181806E-003
  Final Absolute Norm :   9.746810549474134E-003
  Final Absolute Norm :   9.506179859159086E-003
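Rather than comparing the numbers by eye, the check can be scripted. A sketch, assuming the norm is the last whitespace-separated field on each matching line (as in the values shown above); `check_final_norms` is a hypothetical helper and the output file path is passed as its argument.

```bash
#!/usr/bin/env bash
# ACCESS 1.3 N96 AMIP reference values for Raijin, copied from above
REFERENCE_NORMS='7.276954701768109E-003
4.715066763113245E-003
9.434324436544766E-003
8.119724668540855E-003
8.754816579023663E-003
9.709622368623301E-003
8.431032766056844E-003
8.862476886181806E-003
9.746810549474134E-003
9.506179859159086E-003'

# Hypothetical helper: compare the first ten final norms in a UM output file
check_final_norms() {
  local output_file=$1 actual
  # Same search as the grep above; awk keeps only the last field (the value)
  actual=$(grep -i 'final abs' "$output_file" | head -n 10 | awk '{print $NF}')
  if [ "$actual" = "$REFERENCE_NORMS" ]; then
    echo "final absolute norms match the reference"
  else
    echo "final absolute norms differ (< reference, > this run):" >&2
    diff <(printf '%s\n' "$REFERENCE_NORMS") <(printf '%s\n' "$actual") >&2 || true
    return 1
  fi
}
```

The comparison is an exact string match on purpose: a reproducible run should give bit-identical norms.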

Benchmarks

ACCESS 1.3 N96 AMIP

Raijin Build

  • 4 cpu
  • 00:13 walltime
  • 00:19 cputime
  • 600 MB memory
  • 0.88 SU

Raijin Run (3 month NRUN)

  • 128 cpu uakoa (8ew x 16 ns)
      • 00:43 walltime
      • 90:30 cputime
      • 23 GB memory
      • 91.48 SU
  • 128 cpu (16ew x 8 ns)
      • Very similar results
  • 64 cpu uakob (8ew x 8 ns)
      • 1:04 walltime
      • 68:22 cputime
      • 14 GB memory
      • 68.8 SU
  • 192 cpu uakoc (24ew x 8 ns)
      • 00:44 walltime
      • 145:05 cputime
      • 37 GB memory
      • 143.15 SU
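The SU figures above are consistent with roughly one SU per CPU per hour of walltime: 128 CPUs for 00:43 gives about 91.7 SU against the 91.48 recorded, with the small differences presumably due to walltimes being rounded to the minute. That rule of thumb is an assumption drawn from these numbers, not NCI's charging policy, so check the NCI documentation before relying on it. The sketch below uses it for rough estimates and also checks that an EW x NS decomposition uses the requested number of CPUs; both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rough SU estimate as CPUs x walltime in hours
estimate_su() {
  # $1: number of CPUs, $2: walltime as HH:MM
  awk -v n="$1" -v t="$2" 'BEGIN {
    split(t, hm, ":")
    printf "%.2f\n", n * (hm[1] + hm[2] / 60)
  }'
}

# Hypothetical helper: succeed if EW x NS processors equals the CPU count
check_decomposition() {
  # $1: total CPUs, $2: processors east-west, $3: processors north-south
  [ $(( $2 * $3 )) -eq "$1" ]
}
```

For example, `estimate_su 64 1:04` gives 68.27, close to the 68.8 SU recorded for uakob.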

Vayu Build

  • 4 cpu
  • 00:21 walltime
  • 00:39 cputime
  • 1400 MB memory
  • 1.45 SU

Vayu Run

  • 128 cpu saaqb (8ew x 16 ns)
  • Scott Wales, May-June 2013