access_AccessDevMigration - ACCESS-NRI/accessdev-Trac-archive GitHub Wiki

Accesscollab to Accessdev Migration

Authors: Scott Wales, CoE

Goals

The goal of the migration from accesscollab to accessdev is to provide an updated environment that is easier for support personnel to manage, and to provide the updated UIs needed for new versions of the Unified Model (UM). Our primary concern is that this migration does not negatively affect researchers currently using NCI systems.

Current Situation

Currently, UM models are managed and configured on accesscollab, a virtual machine hosted at NCI. UM models also make use of the virtual machine access-svn, which provides model code. The models themselves are compiled and run on the NCI supercomputer (presently vayu, which will be replaced in 2013 as part of NCI's upgrades).

When a researcher wishes to run a UM model, they use a program called umuix to locate a configuration in an experiment database, configure it to their liking and submit the job. The submission process downloads code from access-svn to accesscollab, where a code merging process occurs. The merged code is then rsync'd across to the supercomputer, along with the job control scripts produced as part of the configuration. The job control scripts are submitted to the supercomputer's job queue; these scripts compile the model code according to the desired experiment configuration, then run the model proper.
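The submission flow described above can be sketched as an ordered list of commands. This is only an illustrative outline, not the actual umuix implementation: the repository URL, the paths, the `merge_code` step, the script name and the use of `qsub` for queue submission are all assumptions made for the example. The function only builds the command lists and does not run anything.

```python
from typing import List


def submission_steps(svn_url: str, workdir: str, remote: str,
                     remote_dir: str, job_script: str) -> List[List[str]]:
    """Return the commands for one submission, in order, without running them.

    All arguments are hypothetical placeholders for this sketch.
    """
    return [
        # 1. Download model code from access-svn to the local machine.
        ["svn", "checkout", svn_url, workdir],
        # 2. Merge code locally. "merge_code" is a placeholder name: the
        #    real merge tool is not named in this document.
        ["merge_code", workdir],
        # 3. Copy merged code and job control scripts to the supercomputer.
        ["rsync", "-a", workdir + "/", f"{remote}:{remote_dir}/"],
        # 4. Queue the job control script, which compiles and runs the model.
        #    "qsub" is assumed here as the queue submission command.
        ["ssh", remote, "qsub", f"{remote_dir}/{job_script}"],
    ]


steps = submission_steps(
    svn_url="https://access-svn.example/um/trunk",  # hypothetical URL
    workdir="/tmp/um_job",                          # hypothetical path
    remote="vayu",
    remote_dir="um_jobs/saaqa",                     # hypothetical path
    job_script="umsubmit.sh",                       # hypothetical script name
)
for command in steps:
    print(" ".join(command))
```

Keeping the steps as data makes it easy to see which parts happen on the management machine (steps 1 to 3) and which happen on the supercomputer (step 4).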

Accessdev

accessdev is a new virtual machine hosted by the Nectar cloud at NCI. It provides an updated operating system as well as a management system called Puppet from which operators can modify the system configuration.

accessdev will be configured to support:

  • umuix program
  • UM experiment database
      • umuix is able to access both local and remote databases; access to the accesscollab database will be beneficial during the transition
  • Central & user-provided scripts (handedits)
      • Central scripts will be version controlled and handled by Puppet; user scripts will be managed by users and reside in their home directories
  • Connections to access-svn and vayu
      • Security needs to be managed, with passphrase-secured ssh-agents used where possible
  • Recoverable backups of user home directories & experiment database

Additionally, the user home directories and experiment database of accesscollab should be available from accessdev; ideally these will be rsync'd at a transition date. To avoid conflicts between the accesscollab and accessdev experiment databases, a new default prefix may be warranted (currently the prefix for accesscollab jobs run on vayu is 'u'). Provided this doesn't cause conflicts with other installations, a prefix of 'r' is suggested for accessdev jobs run on the new supercomputer.
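One way to check whether a new prefix avoids clashes is to compare the identifiers held in the two databases directly. The sketch below assumes, purely for illustration, that each job identifier is a short string whose first character is the prefix. The identifiers shown are made up, and the real umuix database layout may differ.

```python
from typing import Iterable, Set


def conflicting_ids(collab_ids: Iterable[str], dev_ids: Iterable[str]) -> Set[str]:
    """Identifiers that appear in both databases."""
    return set(collab_ids) & set(dev_ids)


def ids_outside_prefix(ids: Iterable[str], prefix: str) -> Set[str]:
    """Identifiers that do not start with the expected prefix."""
    return {job_id for job_id in ids if not job_id.startswith(prefix)}


# Made-up identifiers: 'u' for accesscollab, 'r' as suggested for accessdev.
collab = ["uabca", "uabcb", "uxyza"]
dev = ["rabca", "rabcb"]

print(sorted(conflicting_ids(collab, dev)))
print(sorted(ids_outside_prefix(dev, "r")))
```

If both results are empty, no identifier is shared between the two databases and every accessdev identifier carries the new prefix.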

The configuration of the future model interfaces (rose and cylc) is outside the scope of this document.

Testing process

Once the migration is complete, any user in the access NCI group will be able to submit and compile the ACCESS standard runs with the procedure:

  1. Log in to accessdev
  2. Start umuix
  3. Locate the experiment in the database and open it
  4. Process by pressing the 'Process' button
  5. Submit by pressing the 'Submit' button

No modification of the job configuration should be required. If any initial setup is needed, this should occur automatically the first time a user logs on to the system.

ACCESS standard runs to be tested include:

  • saaqa: ACCESS 1.0 AMIP
  • saaqb: ACCESS 1.3 AMIP
  • saaqn: ACCESS 1.3 AMIP, n48 resolution
  • saatd: UM 8.2 external release test
  • sabjh: ACCESS 1.3 AMIP coupled to KPP mixed-layer ocean

Additionally, any existing user of accesscollab will be able to run their own jobs using the same procedure they use on accesscollab, provided they don't make use of files from user directories other than their own.

A test is successful if, when following the above procedure, running the model on accessdev produces an exact match to the results obtained running it on accesscollab, and if the performance of the accessdev system is an improvement on that of accesscollab.
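The "exact match" part of this criterion can be checked mechanically by comparing checksums of the output files from the two runs. The following is a minimal sketch, assuming the results of each run have been copied into a local directory. The directory layout and function names are illustrative and not part of any existing test tooling.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large outputs fit in memory."""
    sha = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            sha.update(block)
    return sha.hexdigest()


def checksums(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def differing_files(collab_dir: Path, dev_dir: Path) -> List[str]:
    """Files that are missing from one run or whose contents differ."""
    collab, dev = checksums(collab_dir), checksums(dev_dir)
    names = collab.keys() | dev.keys()
    return sorted(name for name in names if collab.get(name) != dev.get(name))
```

An empty list from `differing_files` would indicate a bit-for-bit match between the two sets of outputs. The performance half of the criterion would need separate timing data and is not covered by this sketch.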

Testing will be performed by support staff and users. Once the support staff are confident that the system is performing adequately, it will be handed over to users to test on their configurations. In the first instance, these will be PhD students and postdocs from within the CoE who are comfortable using our current umuix.

Consequence of Success

If testing is successful, accesscollab will be considered deprecated and users will be encouraged to move to accessdev. An ideal transition point for this is the installation of the new supercomputer at NCI; if this is only accessible from accessdev, it will provide strong encouragement for users to migrate.

Once accesscollab is decommissioned, user home directories will be archived for a minimum of 1 year, so that files can still be retrieved if users forget to copy them before the transition.

Consequence of Failure

If the accessdev system is not available before the release of the new supercomputer, or is otherwise unsuitable, accesscollab will continue to be supported until accessdev passes the testing requirements, and support for the new supercomputer will be added to the accesscollab umuix.