Home - optclim/ModelOptimisation GitHub Wiki

Welcome to the OptClim2 wiki! This was written in June 2022 for the ARCHER2 branch.

What is OptClim?

OptClim is a software framework that uses a cyclic workflow to optimise parameters in models by generating configurations (parameter sets) of models, running them and then comparing model results against observations.

[Figure: overview1]

Models that include parametrised processes are commonplace. In atmospheric modelling, examples include the representation of cloud processes: cloud formation, rain, and reflection and radiation. Each parameter has a range of possible values. With OptClim these parameters can be tuned so that models better represent past observations and can therefore be projected forward with more confidence, e.g. to understand the impacts of increasing CO2 concentrations on climate.

OptClim runs require the researcher to set:

  • Parameters
    • the definition of the parameters to be optimised (4 or 5 parameters is typical), each with a range of allowed values and a default value
    • the location of each parameter in the namelists of the model, so that it can be edited. Currently this information has to be added to the OptClim code specific to the model. There is also the concept of a "metaparameter", whereby one user-defined parameter can set an array or several other dependent namelist parameters.
  • Observations
    • Each observation is a single number, such as the globally averaged outgoing shortwave radiation from satellites. The number of observations is typically one more than the number of parameters being optimised. Each has a target value, against which the model's simulated observable is compared.
    • User-provided code is needed to generate simulated observations from the model results.
  • Optimisation method
    • Selection of a supported optimisation method. Currently DFOLS is generally used in preference to the supported alternative, Gauss-Newton. The method sets the parameter values for the runs to be orchestrated by OptClim. (To be documented/linked: guidance on use of DFOLS)
  • Names and file paths needed by OptClim, e.g. where to generate or find files.
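
The ingredients listed above can be tied together in a toy example. The sketch below is not OptClim code: the parameter names, ranges, target values and the `toy_model` function are all invented for illustration. It shows two parameters with allowed ranges, three simulated observations (one more than the number of parameters), and the differences from the targets, which is the kind of quantity a least-squares optimiser such as DFOLS works to reduce.

```python
# Toy illustration only: names, ranges, targets and the model are invented.

# Hypothetical parameters: name -> (minimum, maximum, default)
PARAMETERS = {
    "param_a": (0.5, 5.0, 3.0),
    "param_b": (0.5, 2.0, 1.0),
}

# Hypothetical targets: one more observation than there are parameters.
TARGETS = {"obs_1": 100.0, "obs_2": 240.0, "obs_3": 3.0}


def check_in_range(values):
    """Raise ValueError if any parameter value lies outside its allowed range."""
    for name, value in values.items():
        minimum, maximum, _default = PARAMETERS[name]
        if not minimum <= value <= maximum:
            raise ValueError(f"{name}={value} is outside [{minimum}, {maximum}]")


def toy_model(values):
    """Stand-in for a model run followed by user-provided post-processing."""
    a, b = values["param_a"], values["param_b"]
    return {
        "obs_1": 90.0 + 4.0 * a - 2.0 * b,
        "obs_2": 250.0 - 3.0 * a - 1.0 * b,
        "obs_3": 2.0 + 0.5 * a - 0.5 * b,
    }


def residuals(simulated, targets):
    """Differences between simulated observations and their targets."""
    return [simulated[name] - targets[name] for name in targets]


def cost(simulated, targets):
    """Sum of squared residuals."""
    return sum(r * r for r in residuals(simulated, targets))


defaults = {name: spec[2] for name, spec in PARAMETERS.items()}
check_in_range(defaults)
print(cost(toy_model(defaults), TARGETS))
```

In a real study the call to `toy_model` corresponds to a full model run plus the user's post-processing code, which is why each evaluation is expensive and the runs are orchestrated as batch jobs.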

These specifications are all held in a JSON file used by the OptClim software. A study, meaning the workflow and model instances set up by OptClim, is generated in the directory holding the JSON file. The JSON file includes:

| JSON element | Purpose |
|---|---|
| Name | Name of the study (directory to be created in the directory of the JSON file) |
| baseRunID | Prefix for the directory of a run, e.g. if yd, then runs are in directories of the form studyDir/yd001, studyDir/yd002, ... These directories hold the data needed to connect OptClim to each run, in particular for the generation of simulated observables |
| runCode | The Archer2 budget code against which jobs are accounted |
| machineName | Must be "slurm" for Archer2 |
| modelName | One of MITgcm, CESM, UKESM |
| study.referenceModelDirectory | The model directory that is cloned (note: not used for UKESM) |
| optimise.dfols | Settings for the DFOLS optimiser |
| Parameters | Defines each parameter: range of values and the initial value to be used by OptClim |
| postProcess | Specifies the code used for generating simulated observations (simobs) from model outputs |
| targets | The target values of the simulated observations |
| simulatedObservations | Names and associated data for the simulated observations |
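
As an illustration only, the elements listed above could be laid out as follows. The top-level key names come from the list; the nesting of `study.referenceModelDirectory` and `optimise.dfols` into sub-objects is inferred from the dotted names, and every value and inner key (paths, budget code, parameter and observation names, `minimum`, `maximum`, `initial`, `script`) is a hypothetical placeholder rather than a tested OptClim configuration. The snippet builds the structure in Python, serialises it as JSON, and checks that the expected elements are present.

```python
import json

# Hypothetical values throughout; only the top-level key names come from the wiki.
example_config = {
    "Name": "demo_study",
    "baseRunID": "yd",
    "runCode": "x00-demo",
    "machineName": "slurm",
    "modelName": "MITgcm",
    "study": {"referenceModelDirectory": "/path/to/base_model"},
    "optimise": {"dfols": {}},  # optimiser settings are not listed in the wiki
    "Parameters": {
        "param_a": {"minimum": 0.5, "maximum": 5.0, "initial": 3.0},
        "param_b": {"minimum": 0.5, "maximum": 2.0, "initial": 1.0},
    },
    "postProcess": {"script": "/path/to/make_simobs.py"},
    "targets": {"obs_1": 100.0, "obs_2": 240.0, "obs_3": 3.0},
    "simulatedObservations": {"obs_1": {}, "obs_2": {}, "obs_3": {}},
}

REQUIRED_ELEMENTS = [
    "Name", "baseRunID", "runCode", "machineName", "modelName",
    "study", "optimise", "Parameters", "postProcess",
    "targets", "simulatedObservations",
]


def missing_elements(config):
    """Return the required top-level elements that are absent from config."""
    return [key for key in REQUIRED_ELEMENTS if key not in config]


def run_directory(config, index):
    """Build a run directory name such as yd001 from baseRunID and a run index."""
    return f"{config['baseRunID']}{index:03d}"


text = json.dumps(example_config, indent=2)
print(text)
print(missing_elements(example_config))
print(run_directory(example_config, 1))
```

The example studies on Archer2 and the model-specific wiki pages are the authoritative source for the real layout.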

The OptClim2 software provides the following:

  1. Runs the optimiser script, which assimilates all results to date and determines the next set of models to be run, with their corresponding parameter values.
  2. Queues an array of sequential jobs, one task for each of the models to be run. These tasks are in the “held” state.
  3. Queues a job that awaits completion of all the array tasks and then runs the optimiser script again.
  4. Clones each model and modifies its parameters. For each cloning, a “base model” is replicated: one that has already been tested and amended so that it interfaces with OptClim as described below.
  5. Starts each model. The model scripts include a “release command” for the corresponding array task, which is run on completion of the model.
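
One way to picture steps 2, 3 and 5 in Slurm terms is sketched below. This is an assumption about how such a cycle could be expressed with standard Slurm commands, not a copy of what OptClim2 runs: the script names `array_task.sh` and `next_iteration.sh`, the budget code and the job ID are hypothetical. The functions only build command lines, so nothing is submitted.

```python
def held_array_command(n_models, account):
    """Command to queue one held array task per model (step 2)."""
    return [
        "sbatch", "--parsable", "--hold",
        f"--array=1-{n_models}",
        f"--account={account}",
        "array_task.sh",  # hypothetical script name
    ]


def next_iteration_command(array_job_id, account):
    """Command to queue a job that starts once every array task has succeeded (step 3)."""
    return [
        "sbatch", "--parsable",
        f"--dependency=afterok:{array_job_id}",
        f"--account={account}",
        "next_iteration.sh",  # hypothetical script that reruns the optimiser
    ]


def release_command(array_job_id, task_index):
    """Command a model script could run on completion to release its array task (step 5)."""
    return ["scontrol", "release", f"{array_job_id}_{task_index}"]


# Hypothetical job ID and budget code, used only to show the resulting command lines.
for command in (
    held_array_command(3, "x00-demo"),
    next_iteration_command("123456", "x00-demo"),
    release_command("123456", 2),
):
    print(" ".join(command))
```

Because the follow-on job depends on the whole array, it cannot start until every held task has been released by its model and has finished, which is what closes the loop back to step 1.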

A glossary of terms used is in a separate wiki page.

History of OptClim

The initial prototype of OptClim, termed OptClim1, was developed and used on the University of Edinburgh cluster Eddie in a collaboration between Prof. Simon Tett, Prof. Coralia Cartis and Dr Mike Mineter.

OptClim2 was coded by Prof. Tett. This release modified some functionality and reimplemented some of the bash scripts of OptClim1 to use more Python.

The Archer2 branch of the GitHub repository was developed with support from eCSE to port OptClim2 to Archer2 with minimum amendments to the code. The changes made were to add:

  1. Use of Slurm in job management, and
  2. Extensions to support the MITgcm, CESM2 and UKESM models.

For guidance on installation see https://github.com/optclim/ModelOptimisation/wiki/Installing-OptClim

Documentation specific to each supported model

A separate wiki page exists for each supported model. An example study for each exists on Archer2.

Support

It is expected that first-time users will require some support and guidance, both to help finalise their plans and with initial configuration. Support is provided on a best-efforts basis and can be requested via a list called optclim_developers in the email domain mlist.is.ed.ac.uk.

Extensions made to the code for Archer2

The ARCHER2 eCSE project developed these extensions as described in https://www.archer2.ac.uk/ecse/reports/ARCHER2-eCSE04-07-technical-report.pdf

| Module | Amendments |
|---|---|
| runAlgorithm.py | Added import of each new model's class, so that it imports all valid models; no other change |
| UKESM.py | Class for UKESM |
| UKESM | Additional software, described in other wiki pages, to use Rose/Cylc |
| MITgcm.py | Class for models using MITgcm |
| CESM.py | Class for CESM2 |
| config.py | Specific functions for SLURM |
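
Importing every model class in one place suggests a lookup from the `modelName` value in the JSON file to the class that handles that model. A minimal sketch of that pattern follows. It is an assumption about the design, not the contents of runAlgorithm.py: the class bodies, the `MODEL_CLASSES` dictionary and the `create_model` helper are hypothetical stand-ins, and the real classes in the modules listed above have their own interfaces.

```python
class BaseModel:
    """Hypothetical stand-in for a model interface class."""

    def __init__(self, run_dir):
        self.run_dir = run_dir


class MITgcm(BaseModel):
    pass


class CESM(BaseModel):
    pass


class UKESM(BaseModel):
    pass


# Hypothetical registry keyed by the modelName values allowed in the JSON file.
MODEL_CLASSES = {"MITgcm": MITgcm, "CESM": CESM, "UKESM": UKESM}


def create_model(model_name, run_dir):
    """Instantiate the class registered for model_name, or raise ValueError."""
    try:
        model_class = MODEL_CLASSES[model_name]
    except KeyError:
        valid = ", ".join(sorted(MODEL_CLASSES))
        raise ValueError(f"Unknown modelName {model_name!r}; expected one of: {valid}") from None
    return model_class(run_dir)


model = create_model("MITgcm", "studyDir/yd001")
print(type(model).__name__, model.run_dir)
```

With this arrangement, supporting a further model would mean adding one class and one registry entry, leaving the rest of the workflow unchanged.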