access_UMFiles - ACCESS-NRI/accessdev-Trac-archive GitHub Wiki


#!html
<h1 style="text-align: center; color: blue">
Documentation of UM file utilities
</h1>


The UM fields file format (https://nf.nci.org.au/facilities/software/UM/7.8/umdoc_system/UM_docs/papers/pdf/p0f3.pdf) is reasonably straightforward to read, though difficult to write from scratch because of the huge number of header variables. Most utilities described here only require numpy but some that read/write netCDF files also require cdms2 (from cdat-lite). On raijin use

module use ~access/modules
module load pythonlib/umfile_utils

and if necessary

module load cdat-lite

All files are in raijin:~access/apps/pythonlib/umfile_utils and in the svn repository https://trac.nci.org.au/svn/access_tools/umfile_utils.

Utilities

Editing land sea mask

mask_edit.py is a GUI editor for land-sea masks that lets you flip selected points from land to sea or vice versa. It creates a list of changes that can be used by the CAP.

For more details see AncillaryFileModification

Getting the number of land points from a mask file

python count_land.py maskfile

returns the number of land points in a UM land mask, as required for the reconfiguration to work properly. E.g.,

python count_land.py /data/projects/access/HG2AO_ancils/qrparm.mask
10519
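The counting step itself is simple once the mask is in memory. A minimal sketch, assuming the mask has already been read into a numpy array in which any nonzero value means land (the array below is made up, reading the UM file is not shown, and this is not the code of count_land.py):

```python
import numpy as np

def count_land(mask):
    """Return the number of land points, treating any nonzero value as land."""
    return int(np.count_nonzero(mask))

# Hypothetical 3x4 mask for illustration: 1 = land, 0 = sea
example_mask = np.array([[0, 1, 1, 0],
                         [0, 0, 1, 0],
                         [1, 0, 0, 0]])
print(count_land(example_mask))  # 4
```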

Extract a sub-region from an ancillary file

python subset_ancillary.py -i ifile -o ofile -x x0,nx -y y0,ny

The new region is specified by the indices of the lower left corner (x0,y0) and the extents nx, ny. Note that the indices are Python style, starting from 0, rather than Fortran style, starting from 1. The intended usage is selecting a region for the TC model from a larger ancillary file covering the whole possible region. Note that this hasn't been tested with a rotated grid and is unlikely to work in that case.

E.g.,

python subset_ancillary.py -i ~access/data/HG2AO_ancils/qrparm.orog_new -o out -x 48,49 -y 32,41

selects an Australian region, 90-180E, 50-0S.
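To see how the index arguments map to that region, the arithmetic can be sketched as below. It assumes a regular N96 grid (192 x 145 points, spacing 1.875 x 1.25 degrees) whose first point is at 0E, 90S. Those grid parameters are an assumption for illustration and are not read from the file; region_bounds is a hypothetical helper, not part of the utilities.

```python
def region_bounds(x0, nx, y0, ny, lon0=0.0, lat0=-90.0, dlon=1.875, dlat=1.25):
    """Return (west, east, south, north) in degrees for a region given by
    0-based lower-left indices (x0, y0) and extents (nx, ny).

    The default grid parameters describe an assumed regular N96 grid.
    """
    west = lon0 + x0 * dlon
    east = lon0 + (x0 + nx - 1) * dlon   # last included column
    south = lat0 + y0 * dlat
    north = lat0 + (y0 + ny - 1) * dlat  # last included row
    return west, east, south, north

print(region_bounds(48, 49, 32, 41))  # (90.0, 180.0, -50.0, 0.0)
```

With Fortran-style indices the same region would start at (49,33), which is why the 0-based convention matters here.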

subset_dump.py works similarly, subsetting a LAM dump file as an alternative way of setting up a region on demand.

The reconfiguration can give incorrect results (e.g. zero or missing soil temperature) if there are land points in the target grid but none in the matching region on the parent grid (even if there is land elsewhere). This has occurred with the relocatable tropical cyclone system. check_land_overlap.py compares the TC land mask with the global land mask to warn of potential problems.

python check_land_overlap.py global_mask regional_mask

show_land_overlap.py produces a map of the problem corner points.

For more details see http://ngamai04.bom.gov.au:8011/APS1/wiki/TCReconfig (BOM internal)

Extract or remove selected fields from a PP file

To extract selected variables

python um_fields_subset.py -i ifile -o ofile -v var1,var2,...

To exclude selected variables

python um_fields_subset.py -i ifile -o ofile -x var1,var2,...

The variables are specified by a STASH index = Section Number * 1000 + item number.
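As a small illustration of that formula, the helpers below build and split a STASH index. The function names are hypothetical and only used here.

```python
def stash_index(section, item):
    """Combine section and item numbers into a single STASH index."""
    return section * 1000 + item

def split_stash_index(index):
    """Split a STASH index back into (section, item)."""
    return divmod(index, 1000)

print(stash_index(3, 236))      # 3236
print(split_stash_index(3236))  # (3, 236)
print(stash_index(0, 24))       # 24 (section 0 indices are just the item number)
```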

The -p option extracts only prognostic variables (sections 0,33,34) and so gives a minimal initial dump.

Note that Alan Iwi of BADC has a utility that does a similar job (http://home.badc.rl.ac.uk/iwi/um/utils.html#subset), though it's a bit less convenient because it prompts for each field individually. This is in ~access/bin.

Change an ancillary file calendar

Even time invariant files like the orography include a calendar flag and the model won't run if it's incorrect (note that this is fixed in vn8.X). change_calendar.py changes a file from Gregorian to 360 day calendar and change_calendar365.py changes to a Gregorian calendar. This also works on files where the fields are time dependent (e.g. AMIP SSTs). In that case it doesn't do any time processing of the data, just changes the calendar code and date fields.

Version 8.? of the UM introduced an allowable missing value for the calendar which means that time independent and monthly mean files can be used in both 360 day and Gregorian calendar runs. For example the GA6 ancillaries in ~access/data/ancil/GA6_N96 use this. Note that xconv will show these files to use the 360 day calendar. To determine the calendar type use get_calendar.py.

Change the stash code of a selected field

This may be useful for fixing up the codes of user STASH fields. It was originally written to handle the change of the biogenic aerosol field from code 321 to 351. To fix this so that the new model works from an old dump

python change_stashcode.py -v 321,351 dumpfile

Dump header and field information

python um_fieldsfile_dump.py [-h] [-s] file

Dumps information somewhat like pumf. -h gives only header, -s gives only field summary.

lbcdump.py does the same for an LBC file (all levels are packed in a single record here so it doesn't give per level information).

Change date in a dump file

Change the initial and valid date of a UM dump file

python change_dump_date.py file

and enter desired date as prompted.

Interpolate ancillary file to a different grid

python interpolate_ancillary.py -i ifile -o ofile -m landseamask

Interpolates all fields in ifile to the grid defined by the land-sea mask file given with -m.

Merge files

python mergefiles.py file1 file2 outfile

combines fields from file1 and file2 into outfile. Basic header information is taken from the first file. Any fields present in both files are taken from the first file.

Merge a subregion

python mergefiles_region.py -x x0,nx -y y0,ny file1 file2 outfile

Merges the specified region from the fields in file1 into the fields in file2. The merge region is specified as for subset_ancillary, by the indices of the lower left corner (x0,y0) and the extents nx, ny. The files must have exactly the same fields, grids etc.
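For a single field the region merge amounts to a slice copy. A minimal sketch with numpy, assuming each field is a 2D array indexed as [y, x] (merge_region is a hypothetical helper and no UM file I/O is shown):

```python
import numpy as np

def merge_region(src, dst, x0, nx, y0, ny):
    """Return a copy of dst with the block starting at (x0, y0) and of
    extent (nx, ny) replaced by the matching block from src."""
    if src.shape != dst.shape:
        raise ValueError("fields must be on the same grid")
    out = dst.copy()
    out[y0:y0 + ny, x0:x0 + nx] = src[y0:y0 + ny, x0:x0 + nx]
    return out

src = np.ones((4, 5))
dst = np.zeros((4, 5))
merged = merge_region(src, dst, x0=1, nx=2, y0=0, ny=3)
print(merged.sum())  # 6.0, i.e. a 2 x 3 block was copied
```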

Alter values of a particular field

um_modify_field.py

python um_modify_field.py -a scale -b offset -v var [-v var2 ...] file

Replace a specified field by new = scale*old + offset. Can specify multiple fields, but all use same offset and scale factor. Using -a 0 gives equivalent of um_zero_field.py.
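The transformation itself is a linear rescaling. A minimal sketch on a plain numpy array (modify_field and its default values are assumptions for illustration; the script's own defaults and any missing-value handling are not shown here):

```python
import numpy as np

def modify_field(old, scale=1.0, offset=0.0):
    """Return scale*old + offset as a new floating point array."""
    return scale * np.asarray(old, dtype=float) + offset

field = np.array([270.0, 280.0, 290.0])
print(modify_field(field, scale=1.0, offset=2.0))  # [272. 282. 292.]
print(modify_field(field, scale=0.0))              # [0. 0. 0.]
```

The second call shows why a scale of 0 with no offset is the same as zeroing the field.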

um_replace_field.py replaces a field with values from a netcdf file. This only works for single level fields at the moment. NetCDF missing values are handled and should appear as missing values in the UM file.

um_zero_field.py sets specified list of fields to zero.

um_copy_field.py copies a given list of fields from one file to another (the fields must also be present in the destination). This just copies the data with none of the header information. May be useful for perturbation experiments of various kinds.

perturbIC.py applies a spatially varying random perturbation to the theta field of a dump file (constant in the vertical so as not to upset vertical stability). This can be useful to get past climate model crashes. The amplitude can be specified by an argument (default 0.01). See UMResubmission for rationale and details.
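The idea of a perturbation that varies horizontally but is constant in the vertical can be sketched with numpy broadcasting. This assumes theta is held as an array of shape (levels, ny, nx) and that the perturbation is uniform in [-amplitude, amplitude]; the distribution is an assumption made for this sketch, and perturb_theta is a hypothetical name.

```python
import numpy as np

def perturb_theta(theta, amplitude=0.01, seed=None):
    """Add one 2D random perturbation to every level of a 3D theta array."""
    rng = np.random.default_rng(seed)
    # One value per horizontal point, uniform in [-amplitude, amplitude)
    pert = amplitude * (2.0 * rng.random(theta.shape[1:]) - 1.0)
    return theta + pert  # broadcasts over the leading (level) axis

theta = np.full((3, 4, 5), 300.0)
diff = perturb_theta(theta, seed=0) - theta
print(np.array_equal(diff[0], diff[2]))  # True: same perturbation on every level
```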

Polar anomalies

The UM grid has fields defined at the poles. The values in this row should always be equal but the model seems only to enforce equality within a processor. There have been instances of anomalies developing in the polar rows, sometimes from ancillaries and sometimes apparently spontaneously. These may cause the model to crash, especially if the processor decomposition is changed. The script polar_anom.py checks for these anomalies and fix_polar_anom.py replaces any anomalies by the zonal mean.
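The check and the fix can be sketched for a single 2D field as follows. This assumes the field is indexed as [latitude, longitude] with the poles on the first and last rows; the function names are hypothetical and the code is not taken from polar_anom.py or fix_polar_anom.py.

```python
import numpy as np

def polar_spread(field):
    """Return (max - min) along the first and last rows of a 2D field."""
    return tuple(float(field[j].max() - field[j].min()) for j in (0, -1))

def fix_polar_rows(field):
    """Return a copy with each polar row replaced by its zonal mean."""
    out = np.array(field, dtype=float)
    for j in (0, -1):
        out[j, :] = out[j, :].mean()
    return out

field = np.zeros((4, 6))
field[0, 2] = 0.6                 # anomaly in one polar row
print(polar_spread(field))        # (0.6, 0.0)
fixed = fix_polar_rows(field)
print(polar_spread(fixed))        # (0.0, 0.0)
```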

Converting to netCDF

Climate model output

um2netcdf.py converts UM climate files to netCDF. Monthly means can be concatenated into a single netCDF file. Also works with daily files (with optional monthly mean).

um2netcdf.py -s section,item -i input -o output.nc

where section and item are the appropriate stash codes for the desired variable.

To convert all the variables in a file use ~access/bin/um2netcdf.py (to create netCDF3 classic files) or ~access/bin/um2netcdf4.py (to create netCDF4 classic files). The netCDF4 version allows for larger files and should be preferred unless you really need netCDF3 classic files for some old program. These utilities are part of the CMIP5 post-processor https://trac.nci.org.au/svn/access_tools/post_processor/

UM timeseries output

The UM can save point timeseries in the STASH. The format of this is quite different to normal STASH and more general tools can't do anything with it (e.g. xconv). um_timeseries.py converts the file to netCDF. At the moment it only works for single point timeseries, not for regions.

Variable grid model

umv2netcdf.py converts variable and/or rotated grid files to netCDF.

Modules

um_fileheaders.py

Field names of all the headers in a UM file. From https://access-svn.nci.org.au/trac/um/browser/trunk/src/utility/qxreconf/rcf_headaddress_mod.F90. All values have had 1 subtracted so that they work with Python's 0-based arrays.

umfile.py

Defines a class with routines for reading and writing UM files. Used by almost everything else. Works with big and little endian, 32 and 64 bit files.

stashvar.py

Dictionary with UM STASH variable names indexed by section and item numbers (taken from STASHmaster file). Where available CF standard names and CMIP5 short names are also included.

levelheights.py

Calculate true heights of model levels above sea-level, depending on orography

eqtoll.py

Rotated grid calculations

Obscure stuff

um_grid_flip.py Flip an ancillary file grid to run S-N (very old ones were the other way around).

change_endianness.py Change a file from big_endian to little_endian
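The byte swap behind such a conversion can be sketched with numpy dtypes. The example treats the data as a run of 64-bit integer words, which is a simplification made for illustration: real UM files mix integer and real words and may contain 32-bit data, and to_little_endian is a hypothetical helper rather than the script's code.

```python
import numpy as np

def to_little_endian(words_be):
    """Convert big-endian 64-bit integer words to little-endian storage
    without changing their values."""
    return np.asarray(words_be, dtype=">i8").astype("<i8")

raw = np.array([20, 1, 3], dtype=">i8").tobytes()  # bytes as stored big-endian
be = np.frombuffer(raw, dtype=">i8")
le = to_little_endian(be)
print(le.tolist())          # [20, 1, 3]: values are unchanged
print(le.tobytes() == raw)  # False: the stored byte order differs
```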

Other utilities

The NCAS CMS PP file utilities http://cms.ncas.ac.uk/wiki/ToolsAndUtilities/PpFileTools are installed as module ncas-utils.
