access_UMDataFiles - ACCESS-NRI/accessdev-Trac-archive GitHub Wiki

UM File Format and how to work with it

The UM uses its own binary file format for model initial conditions, diagnostic output and ancillary files. This is described in UMDP F3: Format of Atmospheric, Oceanic And Wave Dumps, Fieldsfiles, Ancillary Data Sets, Boundary Data Sets and Observation Files for the forecast.

Although this is a binary format, it is reasonably portable, and there are utilities to read these files on a range of machines.

A range of packing options can be set from the FILES follow-on in the UMUI Stash -> Edit Domain -> usage panel. Files can use either GRIB or WGDOS packing. With WGDOS packing, each field is packed to maintain a certain precision rather than to fit in a certain number of bits.

Option 0 (no packing) gives 64 bit output
Option 1 (operational)
Option 2 (climate packing)
Option 5 (new climate packing)

The precision for each option is defined in the STASHmaster files (e.g., $UMDIR/vn6.3/ctldata/STASHmaster/STASHmaster_A for the atmospheric fields). See appendix 3 of https://nf.nci.org.au/facilities/software/UM/8.2/umdoc_system/UM_docs/papers/pdf/p0c4.pdf for a description of the STASHmaster file format.

The 4th line of each entry has the format

#|DataT |DumpP | PC1  PC2  PC3  PC4  PC5  PC6  PC7  PC8  PC9  PCA |

E.g. for surface temperature

1|    1 |    0 |   24 |SURFACE TEMPERATURE AFTER TIMESTEP  |
...
4|    1 |    2 |  -3  -10   -3   -3  -10   21  -99  -99  -99  -99 |

For negative values, the precision is 2**n, where n is the value in the table. For positive values, n is the number of bits used. The number of bits is always specified for GRIB packing (option 6), but some fields also specify 32 bit packing. Temperature uses a precision of 2**-3 = 0.125 for PC1 and 2**-10 (about 0.001) for climate packing. In this case options 2 and 5 have the same precision, but for a number of variables option 5 has higher precision. Note that -99 means full available precision (64 bit fields here). Some fields have precision specified as
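The rules above can be sketched in a few lines of Python. This is only an illustration of how the packing profile on line 4 of an entry could be read by hand; the function names parse_packing_line and describe_code are made up for this example and are not part of any UM utility. It assumes the PC columns appear in the order shown in the format line, and it treats any non-negative value as a bit count.

```python
def parse_packing_line(line):
    """Split line 4 of a STASHmaster entry into (data_type, dump_pack, codes)."""
    # Drop the trailing '|' and split on the remaining separators.
    parts = [p.strip() for p in line.strip().rstrip("|").split("|")]
    record_no, data_type, dump_pack, codes = parts
    if record_no != "4":
        raise ValueError("expected line 4 of a STASHmaster entry")
    return int(data_type), int(dump_pack), [int(c) for c in codes.split()]


def describe_code(code):
    """Interpret one PC value following the rules described in the text."""
    if code == -99:
        return ("full", None)               # full available precision
    if code < 0:
        return ("precision", 2.0 ** code)   # absolute precision of 2**n
    return ("bits", code)                   # assumed: number of bits used


# The surface temperature example from this page.
line4 = "4|    1 |    2 |  -3  -10   -3   -3  -10   21  -99  -99  -99  -99 |"
data_type, dump_pack, codes = parse_packing_line(line4)
for column, code in enumerate(codes, start=1):
    print("PC column", column, describe_code(code))
```

For the temperature entry this reports a precision of 0.125 for the first column and 0.0009765625 (2**-10) for the second and fifth columns, matching the values quoted above.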

It seems that integer fields (e.g. CONV CLOUD BASE LEVEL NO or CONV CLOUD TOP LEVEL NO) are only written correctly with option 0. Otherwise they just seem to be zero.

The UM utility pumf dumps STASH file headers and some sample field information. The program xconv (in vayu:~access/bin) can do simple plots of UM output (PP files) itself and can also convert to netcdf (though this is not CF compliant).

xconv is more flexible and can work with 32 or 64 bit files in either little-endian or big-endian byte order. The UM itself and its utilities are built for a particular format (64 bit big-endian for the SX6).
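To see why word size and byte order matter, the following Python sketch decodes the same 8 bytes under the four possible layouts. It is a generic illustration using the standard struct module, not a description of how xconv detects the format. The value 20 is an arbitrary stand-in for a small integer header word and is not taken from the UM documentation; candidate_readings is a hypothetical helper name.

```python
import struct


def candidate_readings(raw8):
    """Decode the first word of a file under four possible layouts."""
    return {
        "64-bit big-endian": struct.unpack(">q", raw8)[0],
        "64-bit little-endian": struct.unpack("<q", raw8)[0],
        "32-bit big-endian": struct.unpack(">i", raw8[:4])[0],
        "32-bit little-endian": struct.unpack("<i", raw8[:4])[0],
    }


# Pretend a 64 bit big-endian machine wrote the integer 20 (illustrative value).
raw = struct.pack(">q", 20)
for layout, value in candidate_readings(raw).items():
    print(layout, value)
```

Only the 64 bit big-endian reading recovers the small integer that was written; the other layouts give zero or a very large number. A tool built for a single layout will therefore misread files written with another.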

There is also a batch version of xconv called convsh. These are documented in https://nf.nci.org.au/facilities/software/UM/8.2/umdoc_system/utils/xconv/index.htm.

Note that the UM uses a staggered grid, so U, V and T are all on different grids. The netcdf file from xconv looks something like

        float u(t, hybrid_ht, latitude, longitude) ;
        float v(t, hybrid_ht, latitude_1, longitude_1) ;
        float theta(t, hybrid_ht_1, latitude, longitude_1) ;

where

 longitude = 1.875, 5.625, 9.375, 13.125, 16.875, 20.625, 24.375, 28.125,
 latitude = -90, -87.5, -85, -82.5, -80, -77.5, -75, -72.5, -70, -67.5, -65,
 longitude_1 = 0, 3.75, 7.5, 11.25, 15, 18.75, 22.5, 26.25, 30, 33.75, 37.5,
 latitude_1 = -88.75, -86.25, -83.75, -81.25, -78.75, -76.25, -73.75, -71.25,

xconv includes interpolation, so it can be used to interpolate fields to the same horizontal grid.
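The half-cell offset between the grids can be checked with a small Python sketch. It uses only the first few coordinate values listed above and simple averaging of neighbouring points, which is linear interpolation to the midpoint. This is not necessarily the method xconv uses, and it ignores the poles and the wrap-around in longitude. The function names are hypothetical.

```python
def midpoints(values):
    """Average each pair of neighbours: N points give N-1 midpoints."""
    return [0.5 * (a + b) for a, b in zip(values, values[1:])]


def to_cell_midpoints(field):
    """Average the four surrounding values of a 2D field (rows of lists)."""
    rows = []
    for lower, upper in zip(field, field[1:]):
        rows.append([
            0.25 * (lower[i] + lower[i + 1] + upper[i] + upper[i + 1])
            for i in range(len(lower) - 1)
        ])
    return rows


# First few coordinates of the grid that v is stored on.
longitude_1 = [0.0, 3.75, 7.5, 11.25]
latitude_1 = [-88.75, -86.25, -83.75, -81.25]

print(midpoints(longitude_1))  # [1.875, 5.625, 9.375]
print(midpoints(latitude_1))   # [-87.5, -85.0, -82.5]

# A field that varies linearly is reproduced exactly at the midpoints.
field = [[lat + lon for lon in longitude_1] for lat in latitude_1]
print(to_cell_midpoints(field)[0][0])  # -85.625
```

The midpoints of longitude_1 and latitude_1 coincide with the longitude and latitude values of the u grid, so averaging the four neighbouring v values gives an estimate of v at u points away from the poles.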

Ferret plots fields from this file ok, using the appropriate grid for each variable.

GrADS doesn't work properly with a file that contains multiple grids (netcdf file opened using sdfopen). Only variables on the same grid as the first variable in the file seem to work properly.

VCDAT also plots the variables on their appropriate grids.

xconv limitations

xconv seems to work ok with files that have daily or monthly data but sometimes gets confused by files that have other time steps and may not display the time dimension correctly.

Other tools

Recent versions of CDAT (python package from PCMDI) can read UM PP files directly. CDAT can also write netcdf files (though it can't write PP files). CDAT doesn't interpret all the STASH codes correctly so some variables don't get sensible names. Ones it doesn't understand will have names like m1s0i268. This means Model 1, Section 0, Item No 268 in the STASH list.
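Names of this form can be decoded with a short regular expression. The following Python sketch is only an illustration of the naming pattern described above; parse_stash_name is a hypothetical helper and is not part of CDAT.

```python
import re

# Pattern for fallback names such as m1s0i268.
_STASH_NAME = re.compile(r"^m(\d+)s(\d+)i(\d+)$")


def parse_stash_name(name):
    """Return (model, section, item) for a name like m1s0i268, else None."""
    match = _STASH_NAME.match(name)
    if match is None:
        return None
    model, section, item = (int(group) for group in match.groups())
    return model, section, item


print(parse_stash_name("m1s0i268"))  # (1, 0, 268)
print(parse_stash_name("theta"))     # None
```

The resulting section and item numbers can then be looked up in the STASHmaster file to find the long name of the field.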

The CDAT routine that interprets the STASH codes is cdunifpp_ppcode.c (in libcdms/src/cdunif/cdunifpp). In CDAT 4.1.2 this has a comment saying the information is taken from xconv v1.90. It would be possible to get the variable long names from the UM STASHmaster files, but these don't have short names or units.