1.4 NetCDF file format and structure - wolfiex/AerVis GitHub Wiki

General structure

The generated netCDF file has five main components. These are:

Dimensions
Coordinates
Attributes
Variables (datasets)
Groups

Dimensions

These record the name and length of each axis, and so define the shape and number of elements of any datasets (variables) contained within the file. In general these are longitude, latitude, elevation (model_level_number), pressure and pseudo_level.

Dimensions:                  (latitude: 144, longitude: 192, model_level_number: 85, pressure: 3, pseudo_level: 27)
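As a minimal sketch of how the dimensions fix the shape of each dataset, the snippet below uses plain Python with the sizes printed above. The helper `shape_of` is hypothetical and only for illustration; it is not part of AerVis.

```python
from math import prod

# Dimension sizes copied from the printout above.
dimensions = {
    "latitude": 144,
    "longitude": 192,
    "model_level_number": 85,
    "pressure": 3,
    "pseudo_level": 27,
}


def shape_of(dim_names):
    """Return the array shape implied by an ordered tuple of dimension names."""
    return tuple(dimensions[name] for name in dim_names)


# A surface field and a field that also varies along pseudo_level.
surface_shape = shape_of(("latitude", "longitude"))
pseudo_shape = shape_of(("pseudo_level", "latitude", "longitude"))

# The number of elements is the product of the dimension lengths.
print(surface_shape, prod(surface_shape))
print(pseudo_shape, prod(pseudo_shape))
```

A variable never stores its own axis lengths; it only names the dimensions it uses, which is why the sizes are declared once per file.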

Coordinates

These are constant 'attributes' which describe the data. Examples include 1D arrays indexing latitude, longitude and elevation, the timestamps for each column of data, or the levels. Coordinates can be multidimensional, with their shape defined by the dimensions.

Coordinates:
  * latitude                 (latitude) float32 -89.375 -88.125 ... 89.375
  * longitude                (longitude) float32 0.9375 2.8125 ... 359.0625
  * pressure                 (pressure) float32 250.0 500.0 850.0
  * pseudo_level             (pseudo_level) int32 3 4 6 7 8 ... 907 908 909 910
  * model_level_number       (model_level_number) int32 1 2 3 4 ... 82 83 84 85
    forecast_period          timedelta64[ns] ...
    forecast_reference_time  datetime64[ns] ...
    time                     datetime64[ns] ...
    level_height             (model_level_number) float32 ...
    sigma                    (model_level_number) float32 ...
    surface_altitude         (latitude, longitude) float32 ...
    altitude                 (model_level_number, latitude, longitude) float32 ...
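The 1D index coordinates above can be read as regularly spaced axes. The sketch below rebuilds latitude and longitude from the first value, the step and the dimension length, and checks them against the printed end points. The regular spacing is an assumption inferred from the first two printed values, and `regular_axis` is a hypothetical helper.

```python
def regular_axis(first, step, count):
    """Build a regularly spaced 1D coordinate as a list of floats."""
    return [first + i * step for i in range(count)]


# Steps inferred from the first two printed values (an assumption):
# -88.125 - (-89.375) = 1.25 and 2.8125 - 0.9375 = 1.875.
latitude = regular_axis(-89.375, 1.25, 144)
longitude = regular_axis(0.9375, 1.875, 192)

# The first two values and the last value should match the printout.
print(latitude[:2], latitude[-1])
print(longitude[:2], longitude[-1])
```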

Additionally, any calculation constants are appended as coordinates. This makes them easy to reference, since the coordinates are where we store all information used to describe the data.

    r_specific      float64 287.1
    molar_mass_air  float64 0.02899
    avogadro        float64 6.022e+23
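To illustrate how constants like these might enter a calculation, the sketch below converts pressure and temperature into an air number density using the ideal gas law. This is an assumed use case, not taken from the AerVis source. SI units are assumed for the constants (J kg^-1 K^-1, kg mol^-1 and mol^-1 respectively), and the sample pressure and temperature are made up.

```python
# Values copied from the coordinate printout above; SI units are assumed.
R_SPECIFIC = 287.1        # specific gas constant of dry air
MOLAR_MASS_AIR = 0.02899  # molar mass of air
AVOGADRO = 6.022e23       # Avogadro constant


def air_number_density(pressure_pa, temperature_k):
    """Return air molecules per cubic metre from the ideal gas law."""
    # Mass density in kg m^-3: rho = p / (R_specific * T).
    mass_density = pressure_pa / (R_SPECIFIC * temperature_k)
    # Convert mass to number of molecules via molar mass and Avogadro.
    return mass_density * AVOGADRO / MOLAR_MASS_AIR


# Hypothetical sample point: 850 hPa at 280 K.
n_air = air_number_density(85000.0, 280.0)
print(f"{n_air:.3e} molecules per cubic metre")
```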

Attributes

This is where we place information relevant to the file as a whole, such as its source files or the time taken to concatenate all pp files.

Attributes:
    avg_cube_delta:   0.04959738963498088
    files:            ['/Users/wolfiex/UKCA_postproc/data/n96_hadgem1_qrparm....
    iris_cube_delta:  120.22364183700003
    L0_delta:         133.26909520400002
    stashname:        ~Users~wolfiex~UKCA_postproc~data~AerVis~aervis~variabl...

Data Variables

These contain our datasets, with the dimensions defined earlier. In our case we have named these using the stash codes, with the full names being easily extractable from the VariableReference class.

Data variables:
    m01s00i096               (latitude, longitude) float32 ...
    m01s00i509               (latitude, longitude) float32 ...
    m01s01i202               (latitude, longitude) float32 ...
    m01s01i270               (pseudo_level, latitude, longitude) float32 ...
    m01s01i271               (pseudo_level, latitude, longitude) float32 ...
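The variable names listed above follow the pattern `m<model>s<section>i<item>`. As a minimal sketch, the name can be split back into its three numbers with a regular expression. `parse_stash` is a hypothetical helper and not part of AerVis or its VariableReference class.

```python
import re

# Two digits for model, two for section, three for item.
STASH_PATTERN = re.compile(r"^m(\d{2})s(\d{2})i(\d{3})$")


def parse_stash(name):
    """Split a name such as 'm01s00i096' into (model, section, item)."""
    match = STASH_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a stash-style variable name: {name!r}")
    model, section, item = (int(part) for part in match.groups())
    return model, section, item


# Names copied from the data variable listing above.
names = ["m01s00i096", "m01s00i509", "m01s01i202", "m01s01i270", "m01s01i271"]
parsed = {name: parse_stash(name) for name in names}

for name, code in parsed.items():
    print(name, code)
```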

Each data variable consists of a dataset, together with the coordinates and attributes describing it:

In [17]: d.data['m01s00i096']                                                   
Out[17]: 
<xarray.DataArray 'm01s00i096' (latitude: 144, longitude: 192)>
array([[0.000000e+00, 0.000000e+00, 0.000000e+00, ..., 0.000000e+00,
        0.000000e+00, 0.000000e+00],
       ...,
       [7.713828e-07, 7.713828e-07, 7.713827e-07, ..., 7.713829e-07,
        7.713829e-07, 7.713828e-07]], dtype=float32)
Coordinates:
  * latitude                 (latitude) float32 -89.375 -88.125 ... 89.375
  * longitude                (longitude) float32 0.9375 2.8125 ... 359.0625
    forecast_period          timedelta64[ns] ...
    forecast_reference_time  datetime64[ns] ...
    time                     datetime64[ns] ...
    surface_altitude         (latitude, longitude) float32 ...
    height                   float64 ...
Attributes:
    long_name:     
    source:        Data from Met Office Unified Model
    um_version:    11.1
    STASH:         [ 1  0 96]
    cell_methods:  time: mean (interval: 1 hour)
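The three parts of a data variable can be reached separately. The sketch below assumes that `d.data` is an `xarray.Dataset`, as the output above suggests, and builds a small stand-in dataset with made-up values rather than opening a real AerVis file.

```python
import numpy as np
import xarray as xr

# Toy coordinates: a tiny subset of the real grid, for illustration only.
lat = np.array([-89.375, -88.125, -86.875], dtype="float32")
lon = np.array([0.9375, 2.8125], dtype="float32")

# A stand-in dataset holding one variable named after its stash code.
ds = xr.Dataset(
    data_vars={
        "m01s00i096": (("latitude", "longitude"), np.zeros((3, 2), dtype="float32")),
    },
    coords={"latitude": lat, "longitude": lon},
)
ds["m01s00i096"].attrs["source"] = "Data from Met Office Unified Model"

var = ds["m01s00i096"]
print(var.dims, var.shape)  # named dimensions and their lengths
print(var.values.dtype)     # the dataset itself, as a NumPy array
print(list(var.coords))     # coordinates attached to this variable
print(var.attrs)            # attributes describing this variable
```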

Groups

Generally unused in this case, but since netCDF-4 files are built on the HDF5 format, groups can be used to separate the data into a hierarchical structure. A group can contain all of the above information, and one example of their usage is comparing several different runs that contain the same variable names.