Statistical Model Output Collection System

Philosophy and Objectives

  • Automatically collect as many statistics as possible with no preconceptions about what is or isn't useful (the 'avoid surprises' principle)
    • ALL output fields
    • ALL output times
    • ALL models
  • Save results in common format
  • Structure the software so that the specific directory/file structure is handled at the upper levels, keeping the underlying processes as general as possible.
  • Develop tools to interrogate the collected data for various purposes
    • Quantify model health (e.g. do the statistics stray from expected ranges?)
    • Evaluate the differences between models
      • for model upgrades
      • assess physics and resolution effects between g, r, c etc over extended periods

Documentation

Implementation

Collection stage

The collection scripts cycle over the output files from each model run. They can be run either at the end of each model output step or over a directory containing a set of output files, so they can collect statistics for an operational model by running over the operational output directory, or over the model output archive at the NCI (lb4). Currently the input format has to be NetCDF or UM fieldsfiles, so GRIB archives need translating before they can be 'collected'. The scripts open each file and attempt to read every field it contains. For each model output time step the mean, standard deviation, maximum and minimum are calculated over the entire domain as well as over the sea and land points separately, and the grid positions of the maximum and minimum are also saved. For 3D fields the results are saved for each individual level as well as for the full 3D domain. Certain fields also have the option of collecting a percentile or histogram (bin) table, which effectively bins values together with their coordinates.
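
As an illustration of the statistics collected for each field at each output time, a minimal sketch is given below, assuming the field has already been read into a NumPy masked array along with a boolean land mask; the function name, region names and key names are hypothetical rather than the actual SMOCS routines.

    import numpy as np
    import numpy.ma as ma

    def field_stats(field, land_mask):
        """Domain statistics for one 2D field at one output time (illustrative)."""
        views = {
            "all": field,
            "land": ma.masked_where(~land_mask, field),   # keep land points only
            "sea": ma.masked_where(land_mask, field),     # keep sea points only
        }
        stats = {}
        for region, data in views.items():
            imax = np.unravel_index(ma.argmax(data), data.shape)
            imin = np.unravel_index(ma.argmin(data), data.shape)
            stats[region] = {
                "mean": float(data.mean()),
                "std": float(data.std()),
                "max": float(data.max()), "max_pos": list(map(int, imax)),
                "min": float(data.min()), "min_pos": list(map(int, imin)),
            }
        return stats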

The results are saved as a JSON file for each model run, and these are in turn stored in a directory structure which separates each month. As well as the general collection scripts there are separate 'utility collection' scripts. These calculate fields of interest which are not output by the model directly (such as hourly rainfall accumulations or overall budget terms) and add them to the forecast summary JSON file.
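
For orientation, a hypothetical example of a utility collection script adding a derived hourly rainfall accumulation to an existing forecast summary; the file name and the key layout are illustrative only, not the actual SMOCS schema.

    import json

    summary_file = "ACCESS-G_2014010100.json"      # illustrative file name
    with open(summary_file) as f:
        summary = json.load(f)

    # Derive hourly rainfall accumulations from the running precipitation
    # totals already collected, then add them to the summary.
    totals = summary["precip_total"]["all"]["mean"]      # one value per output time
    summary["precip_1h_accum"] = {
        "all": {"mean": [b - a for a, b in zip(totals, totals[1:])]}
    }

    with open(summary_file, "w") as f:
        json.dump(summary, f, indent=2)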

Currently the I/O engine is based on cdms, a Python library which reads NetCDF files and UM fieldsfiles and presents them through a NetCDF-style variable API.
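
A rough sketch of the kind of loop this allows, using the cdms2 module to walk the fields in a (placeholder) NetCDF file; the real collection scripts add land/sea masking, per-level handling and name translation on top of this.

    import cdms2

    f = cdms2.open("model_output.nc")        # placeholder NetCDF (or converted UM) file
    for varid in f.listvariables():
        try:
            var = f(varid)                   # read the whole field as a masked array
        except Exception:
            continue                         # skip fields that cannot be read
        print(varid, var.shape, float(var.min()), float(var.max()))
    f.close()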

Harvesting Programs

The harvest scripts process the JSON files in the SMOCS data directory to produce various products. The initial plan was to generate continuously updated time series plots of a few specified variables and present them on a web page so that the 'health' of the model could be easily assessed; this is still under development. Another task which has evolved is to assess differences between models, e.g. the current model and an upgrade candidate. Because SMOCS collects statistics for all valid fields, the differences can be assessed relatively easily. There is a harvest script which produces a spreadsheet of the differences between two models for all output fields; because every change is flagged, it can be used to assess the possible ramifications for downstream applications.
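
A minimal sketch of such a comparison, written out as a CSV table; the JSON layout, file names and scalar statistics assumed here are illustrative only.

    import csv
    import json

    def compare_runs(json_a, json_b, out_csv):
        """Tabulate per-field mean differences between two forecast summaries."""
        with open(json_a) as fa, open(json_b) as fb:
            a, b = json.load(fa), json.load(fb)
        with open(out_csv, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(["field", "mean_a", "mean_b", "difference"])
            for field in sorted(set(a) & set(b)):
                mean_a = a[field]["all"]["mean"]
                mean_b = b[field]["all"]["mean"]
                writer.writerow([field, mean_a, mean_b, mean_b - mean_a])

    compare_runs("ACCESS-G_APS1_2014010100.json",
                 "ACCESS-G_00pl_2014010100.json",
                 "aps1_vs_00pl_differences.csv")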

Other utilities

bodoView.py - a simple JSON tree viewer for inspecting individual SMOCS json files. Available in the Tools svn subdirectory or from my python directory on all machines.

smocsView.py - an updated version of bodoView.

Known Bugs

  • Name translation errors
    • New fields not in the translation table are identified by their STASH id (see the sketch below).
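
A minimal sketch of the translation fallback described above; the table contents and function name are hypothetical.

    # Hypothetical translation table mapping STASH ids to readable names.
    STASH_NAMES = {
        "m01s00i024": "surface_temperature",
        "m01s03i236": "temperature_1p5m",
    }

    def field_name(stash_id):
        # Fields missing from the table keep their raw STASH id,
        # which is how they currently appear in the SMOCS output.
        return STASH_NAMES.get(stash_id, stash_id)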

To Do List

  • Energy integral calculation for model monitoring purposes.
  • Submodel domain generation (e.g. r,c in g; c in r; tc in g?)
  • Sub-region (e.g. NH, SH, tropics etc) for global domains.
  • Efficiency improvements:
    • Speed up binning/percentile routines
  • Investigate a move to iris and/or mule

File Locations

Source code repository

https://code.metoffice.gov.uk/svn/utils/smocs/

https://code.metoffice.gov.uk/trac/utils/browser/smocs

Collections:

Root directories:

    NCI: raijin:/g/data1/dp9/ljr548/SMOCS
    flurry:/g/ns/cw/access/climate-data-1/SMOCS

Underlying structure:

    modelID/Year/Month/modelID_YYYYMMDDHH.json
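
For example, the summary for a hypothetical ACCESS-G run starting at 00 UTC on 1 January 2014 would sit at a path built as follows (assuming four-digit year and two-digit month directories):

    import os
    from datetime import datetime

    def summary_path(root, model_id, basetime):
        """Build the path to a run's summary file from the layout above."""
        return os.path.join(root, model_id,
                            basetime.strftime("%Y"), basetime.strftime("%m"),
                            "%s_%s.json" % (model_id, basetime.strftime("%Y%m%d%H")))

    # /g/data1/dp9/ljr548/SMOCS/ACCESS-G/2014/01/ACCESS-G_2014010100.json
    print(summary_path("/g/data1/dp9/ljr548/SMOCS", "ACCESS-G", datetime(2014, 1, 1, 0)))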

Preliminary results

Model Comparisons

ACCESS-G APS1 vs 00pl

ACCESS-R APS1 vs r2t6