Notes for the development of PCMDIobs - PCMDI/PCMDIobs-cmor-tables GitHub Wiki

Preparation of PCMDIobs - the process

Overview

This repo is used to curate gridded datasets that support the PMP and E3SM diagnostics and other CMEC-related activities. It is tightly aligned with the obs4MIPs data specifications and therefore with the organization of CMIP data. A simple overview of the process for adding new data:

1. Download and document the needed data.
2. Create an issue and a branch for processing the data.
3. Develop a CMOR3 python script and input JSON file.
4. Test-process the data.
5. Create a pull request for the team to confirm before merging to master and processing into the database.

PCMDIobs is routinely updated as the metadata in this process is improved and QC capabilities are advanced.

Installing CMOR3 required utilities

The current version is Python 3 compatible and uses CDMS2 to read in data, which is then processed with CMOR3. All software needed for this process is available via conda:

conda create -n MY_ENV_NAME -c conda-forge cmor cdms2 cdutil python=3.8


Acquiring new data to add

At LLNL, desired datasets are added by one of the team members (with permissions) helping to curate PCMDIobs to the following location: /p/user_pub/pmp/pmp_obs_preparation/orig/data. Here, the contributor creates a new directory with a name that reasonably identifies the dataset. In that directory, they include all of the downloaded data along with a README.txt file containing: 1) the URL from which the dataset was retrieved, 2) the date it was downloaded, and 3) the email address of the person who retrieved and processed the data. To strengthen provenance documentation, this information will eventually be added to the netCDF files via the placeholder "curation_provenance" attribute.
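A README.txt along the following lines would capture the three items above (this template is illustrative, not a prescribed format from the repo):

```
URL:        <where the dataset was retrieved>
Downloaded: <YYYY-MM-DD>
Contact:    <email of the person who retrieved and processed the data>
```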

Prepare to process the data

Add a new issue and branch for the processing, for example:

issue: New data for preparing ERA5 3D data

create branch: 7_era5-3d_pjg (for issue #7)

Check out the branch and, as needed, add a new "source_id" and an entry in "institution_ids.py" in /export/gleckler1/git/PCMDIobs-cmor-tables/src/

Before running CMOR, update the PCMDIobs CMOR tables by running:

/export/gleckler1/git/PCMDIobs-cmor-tables/src/writeJson.py

Using CMOR to prepare the data

Many examples are available at https://github.com/PCMDI/PCMDIobs-cmor-tables/tree/master/inputs. Processing requires an "input JSON" file and a python script that runs the data through CMOR.
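As a rough orientation, an input JSON is a CMOR "user input" file. The heavily abridged sketch below is hypothetical: the field names follow CMOR conventions, but the values are illustrative, so consult the real files in the inputs directory linked above for authoritative content.

```json
{
    "activity_id":        "obs4MIPs",
    "outpath":            "/tmp/pcmdiobs-test",
    "source_id":          "CERES-4-1",
    "institution_id":     "NASA-LaRC",
    "calendar":           "gregorian",
    "grid_label":         "gn",
    "nominal_resolution": "100 km",
    "contact":            "name@example.com"
}
```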

Once set up, a typical execution is:

python runCmor_CERES4.1_2D.py
(This script reads the input JSON https://github.com/PCMDI/PCMDIobs-cmor-tables/blob/master/inputs/CERES4.1-input.json)
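Before invoking CMOR, a quick pre-flight check of the input JSON can catch missing fields early. The sketch below is a hypothetical helper, not part of the repo, and its required-key list is an assumption for illustration:

```python
import json

# Keys this sketch treats as required -- an illustrative assumption,
# not an official list from PCMDIobs-cmor-tables.
REQUIRED_KEYS = {"activity_id", "source_id", "institution_id", "outpath"}

def check_input_json(path):
    """Parse a CMOR input JSON and raise if assumed-required keys are absent."""
    with open(path) as f:
        cfg = json.load(f)
    missing = REQUIRED_KEYS - cfg.keys()
    if missing:
        raise KeyError(f"input JSON {path} is missing keys: {sorted(missing)}")
    return cfg
```

A processing script like runCmor_CERES4.1_2D.py could call check_input_json("CERES4.1-input.json") before handing the file to CMOR.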

Note: Special attention is required for data on multiple vertical levels.

A dry run should write its output outside the PCMDIobs database. Once the processing has been successfully tested, create a pull request for the team to verify before merging to master.

QC and miscellaneous processing:

https://github.com/PCMDI/PCMDIobs-cmor-tables/tree/master/qc

https://github.com/PCMDI/PCMDIobs-cmor-tables/tree/master/inputs/misc