Level 1 - Reading the raw data
Overview
Level 1, or L1 for short, is the first stage of the PyFluxPro processing path. This level opens an Excel workbook, reads the variables the user has requested from the workbook, combines them with user-entered metadata and writes the data and metadata to a netCDF file. The resulting L1 netCDF file is then the starting point for all further processing stages in PyFluxPro.
The Excel workbook can contain multiple worksheets. It is good practice for all of the worksheets to start and end at the same times and to have the same number of records but this is not essential. PyFluxPro will handle worksheets of different lengths and with different gaps but it is always a good idea to start with as homogeneous a data set as practical.
There are 4 constraints on the Excel workbooks:
- Each worksheet in the workbook must have a column containing date and time (datetime) values.
- Each worksheet must have the variable names on the same row number.
- Each worksheet must have the data start on the same row number.
- All variable names on a worksheet must be unique.
PyFluxPro searches each worksheet for a column containing datetime values and uses the first datetime column found as the time stamp for the worksheet. Each variable is read from the workbook using the name of the worksheet containing the variable and the variable's name on the worksheet. The order of variables on a worksheet does not matter.
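The per-worksheet constraints can be illustrated with a small check written in plain Python. This is only a sketch, not PyFluxPro's own code: the function names are hypothetical, the worksheet is assumed to have already been read into a list of rows, and the row numbers are assumed to be 1-based as they appear in Excel.

```python
from datetime import datetime

def find_datetime_column(data_rows):
    """Return the index of the first column holding only datetime values, or None."""
    if not data_rows:
        return None
    n_columns = min(len(row) for row in data_rows)
    for col in range(n_columns):
        if all(isinstance(row[col], datetime) for row in data_rows):
            return col
    return None

def duplicate_names(header):
    """Return the variable names that appear more than once in the header row."""
    seen, dupes = set(), []
    for name in header:
        if name in seen and name not in dupes:
            dupes.append(name)
        seen.add(name)
    return dupes

def check_worksheet(rows, header_row, first_data_row):
    """Check one worksheet; row numbers are assumed to be 1-based, as in Excel."""
    header = rows[header_row - 1]
    data = rows[first_data_row - 1:]
    problems = []
    if find_datetime_column(data) is None:
        problems.append("no datetime column found")
    for name in duplicate_names(header):
        problems.append(f"duplicate variable name: {name}")
    return problems

# Made-up worksheet: names on row 1, units on row 2, data from row 3.
rows = [
    ["TIMESTAMP", "Ta", "Ta"],
    ["TS", "degC", "degC"],
    [datetime(2020, 1, 1, 0, 30), 20.1, 20.3],
    [datetime(2020, 1, 1, 1, 0), 19.8, 19.9],
]
problems = check_worksheet(rows, header_row=1, first_data_row=3)
print(problems)
```

Here the repeated name Ta is reported as a problem, while the first column is accepted as the time stamp because every value in it is a datetime.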
A key part of the L1 process is merging the data (the variable values at each time step) with the metadata (supporting information about the data contained in the file). Metadata can be global, pertaining to the whole file, or specific to a single variable. The key concept for including metadata is that the resulting file should be self-describing. It should contain enough information about the site and the variables to enable the data user to do basic analyses without having to contact the data provider to enquire about things like variable name meanings, units, type of instrument, missing data values and so on. Details of the global and variable metadata are given in the following sections.
Level 1 data is not always available in the correct units for use with PyFluxPro. For example, EddyPro outputs temperatures in K but PyFluxPro uses temperature in degC. PyFluxPro offers a number of functions at L1 that can be used to convert data being read from the L1 workbook to the units used internally by PyFluxPro.
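The EddyPro example amounts to subtracting 273.15 from each temperature. A minimal sketch of such a conversion follows. The function name is hypothetical and this is not one of the L1 functions shipped with PyFluxPro. The missing data code of -9999 is also an assumption made for the illustration.

```python
MISSING_VALUE = -9999  # assumed missing data code, for illustration only

def kelvin_to_celsius(values, missing_value=MISSING_VALUE):
    """Convert temperatures from K to degC, leaving missing values untouched."""
    return [v if v == missing_value else v - 273.15 for v in values]

temperatures_K = [293.15, MISSING_VALUE, 273.15]
temperatures_C = kelvin_to_celsius(temperatures_K)
print(temperatures_C)
```

Passing missing values through unchanged matters here: converting the missing data code itself would turn it into a plausible-looking but meaningless temperature.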
The L1 Control File
The L1 control file contains 3 sections:
- Files - this section contains the path to the input and output files (file_path), the input (in_filename) and output (out_filename) file names and the worksheet rows containing the variable names (in_headerrow) and the first row of data (in_firstdatarow).
- Global - this section contains user-specified metadata that relates to all of the data contained in the L1 output file.
- Variables - this section contains sub-sections for each variable to be read from the Excel workbook and written to the L1 netCDF file.
Each of these is described in turn below. An example of an L1 control file, for the OzFlux Loxton site in the PFP_examples, is shown below.
The Files section
Description of the Files section
The Files section allows the user to specify the path to the input and output files, the names of the input and output files and the row numbers containing the variable names and the first row of data values, see the screenshot below.
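As a rough illustration of how the Files entries fit together, the sketch below holds them in a Python dictionary and builds the full input and output paths. The keys are the ones named above. All of the values are hypothetical placeholders, and storing the row numbers as text is an assumption.

```python
import os

# Keys as described for the Files section; every value is a made-up placeholder.
files = {
    "file_path": "/data/ExampleSite/",
    "in_filename": "ExampleSite_L1.xlsx",
    "out_filename": "ExampleSite_L1.nc",
    "in_headerrow": "2",
    "in_firstdatarow": "5",
}

def resolve_files(files):
    """Return the input path, output path, header row and first data row."""
    in_path = os.path.join(files["file_path"], files["in_filename"])
    out_path = os.path.join(files["file_path"], files["out_filename"])
    header_row = int(files["in_headerrow"])
    first_data_row = int(files["in_firstdatarow"])
    if first_data_row <= header_row:
        raise ValueError("in_firstdatarow must come after in_headerrow")
    return in_path, out_path, header_row, first_data_row

in_path, out_path, header_row, first_data_row = resolve_files(files)
print(in_path, out_path, header_row, first_data_row)
```

Note that the input and output files share a single file_path, so both names are joined to the same directory.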
The Global section
The Global section allows the user to specify global attributes for the netCDF file. Global attributes are metadata that provide the data user with information about the site, contact people, the license covering the data etc.
There are 4 global attributes that must be present before PyFluxPro can run:
- latitude - the latitude of the site
- longitude - the longitude of the site
- site_name - the name of the site
- time_step - the time step of the data, this must be a multiple of 15 minutes e.g. 15, 30, 60 minutes ...
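The four required attributes and the 15 minute rule can be expressed as a simple check. The sketch below is illustrative only and is not PyFluxPro's own validation. The function name is hypothetical and the attribute values are assumed to be stored as text.

```python
REQUIRED_GLOBALS = ("latitude", "longitude", "site_name", "time_step")

def check_global_attributes(global_attrs):
    """Return a list of problems found in the Global section (empty if none)."""
    problems = []
    for name in REQUIRED_GLOBALS:
        if str(global_attrs.get(name, "")).strip() == "":
            problems.append(f"missing global attribute: {name}")
    raw_time_step = str(global_attrs.get("time_step", "")).strip()
    if raw_time_step:
        try:
            time_step = int(raw_time_step)
        except ValueError:
            problems.append("time_step is not a whole number of minutes")
        else:
            if time_step <= 0 or time_step % 15 != 0:
                problems.append("time_step is not a multiple of 15 minutes")
    return problems

# Placeholder values, not those of a real site.
example_globals = {
    "latitude": "-35.0",
    "longitude": "140.0",
    "site_name": "ExampleSite",
    "time_step": "30",
}
print(check_global_attributes(example_globals))                          # no problems
print(check_global_attributes(dict(example_globals, time_step="20")))   # one problem
```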
The example L1 control file above shows the common global attributes used by OzFlux in the L1 netCDF files. The user can add more global attributes by right-clicking on the Global section heading in the Parameter column and selecting Add attribute. Global attributes can be removed by right clicking on the attribute name in the Parameter column and selecting Remove attribute.
A note about the time_zone global attribute
The time zone of your site is very important information for some users of your data since it specifies the relationship between the local time at your site and UTC and any changes to or from daylight savings time. PyFluxPro does not use the time_zone global attribute itself and it never alters the relationship between your data and the time stamp it came with. That is sacrosanct!
The time_zone global attribute is checked when L1 is run by comparing the entry in the L1 control file to the time zone derived from the site latitude and longitude. This is to avoid having the wrong time zone in the netCDF metadata, in keeping with Isaac's Second Law ("The only thing worse than no metadata is incorrect metadata."). This means that the entry for the time_zone global attribute must conform to the format used for time zones by Python. This is usually of the form "country/city" e.g. "Australia/Melbourne" for my home town.
A full list of time zones in the correct format for use in the L1 control file is available at https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List.
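One way to test a time zone name before entering it in the control file is to try loading it with the zoneinfo module from the Python standard library (Python 3.9 or later). This is a sketch only. Whether PyFluxPro itself uses zoneinfo is not stated here, and the function name is hypothetical. The check also relies on time zone data being installed on the system. If none is installed, every name is reported as unknown.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

def is_known_time_zone(name):
    """Return True if name can be loaded from the installed time zone data."""
    try:
        ZoneInfo(name)
    except (ZoneInfoNotFoundError, ValueError, OSError):
        # Unknown, malformed or otherwise unloadable time zone name.
        return False
    return True

for candidate in ("Australia/Melbourne", "Melbourne/Australia"):
    print(candidate, is_known_time_zone(candidate))
```

The second name has its two parts reversed and is rejected, which is the kind of mistake that is easy to make when typing the entry by hand.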
The Variables section
Description of the Variables section
The Variables section is where the user specifies the variables to be read from the Excel workbook and written to the L1 netCDF file, together with the variable attributes to be added to the netCDF file. An example of a variable sub-section from the Loxton control file is shown below.
The name of the variable sub-section in the Parameter column, AH_HMP_10m in this case, is the variable name in the netCDF file.
The xl sub-section contains 2 entries:
- sheet gives the worksheet in the Excel workbook that contains the desired variable.
- name gives the variable name on the Excel worksheet.
The Attr sub-section can contain multiple entries:
- group_name can be radiation, meteorology, flux, covariances, soil, diagnostics and describes the type of measurement (mandatory).
- height is the height, in m, of the measurement (mandatory).
- instrument is the instrument type (mandatory).
- long_name is a human-readable description of the variable (mandatory, from controlled vocabulary).
- serial_number is the serial number of the instrument (optional).
- standard_name is the standard name from the CF Metadata Conventions Standard Name controlled vocabulary (mandatory, from controlled vocabulary).
- statistic_type can be average, standard deviation, variance, sum (mandatory).
- units are the units of this variable (mandatory).
Attributes can be added to this list by right-clicking on the Attr sub-section heading in the Parameter column and selecting Add attribute.
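The structure of a variable sub-section can be sketched as a nested Python dictionary, with a check that the mandatory entries listed above are present. This is an illustration, not PyFluxPro code. The worksheet name, the column name and all attribute values below are hypothetical placeholders rather than the contents of the Loxton control file, and the function name is invented for the example.

```python
MANDATORY_ATTRS = ("group_name", "height", "instrument", "long_name",
                   "standard_name", "statistic_type", "units")
GROUP_NAMES = ("radiation", "meteorology", "flux", "covariances", "soil", "diagnostics")
STATISTIC_TYPES = ("average", "standard deviation", "variance", "sum")

# Hypothetical sub-section for AH_HMP_10m; every value is a placeholder.
variable = {
    "xl": {"sheet": "ExampleSheet", "name": "ExampleColumn"},
    "Attr": {
        "group_name": "meteorology",
        "height": "10",
        "instrument": "ExampleInstrument",
        "long_name": "Absolute humidity",
        "standard_name": "mass_concentration_of_water_vapor_in_air",
        "statistic_type": "average",
        "units": "g/m^3",
    },
}

def check_variable(label, variable):
    """Return a list of problems found in one variable sub-section."""
    problems = []
    xl = variable.get("xl", {})
    for key in ("sheet", "name"):
        if not xl.get(key):
            problems.append(f"{label}: xl entry '{key}' is missing")
    attr = variable.get("Attr", {})
    for key in MANDATORY_ATTRS:
        if not attr.get(key):
            problems.append(f"{label}: attribute '{key}' is missing")
    group_name = attr.get("group_name")
    if group_name and group_name not in GROUP_NAMES:
        problems.append(f"{label}: unknown group_name '{group_name}'")
    statistic_type = attr.get("statistic_type")
    if statistic_type and statistic_type not in STATISTIC_TYPES:
        problems.append(f"{label}: unknown statistic_type '{statistic_type}'")
    return problems

print(check_variable("AH_HMP_10m", variable))
```

The optional serial_number attribute is deliberately left out of the mandatory list, so a sub-section without it still passes.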
The Function sub-section can be used to specify a function to be applied to the data read from the L1 workbook, for example to convert the units of the data, to calculate standard deviations from variances or to apply a linear or polynomial transform to the data. The Function sub-section can contain only 1 entry:
- func is the name of an L1 function available in PyFluxPro, see the Adding a Function to a variable section for details.
In the example given in the screenshot above, the function kgpm3_to_gpm3() is used to convert absolute humidity in kg/m^3 to units of g/m^3.
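The kinds of operation described for the Function sub-section are simple element-wise transforms. The sketch below shows three of them in plain Python. These are illustrative stand-ins with hypothetical names, not the implementations or signatures of the L1 functions in PyFluxPro, and the missing data code of -9999 is an assumption.

```python
import math

MISSING_VALUE = -9999.0  # assumed missing data code, for illustration only

def convert_kgpm3_to_gpm3(values):
    """Absolute humidity: kg/m^3 to g/m^3 (multiply by 1000)."""
    return [v if v == MISSING_VALUE else v * 1000.0 for v in values]

def variance_to_standard_deviation(values):
    """Square root of each variance; negative or missing input becomes missing."""
    return [MISSING_VALUE if v < 0 else math.sqrt(v) for v in values]

def linear_transform(values, slope, offset):
    """Apply slope * value + offset to each non-missing value."""
    return [v if v == MISSING_VALUE else slope * v + offset for v in values]

humidity_gpm3 = convert_kgpm3_to_gpm3([0.0105, MISSING_VALUE])
standard_deviations = variance_to_standard_deviation([4.0, 0.25, MISSING_VALUE])
rescaled = linear_transform([1.0, 2.0, MISSING_VALUE], slope=2.0, offset=0.5)
print(humidity_gpm3, standard_deviations, rescaled)
```

In each case the missing data code is passed through unchanged so that gaps in the input remain recognisable as gaps in the output.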
Editing the Variables section
Editing the contents of the Variables section is similar to editing other sections in the L1 control file. Items can be added to or removed from the section using a context-sensitive menu that is displayed when the user right clicks on the section or sub-section titles in the Parameter column. Entries in the Value column can be edited by double clicking on the text in the Value column and editing the text.
Removing a Variable
Variables can be removed from the Variables section by right clicking on the variable name and selecting Remove variable, see the screenshot below.
Adding a New Variable
New variables can be added in 2 ways. First, by right clicking on the Variables section title in the Parameter column and selecting Add variable from the context menu, see the screenshot below.
Second, right clicking on a variable name in the Parameter column brings up a context menu. Clicking on New variable will add a new variable immediately above the selected entry, see below.
Adding a Function to a variable
A function can be added to a variable by right clicking on the variable name in the Parameter column, see above, and selecting Add Function. The function to be added can be chosen from the context menu displayed by right clicking on the entry in the Value column opposite the func keyword, see the screenshot below.
Running L1
Once the user has finished editing the L1 control file, it can be run by using the Current option of the Run entry on the main menu. The shortcut to run the current control file is Ctrl+R (press and hold down the control key and press the R key).
As with all processing in PyFluxPro, the L1 processing will write log messages to the Log window. It is important to check these messages in case a problem has occurred during this processing level. ERROR messages must be resolved before the user progresses to the next processing level. WARNING messages may be tolerated but may also cause issues at a later stage of processing. It is always a good idea, if possible, to resolve warning messages before continuing. INFO messages let the user know what processing is being performed.
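The rule for acting on log messages can be sketched as a short triage script. The exact format of the lines in the Log window is not described here, so the sample lines below are invented and the script only assumes that each line contains the level as a separate word. The function names are hypothetical.

```python
import re

LEVEL_PATTERN = re.compile(r"\b(ERROR|WARNING|INFO)\b")

def summarise_log(lines):
    """Count the log lines at each level, using the first level word on each line."""
    counts = {"ERROR": 0, "WARNING": 0, "INFO": 0}
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def ready_for_next_level(counts):
    """ERROR messages must be resolved before moving on; warnings may be tolerated."""
    return counts["ERROR"] == 0

# Invented log lines, for illustration only.
sample_log = [
    "INFO reading the workbook",
    "WARNING a requested variable has no units",
    "INFO writing the netCDF file",
]
counts = summarise_log(sample_log)
print(counts, ready_for_next_level(counts))
```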
Output from L1
Running L1 will produce an L1 netCDF file containing the data read from the L1 workbook.