2.7 Timeseries padder: variable vs. constant

Some QC tests (e.g. spike, persistence) evaluate a window of data, whereas others (e.g. range) evaluate a single data point. The QAQC module therefore needs data from before and/or after the time window of interest in order to run the full suite of QC tests. The timeseries_padder module supplies this padding. Because the NEON processing pipelines operate on daily time blocks, this typically means accessing at least one day before and one day after the target day.
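
As a rough illustration of the daily-block padding described above, the following Python sketch lists the days that would need to be accessed for a target day. The function name days_to_access and the pad_days parameter are hypothetical and are not taken from the timeseries_padder source; the sketch assumes the pad is a whole number of days applied symmetrically on both sides.

```python
from datetime import date, timedelta


def days_to_access(target: date, pad_days: int = 1) -> list[date]:
    """Return the target day plus pad_days before and after it, in order.

    Hypothetical helper: assumes a symmetric pad measured in whole days.
    """
    if pad_days < 0:
        raise ValueError("pad_days must be non-negative")
    return [target + timedelta(days=offset)
            for offset in range(-pad_days, pad_days + 1)]


if __name__ == "__main__":
    for day in days_to_access(date(2020, 1, 1)):
        print(day.isoformat())
```

With a pad of one day, a target day at the start of a month or year pulls in data from the previous month or year, which is why the pad is computed on dates rather than on day-of-month numbers.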

Data may be padded using a constant pad size that is specified once for all data on which the module operates, or a variable pad size that is determined from the expected data rate of the named location (found in the location file) and the QC threshold values for the named location (found in the thresholds file). The variable pad is preferred because it adjusts automatically when a change in threshold parameters or data rate calls for a larger or smaller window of data to perform QAQC.
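
To make the dependence on data rate and thresholds concrete, here is a minimal Python sketch of how a variable pad could be derived. It is not the calculation used by timeseries_padder: the function pad_days_needed and its parameters are hypothetical, and it assumes the relevant threshold is a window size expressed as a number of readings and that the pad is rounded up to whole days.

```python
import math

SECONDS_PER_DAY = 86_400


def pad_days_needed(window_points: int, data_rate_hz: float) -> int:
    """Whole days of padding needed to cover a QC window.

    Hypothetical helper. Assumes:
      * window_points is a threshold giving the QC window in readings
      * data_rate_hz is the expected data rate from the location file
      * the full window is padded on each side (a conservative choice)
    """
    if window_points <= 0 or data_rate_hz <= 0:
        raise ValueError("window_points and data_rate_hz must be positive")
    window_seconds = window_points / data_rate_hz
    return math.ceil(window_seconds / SECONDS_PER_DAY)


if __name__ == "__main__":
    # An 11-point window at 1 Hz fits well within one day of padding.
    print(pad_days_needed(11, 1.0))
    # A 200-point window at one reading per 15 minutes spans more than two days.
    print(pad_days_needed(200, 1 / 900))
```

Under these assumptions, lowering the data rate or widening the threshold window increases the pad without any change to the pipeline specification, which is the behavior the variable pad is meant to provide.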

Constant pad

The constant timeseries padder Python module, timeseries_padder.timeseries_padder.constant_pad_main, takes its arguments from the variables set under env: (e.g. OUT_PATH, WINDOW_SIZE, YEAR_INDEX). The following example shows how env: is set for the constant timeseries padder:

transform:
  image_pull_secrets:
  - battelleecology-quay-read-all-pull-secret
  image: quay.io/battelleecology/timeseries_padder:26
  cmd:
  - "/bin/bash"
  stdin:
  - "#!/bin/bash"
  - python3 -m timeseries_padder.timeseries_padder.constant_pad_main
  env:
    OUT_PATH: /pfs/out
    WINDOW_SIZE: '1'
    LOG_LEVEL: INFO
    RELATIVE_PATH_INDEX: '3'
    YEAR_INDEX: '4'
    MONTH_INDEX: '5'
    DAY_INDEX: '6'
    LOCATION_INDEX: '7'
    DATA_TYPE_INDEX: '8'
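
The sketch below illustrates, in Python, how a module could read the env: variables above and use the index values to pick apart an input path. It is an assumption about how the indices are interpreted, not the constant_pad_main source: the helper names read_indices and parse_path and the example path are hypothetical, and the indices are assumed to count path components with the root "/" at position 0.

```python
import os
from pathlib import PurePosixPath

INDEX_NAMES = ("RELATIVE_PATH_INDEX", "YEAR_INDEX", "MONTH_INDEX",
               "DAY_INDEX", "LOCATION_INDEX", "DATA_TYPE_INDEX")


def read_indices(env=os.environ) -> dict[str, int]:
    """Read the *_INDEX variables; env values always arrive as strings."""
    return {name: int(env[name]) for name in INDEX_NAMES}


def parse_path(path: str, indices: dict[str, int]) -> dict[str, str]:
    """Split a path into components and look up each field by its index."""
    parts = PurePosixPath(path).parts  # parts[0] is "/" for absolute paths
    return {
        "year": parts[indices["YEAR_INDEX"]],
        "month": parts[indices["MONTH_INDEX"]],
        "day": parts[indices["DAY_INDEX"]],
        "location": parts[indices["LOCATION_INDEX"]],
        "data_type": parts[indices["DATA_TYPE_INDEX"]],
        # Assumed meaning: the path from this component onward is preserved
        "relative_path": "/".join(parts[indices["RELATIVE_PATH_INDEX"]:]),
    }


if __name__ == "__main__":
    example_env = {"RELATIVE_PATH_INDEX": "3", "YEAR_INDEX": "4",
                   "MONTH_INDEX": "5", "DAY_INDEX": "6",
                   "LOCATION_INDEX": "7", "DATA_TYPE_INDEX": "8"}
    # Hypothetical input path, chosen only to line up with the indices above
    example_path = "/pfs/data_source/prt/2020/01/02/CFGLOC100/data/file.ext"
    print(parse_path(example_path, read_indices(example_env)))
```

The values are quoted in the YAML because environment variables are passed as strings, so the module has to convert them to integers itself.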

Variable pad

The variable timeseries padder Python module does not take its path indices from env: in the YAML file. Instead, they are passed as arguments on the python command line and parsed with the argparse Python package. The same approach is used in [SHORT-NAME]_egress.yaml. The following example shows the corresponding variable timeseries padder as used in [SHORT-NAME]_timeseries_padder.yaml. Note that timeseries_padder.timeseries_padder.variable_pad_main is now called, followed by the arguments that are parsed in lieu of being specified in env:. Only OUT_PATH and LOG_LEVEL remain under env:.

transform:
  image_pull_secrets:
  - battelleecology-quay-read-all-pull-secret
  image: quay.io/battelleecology/timeseries_padder:31
  cmd:
  - "/bin/bash"
  stdin:
  - "#!/bin/bash"
  - python3 -m timeseries_padder.timeseries_padder.variable_pad_main --yearindex 4 --monthindex 5 --dayindex 6 --locindex 7 --subdirindex 8
  env:
    OUT_PATH: /pfs/out
    LOG_LEVEL: INFO
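
For readers unfamiliar with argparse, the following Python sketch shows how the flags in the command above could be parsed. The flag names are taken from the example; everything else (the function name parse_indices, the integer type, and making each flag required) is an assumption and may differ from variable_pad_main.

```python
import argparse
from typing import Optional, Sequence

FLAGS = ("--yearindex", "--monthindex", "--dayindex",
         "--locindex", "--subdirindex")


def parse_indices(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the path-index flags; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Sketch of command-line parsing for a variable pad.")
    for flag in FLAGS:
        # Assumption: every index is a required integer
        parser.add_argument(flag, type=int, required=True)
    return parser.parse_args(argv)


if __name__ == "__main__":
    command = ("--yearindex 4 --monthindex 5 --dayindex 6 "
               "--locindex 7 --subdirindex 8")
    args = parse_indices(command.split())
    print(args.yearindex, args.monthindex, args.dayindex,
          args.locindex, args.subdirindex)
```

Unlike values read from env:, arguments declared with type=int are converted by argparse itself, and a missing or non-numeric flag stops the program with a usage message before any data is touched.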