1.5 Populating Thresholds - NEONScience/NEON-IS-data-processing GitHub Wiki

Populating thresholds

The [SHORT-NAME]_threshold repo is populated from the threshold manager on INT SOM. You will probably need to edit the thresholds on INT for the products you're working with so that they can be used in the Pachyderm pipelines. The existing processing system applies QA/QC using L0 data products and L0 terms (e.g. soilPRTResistance), whereas the Pachyderm processing system uses L0' terms (e.g. temp) and a context (e.g. soil) to distinguish thresholds for a given data product.

This will typically involve having CI create a new threshold on INT. Note that 'threshold' here means the whole table/spreadsheet of thresholds for all locations, time periods, etc. for a product or related set of products. In the existing system, each threshold sheet is identified by the L0 DP ID + L0 term combination. In the new system, each threshold sheet is identified by the term + context combination. For now, we are retrofitting the old system to work with the new one. To do this, create a story in the CI Jira asking for a new threshold with the same L0 ID but the new L0' term, with the new context attached to it. Here's an example of the story description:

Please create/map thresholds for the soil temperature data product:

Existing L0 ID: DP0.00041.001
Existing L0 term: soilPRTResistance
   maps to...
New L0' term: temp
New Context(s): soil

Note that the L0' term you communicate to CI must match the L0' term in the data files being processed in the Pachyderm QA/QC module. If the L0 term and L0' term are the same, all you need to do is have CI attach the new context(s) to the existing thresholds.
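The mapping request and the two checks described above can be written down as a small sketch. This is only an illustration, not part of the NEON tooling: the `ThresholdMapping` class, its field names, and the data column names are hypothetical. Only the soil temperature values come from the example story.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdMapping:
    """One threshold mapping request (hypothetical structure)."""
    l0_dp_id: str        # existing L0 data product ID
    l0_term: str         # existing L0 term
    l0_prime_term: str   # new L0' term used in the Pachyderm data files
    contexts: tuple      # new context(s) to attach

    def needs_new_threshold(self) -> bool:
        # If the L0 and L0' terms are the same, CI only has to attach
        # the new context(s) to the existing thresholds.
        return self.l0_term != self.l0_prime_term


def term_matches_data(mapping: ThresholdMapping, data_columns) -> bool:
    # The L0' term sent to CI must match a term present in the data files
    # processed by the QA/QC module.
    return mapping.l0_prime_term in data_columns


# Values taken from the example story description.
soil_temp = ThresholdMapping(
    l0_dp_id="DP0.00041.001",
    l0_term="soilPRTResistance",
    l0_prime_term="temp",
    contexts=("soil",),
)

# Hypothetical column names of a data file entering the QA/QC module.
columns = ["readout_time", "temp"]

print(soil_temp.needs_new_threshold())
print(term_matches_data(soil_temp, columns))
```

Running a check like this before filing the story is one way to catch a mismatched L0' term early.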

The new specification of thresholds as a term + context combination provides greater flexibility in using the same thresholds for multiple products, but it also requires knowing where the same term is used across different data products. For example, the term 'temperature' is used in many sensors and products, so it's important to use a context that limits the thresholds to the intended product(s) while allowing multiple products to share thresholds when that makes sense. Soil temperature thresholds are likely to be very different from air temperature thresholds, so it makes sense to separate them by assigning different contexts (such as 'soil' and 'aspirated-air'). However, single-aspirated air temperature, triple-aspirated air temperature, and the air temperature measured by the relative humidity sensor all measure the same fundamental quantity (air temperature), so it might make sense to use the more general 'air' context for these thresholds so that they intuitively apply to all of these products.
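As a rough sketch of the idea, a threshold sheet can be thought of as keyed by a (term, context) pair. The dictionaries and the `sheet_for` helper below are hypothetical, and using `temp` as the term for the air temperature products is an assumption made for illustration.

```python
# Threshold sheets keyed by (term, context). Sheet names are placeholders.
THRESHOLD_SHEETS = {
    ("temp", "soil"): "soil temperature sheet",
    ("temp", "air"): "air temperature sheet",
}

# Which (term, context) each product uses (assumed for illustration).
PRODUCT_KEYS = {
    "soil temperature": ("temp", "soil"),
    "single-aspirated air temperature": ("temp", "air"),
    "triple-aspirated air temperature": ("temp", "air"),
    "relative humidity sensor air temperature": ("temp", "air"),
}


def sheet_for(product: str) -> str:
    """Return the threshold sheet a product would pick up."""
    return THRESHOLD_SHEETS[PRODUCT_KEYS[product]]


# The three air temperature products share one sheet through the general
# 'air' context, while soil temperature stays separate.
for product in PRODUCT_KEYS:
    print(product, "->", sheet_for(product))
```

With a narrower context such as 'aspirated-air', the relative humidity sensor's air temperature would need its own sheet; the general 'air' context avoids that duplication.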

Note that you can use the threshold selection hierarchy to specify different thresholds that share a term + context for different locations. Thresholds can be set for REALM, SITE, or a particular CFGLOC. This is useful, for example, for using the same threshold sheet for relative humidity sensors on the aquatic met station (DP1.00098.001) and on the buoy (DP1.20271.001). Alternatively, you could separate these threshold sheets by assigning a 'terrestrial' vs. 'buoy' context, but you would then have to process them in separate modules (both the threshold_select module and the qaqc_plausibility module; see the documentation for those modules for more info).
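The selection hierarchy can be sketched as a lookup that prefers the most specific location level. This is a minimal illustration, not the threshold_select implementation: the precedence order (CFGLOC over SITE over REALM), the row layout, the term and context names, the location names, and all values are assumptions.

```python
# Assumed precedence: most specific location level wins.
SPECIFICITY = ("CFGLOC", "SITE", "REALM")

# One threshold sheet (same term + context) with rows at different levels.
# All names and values below are hypothetical.
ROWS = [
    {"term": "relativeHumidity", "context": "met", "level": "REALM",
     "name": "REALM", "range_max": 100.0},
    {"term": "relativeHumidity", "context": "met", "level": "SITE",
     "name": "SITE_A", "range_max": 102.0},
    {"term": "relativeHumidity", "context": "met", "level": "CFGLOC",
     "name": "CFGLOC_BUOY", "range_max": 105.0},
]


def select_threshold(rows, term, context, site, cfgloc):
    """Return the most specific matching row, or None if nothing matches."""
    wanted = {"CFGLOC": cfgloc, "SITE": site, "REALM": "REALM"}
    for level in SPECIFICITY:
        for row in rows:
            if (row["term"] == term and row["context"] == context
                    and row["level"] == level
                    and row["name"] == wanted[level]):
                return row
    return None


# The buoy location gets its own value; other locations fall back
# to the site-level or realm-level row.
buoy = select_threshold(ROWS, "relativeHumidity", "met", "SITE_A", "CFGLOC_BUOY")
shore = select_threshold(ROWS, "relativeHumidity", "met", "SITE_A", "CFGLOC_MET")
other = select_threshold(ROWS, "relativeHumidity", "met", "SITE_B", "CFGLOC_X")
print(buoy["range_max"], shore["range_max"], other["range_max"])
```

Because every row shares one term + context, both sensor locations can be processed in the same modules while still receiving different values.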