sample_collator - bruno-beloff/scs_analysis GitHub Wiki


DESCRIPTION

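The sample_collator utility separates the values of a dependent field in the input JSON documents into columns, according to the value of an independent field. Each column covers a sub-range of the independent variable's domain, with its own lower and upper bound. For each column, assignment follows the rule: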

lower bound <= value < upper bound

The upper bound of the data set and a delta (the width of each column) must be specified; the lower bound defaults to 0. The number of columns required to cover this domain is calculated automatically. The paths identifying the leaf nodes in the input document for the independent and dependent fields must also be specified.
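
The assignment rule can be illustrated with a minimal Python sketch. This is not the utility's own code: the function names column_bounds and find_column are hypothetical, and the sketch assumes that a column starts at every multiple of the delta from the lower bound up to and including the upper bound, which matches the seven columns shown in the output example on this page.

```python
def column_bounds(lower, upper, delta):
    """Return (low, high) pairs for each column (hypothetical helper).

    Assumption: a column starts at every step of delta from lower
    up to and including upper.
    """
    bounds = []
    low = lower
    while low <= upper:
        bounds.append((low, low + delta))
        low += delta
    return bounds


def find_column(value, bounds):
    """Return the column satisfying low <= value < high, or None."""
    for low, high in bounds:
        if low <= value < high:
            return (low, high)
    return None


bounds = column_bounds(0, 60, 10)

print(len(bounds))                # number of columns
print(find_column(8.4, bounds))   # value inside the first column
print(find_column(10, bounds))    # a value on a boundary goes to the higher column
print(find_column(70, bounds))    # a value outside the domain has no column
```

Because the upper bound of each column is exclusive, a value that falls exactly on a boundary is assigned to the column that begins at that boundary.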

Two collators are provided in this package: sample_collator collates into separate columns (collate to columns), whereas csv_collator collates into separate CSV files (collate to rows).

SYNOPSIS

sample_collator.py -x IND_PATH [-n NAME] -y DEP_PATH [-l LOWER] -u UPPER -d DELTA [-v]

Options
--version                          show program's version number and exit
-h, --help                         show this help message and exit
-x IND_PATH, --ind-path=IND_PATH   path to independent variable
-n NAME, --name=NAME               name for the independent variable
-y DEP_PATH, --dep-path=DEP_PATH   path to dependent variable
-u UPPER, --upper=UPPER            upper bound of dataset
-d DELTA, --delta=DELTA            width of column domain
-l LOWER, --lower=LOWER            lower bound of dataset (default 0)
-v, --verbose                      report narrative to stderr

EXAMPLES

csv_reader.py -v -l 1 scs-pb1-3-ref-opc-r1-error-2019-09-27T11-03-41+01-00-15min-exegesis-error.csv | \
sample_collator.py -v -x "ref.pmx.PM25 Processed Measurement (µg/m³)" -n PM25 -y error.pm2p5 -u 60 -d 10

DOCUMENT EXAMPLE - INPUT

{"rec": "2019-10-11T10:15:00Z", "opc": {"pmx": {"tag": "scs-pb1-3", "src": "R1", "val": {"per": 9.9, "pm1": 2.2, "pm2p5": 8.4, "pm10": 12.0}}}, "error": {"pm1": 1.517, "pm2p5": 3.561, "pm10": 3.429}}

DOCUMENT EXAMPLE - OUTPUT

{"rec": "2019-10-11T10:15:00Z", "opc": {"pmx": {"tag": "scs-pb1-3", "src": "R1", "val": {"per": 9.9, "pm1": 2.2, "pm2p5": 8.4, "pm10": 12.0}}}, "error": {"pm1": 1.517, "pm2p5": {"src": 3.561, "PM25_0-10": 3.561, "PM25_10-20": null, "PM25_20-30": null, "PM25_30-40": null, "PM25_40-50": null, "PM25_50-60": null, "PM25_60-70": null}, "pm10": 3.429}}

SEE ALSO

scs_analysis/csv_collator