csv_collator - bruno-beloff/scs_analysis GitHub Wiki


DESCRIPTION

The csv_collator utility is used to separate the input JSON documents according to the upper and lower bounds of a sequence of bins. For each bin, assignment follows the rule:

lower bound <= value < upper bound

The upper and lower bounds for the data set must be specified, along with a step size (the bin width). The number of bins required to cover this domain is calculated automatically. The path identifying the leaf node in the input document where the value is to be found must also be specified. A file (and path) prefix for the generated CSV files is optional.
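The binning rule can be illustrated with a minimal Python sketch. This is not the csv_collator source: the function names `bin_count` and `bin_index` are hypothetical, and the use of a ceiling to size the bin sequence is an assumption about how the number of bins might be derived.

```python
import math


def bin_count(lower, upper, delta):
    # Assumption: enough bins of width delta to cover [lower, upper).
    return math.ceil((upper - lower) / delta)


def bin_index(value, lower, upper, delta):
    # Values outside the overall bounds belong to no bin.
    if not (lower <= value < upper):
        return None

    # Within each bin: lower bound <= value < upper bound.
    return int((value - lower) // delta)
```

With the bounds used in the example further down (`-l 5.0 -u 21.0 -d 1.0`), this sketch yields 16 bins, and a value of exactly 21.0 is rejected because the upper bound is exclusive.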

Documents that do not contain a field at the specified path, or have values that cannot be evaluated as a float, are ignored. Likewise, values outside the upper and lower bounds are ignored.
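As a rough illustration of this filtering, the sketch below walks a dot-separated path through a parsed JSON document and returns a float, or None when the document should be ignored. The function name `leaf_value` is hypothetical, and the assumption that the path is split on "." is inferred from the form of the path in the example (val.sht.hmd.aH), not taken from the utility's source.

```python
import json


def leaf_value(document, path):
    # Walk the document one key at a time (assumes a dot-separated path).
    node = document
    for key in path.split('.'):
        if not isinstance(node, dict) or key not in node:
            return None         # no field at the specified path
        node = node[key]

    try:
        return float(node)
    except (TypeError, ValueError):
        return None             # value cannot be evaluated as a float


doc = json.loads('{"val": {"sht": {"hmd": {"aH": "12.5"}}}}')
value = leaf_value(doc, 'val.sht.hmd.aH')
```

A caller would then skip any document for which `leaf_value` returns None before applying the bounds check.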

On completion, a summary of the bin assignments is written to stdout. If no file prefix is given, then the CSV files are not generated, but the report is still produced.
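The kind of information such a summary carries can be sketched as a count of values per bin. The function name `summarise` and the tuple layout are hypothetical; the actual report format written by csv_collator is not described here and may differ.

```python
import math
from collections import Counter


def summarise(values, lower, upper, delta):
    # Assumption: the bin sequence covers [lower, upper) in steps of delta.
    n_bins = math.ceil((upper - lower) / delta)

    counts = Counter()
    for value in values:
        if lower <= value < upper:          # out-of-bounds values are ignored
            counts[int((value - lower) // delta)] += 1

    # One (bin lower bound, bin upper bound, count) tuple per bin, empty bins included.
    return [(lower + i * delta, lower + (i + 1) * delta, counts[i])
            for i in range(n_bins)]


report = summarise([5.0, 5.5, 6.2, 21.0, 3.0], 5.0, 8.0, 1.0)
```

In this sketch the values 21.0 and 3.0 fall outside the bounds and are not counted, so `report` accounts for three of the five inputs.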

Two collators are provided in this package: csv_collator collates into separate CSV files (collate to rows), whereas sample_collator collates into separate columns (collate to columns).

SYNOPSIS

csv_collator.py -l LOWER_BOUND -u UPPER_BOUND -d DELTA [-f FILE_PREFIX] [-v] PATH

Options
--version	show program's version number and exit
-h, --help	show this help message and exit
-l LOWER, --lower=LOWER	lower bound of dataset
-u UPPER, --upper=UPPER	upper bound of dataset
-d DELTA, --delta=DELTA	width of bin
-f FILE_PREFIX, --file-prefix=FILE_PREFIX	file prefix for collated CSVs
-v, --verbose	report narrative to stderr

EXAMPLES

csv_reader.py alphasense_303_2018-08.csv | csv_collator.py -l 5.0 -u 21.0 -d 1.0 -f collation/alphasense_303_2018-08 -v val.sht.hmd.aH

FILES