csv_collator - bruno-beloff/scs_analysis GitHub Wiki

DESCRIPTION

The csv_collator utility separates the input JSON documents into a sequence of bins, each defined by a lower and an upper bound. A document is assigned to the bin for which:

lower bound <= value < upper bound

The upper and lower bounds for the data set must be specified, along with a step size (the bin width). The number of bins required to cover this domain is calculated automatically. The path identifying the leaf node in the input document where the value is to be found must also be specified. Optionally, a file (and path) prefix for the generated CSV files may be given.
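The binning described above can be sketched roughly as follows. This is an illustration only, not code from scs_analysis: the function names bin_bounds and bin_index are hypothetical, and rounding the bin count up when the step does not divide the domain evenly is an assumption.

```python
import math


def bin_bounds(lower, upper, delta):
    """Return a list of (low, high) pairs covering the domain lower..upper."""
    # Assumption: a partial final bin is rounded up to a full-width bin.
    count = math.ceil((upper - lower) / delta)
    return [(lower + i * delta, lower + (i + 1) * delta) for i in range(count)]


def bin_index(value, bounds):
    """Return the index of the bin where low <= value < high, else None."""
    for i, (low, high) in enumerate(bounds):
        if low <= value < high:
            return i
    return None  # value lies outside the domain, so it would be ignored


# Same bounds and step as the command shown under EXAMPLES.
bounds = bin_bounds(5.0, 21.0, 1.0)
print(len(bounds))                # 16
print(bin_index(5.0, bounds))     # 0
print(bin_index(20.999, bounds))  # 15
print(bin_index(21.0, bounds))    # None
```

Note that the upper bound of each bin is exclusive, so a value exactly equal to the upper bound of the data set falls outside the last bin.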

Documents that do not contain a field at the specified path, or have values that cannot be evaluated as a float, are ignored. Likewise, values outside the upper and lower bounds are ignored.
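The filtering rule above might look something like the sketch below. It assumes that PATH is a dot-separated sequence of keys into nested JSON objects, as the EXAMPLES section suggests; the helper name leaf_value is hypothetical and is not part of scs_analysis.

```python
def leaf_value(document, path):
    """Return the float found at the dotted path, or None if the field is
    missing or its value cannot be evaluated as a float."""
    node = document
    for key in path.split('.'):
        # Stop as soon as the path cannot be followed any further.
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    try:
        return float(node)
    except (TypeError, ValueError):
        return None  # e.g. a nested object, null, or a non-numeric string


good = {"val": {"sht": {"hmd": {"aH": "12.3"}}}}
bad = {"val": {"sht": {"hmd": {"aH": "n/a"}}}}

print(leaf_value(good, "val.sht.hmd.aH"))  # 12.3
print(leaf_value(bad, "val.sht.hmd.aH"))   # None
print(leaf_value(good, "val.sht.tmp"))     # None
```

A document for which this helper returns None would simply be skipped, as would one whose value lies outside the lower and upper bounds.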

On completion, a summary of the bin assignments is written to stdout. If no file prefix is given, then the CSV files are not generated, but the report is still produced.

Two collators are provided in this package: csv_collator collates into separate CSV files (collate to rows), whereas sample_collator collates into separate columns (collate to columns).

SYNOPSIS

csv_collator.py -l LOWER_BOUND -u UPPER_BOUND -d DELTA [-f FILE_PREFIX] [-v] PATH

Options
  --version                                   show program's version number and exit
  -h, --help                                  show this help message and exit
  -l LOWER, --lower=LOWER                     lower bound of dataset
  -u UPPER, --upper=UPPER                     upper bound of dataset
  -d DELTA, --delta=DELTA                     width of bin
  -f FILE_PREFIX, --file-prefix=FILE_PREFIX   file prefix for collated CSVs
  -v, --verbose                               report narrative to stderr

EXAMPLES

csv_reader.py alphasense_303_2018-08.csv | csv_collator.py -l 5.0 -u 21.0 -d 1.0 -f collation/alphasense_303_2018-08 -v val.sht.hmd.aH

FILES

Output file names are of the form: FILE_PREFIX_DOMAIN_LOW_DOMAIN_HIGH.csv

SEE ALSO

scs_analysis/csv_collation_summary
scs_analysis/csv_segmentor
scs_analysis/csv_reader
scs_analysis/csv_writer
scs_analysis/sample_collator
scs_analysis/sample_nullify
scs_analysis/sample_subset
scs_analysis/sample_error