csv_collator - bruno-beloff/scs_analysis GitHub Wiki
DESCRIPTION
The csv_collator utility is used to separate the input JSON documents according to the upper and lower bounds of a sequence of bins. For each bin, assignment follows the rule:
lower bound <= value < upper bound
The upper and lower bounds for the data set must be specified, along with a step size (the width of each bin). The number of bins required to cover this domain is calculated automatically. The path identifying the leaf node in the input document where the value is to be found must also be given. A file (and path) prefix for the generated CSV files is optional.
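The bin calculation can be sketched as follows. This is an illustration only, not the utility's actual code: the function name `bin_edges` is hypothetical, and the handling of a step that does not divide the domain evenly (the last bin is allowed to extend past the upper bound) is an assumption.

```python
import math


def bin_edges(lower, upper, delta):
    """Return a list of (low, high) tuples covering the domain from lower to upper.

    Hypothetical helper: csv_collator may compute its bins differently.
    """
    if delta <= 0 or upper <= lower:
        raise ValueError("require delta > 0 and upper > lower")

    # Round before taking the ceiling to absorb floating-point noise,
    # e.g. 0.3 / 0.1 evaluates to 2.9999999999999996 rather than 3.
    count = math.ceil(round((upper - lower) / delta, 9))

    return [(lower + i * delta, lower + (i + 1) * delta) for i in range(count)]


# Bounds and step taken from the command shown in EXAMPLES: -l 5.0 -u 21.0 -d 1.0
edges = bin_edges(5.0, 21.0, 1.0)
print(len(edges), edges[0], edges[-1])
```

With these values the domain is divided into 16 bins, the first being 5.0 to 6.0 and the last 20.0 to 21.0.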
Documents that do not contain a field at the specified path, or have values that cannot be evaluated as a float, are ignored. Likewise, values outside the upper and lower bounds are ignored.
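The selection and assignment rules can be illustrated with a minimal sketch. The helper names `value_at` and `bin_index` are hypothetical, and the nested document layout is assumed from the dotted path `val.sht.hmd.aH` used in EXAMPLES; the utility's own implementation may differ.

```python
import json


def value_at(doc, path):
    """Follow a dotted path such as 'val.sht.hmd.aH'; return a float or None."""
    node = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None              # no field at this path: document is ignored
        node = node[key]
    try:
        return float(node)
    except (TypeError, ValueError):
        return None                  # value cannot be evaluated as a float


def bin_index(value, lower, upper, delta):
    """Index of the bin satisfying lower bound <= value < upper bound, else None."""
    if value is None or not (lower <= value < upper):
        return None                  # outside the domain: ignored
    return int((value - lower) // delta)


# Sample input documents (invented for illustration only).
lines = [
    '{"val": {"sht": {"hmd": {"aH": 5.0}}}}',     # equal to lower bound: first bin
    '{"val": {"sht": {"hmd": {"aH": 12.7}}}}',    # falls in the bin 12.0 to 13.0
    '{"val": {"sht": {"hmd": {"aH": 21.0}}}}',    # equal to upper bound: ignored
    '{"val": {"sht": {"hmd": {"aH": "n/a"}}}}',   # not a float: ignored
    '{"val": {"sht": {"tmp": 23.1}}}',            # path missing: ignored
]

indices = [
    bin_index(value_at(json.loads(line), "val.sht.hmd.aH"), 5.0, 21.0, 1.0)
    for line in lines
]
print(indices)  # [0, 7, None, None, None]
```

Because the rule is half-open, a value equal to the lower bound is collated into the first bin, whereas a value equal to the upper bound is ignored.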
On completion, a summary of the bin assignments is written to stdout. If no file prefix is given, then the CSV files are not generated, but the report is still produced.
Two collators are provided in this package: csv_collator collates into separate CSV files (collate to rows), whereas sample_collator collates into separate columns (collate to columns).
SYNOPSIS
csv_collator.py -l LOWER_BOUND -u UPPER_BOUND -d DELTA [-f FILE_PREFIX] [-v] PATH
Options | |
---|---|
--version | show program's version number and exit |
-h, --help | show this help message and exit |
-l LOWER, --lower=LOWER | lower bound of dataset |
-u UPPER, --upper=UPPER | upper bound of dataset |
-d DELTA, --delta=DELTA | width of bin |
-f FILE_PREFIX, --file-prefix=FILE_PREFIX | file prefix for collated CSVs |
-v, --verbose | report narrative to stderr |
EXAMPLES
csv_reader.py alphasense_303_2018-08.csv | csv_collator.py -l 5.0 -u 21.0 -d 1.0 -f collation/alphasense_303_2018-08 -v val.sht.hmd.aH
FILES
Output file names are of the form: FILE_PREFIX_DOMAIN_LOW_DOMAIN_HIGH.csv
SEE ALSO
scs_analysis/csv_collation_summary
scs_analysis/csv_segmentor
scs_analysis/csv_reader
scs_analysis/csv_writer
scs_analysis/sample_collator
scs_analysis/sample_nullify
scs_analysis/sample_subset
scs_analysis/sample_error