sample_subset - bruno-beloff/scs_analysis GitHub Wiki



DESCRIPTION

The sample_subset utility is used to find the subset of documents whose value for a specified field (identified by PATH) lies inside or outside one or two bounding values, or is equal to a given value.

Input is a stream of JSON documents on stdin. Documents that match the specification are written to stdout; all others are discarded. Documents that do not have the specified field, or that have an empty value for it, are also discarded. If a field value is present but cannot be cast to the specified type, the sample_subset utility halts when the --strict flag is set.
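The input handling described above can be sketched in Python as follows. This is an illustration, not the utility's own code. It makes three assumptions: the stream holds one JSON document per line, the nodes of PATH are separated by "." (as in "ref.NO2 Processed Measurement (ppb)"), and "empty" means null or "". The function names iter_documents, get_field and has_value are hypothetical.

```python
import json


def iter_documents(lines):
    """Parse a stream with one JSON document per line (assumption); blank lines are skipped."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def get_field(document, path):
    """Return the value at a dotted node path such as "ref.NO2", or None if it is absent.

    Assumption: path nodes are separated by '.'.
    """
    node = document
    for key in path.split('.'):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def has_value(document, path):
    """True if the field is present and its value is not empty (assumed: None or "")."""
    value = get_field(document, path)
    return value is not None and value != ''
```

For example, `[d for d in iter_documents(sys.stdin) if has_value(d, "rec")]` keeps only the documents that have a usable rec field. This matches what the utility does when neither bound is given.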

The type of the field may be specified explicitly as numeric (-n), ISO 8601 datetime (-i) or string (-s). If no type is specified, the value is interpreted as a string.
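A minimal sketch of casting a field value to each of the three types, assuming numeric values become Python floats and ISO 8601 values become timezone-aware datetimes. The function cast_value and the kind names are hypothetical, and the utility's own parsing may differ. A value that cannot be cast raises ValueError.

```python
from datetime import datetime


def cast_value(raw, kind='string'):
    """Cast a raw field value for comparison.

    kind is 'numeric', 'iso8601' or 'string' (hypothetical names for -n, -i and -s).
    Raises ValueError if the value cannot be cast.
    """
    if kind == 'numeric':
        return float(raw)
    if kind == 'iso8601':
        text = str(raw)
        # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 onwards
        if text.endswith('Z'):
            text = text[:-1] + '+00:00'
        return datetime.fromisoformat(text)
    return str(raw)
```

The type matters for ordering. As strings, "10" sorts before "9" because the comparison is lexical. As numbers, 10 is greater than 9. Numeric bounds therefore need -n.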

Evaluation follows the rule:

lower bound <= value < upper bound

Both the upper and lower bounds are optional. If both are present, the lower bound must be less than the upper bound. If neither is present, the sample_subset utility only filters out documents with missing fields or empty values. As an alternative to the bounds test, the --equal option selects documents whose value is equal to the given value.
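The evaluation rule can be expressed as a predicate. The following is a sketch, assuming the value and bounds have already been cast to the same type. The names in_range and matches are hypothetical. An absent bound is simply not tested, so with no bounds every value passes.

```python
def in_range(value, lower=None, upper=None):
    """Apply the rule lower <= value < upper; an absent bound is not tested."""
    if lower is not None and upper is not None and not lower < upper:
        raise ValueError("lower bound must be less than upper bound")
    if lower is not None and value < lower:
        return False
    if upper is not None and value >= upper:
        return False
    return True


def matches(value, lower=None, upper=None, equal=None):
    """If an equal value is given, it replaces the bounds test with an equality test."""
    if equal is not None:
        return value == equal
    return in_range(value, lower, upper)
```

The lower bound is inclusive and the upper bound is exclusive. Consecutive ranges such as 0 to 10 and 10 to 20 therefore never select the same document twice.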

If the --exclusions flag is used, the sample_subset utility outputs only the documents that do not fit within the specification. Note that, in this case, documents with missing or empty fields are still discarded.
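The effect of --exclusions can be sketched as a filter over a stream of JSON lines. For brevity, this sketch assumes that the field is a top-level key and that values are compared numerically, as with -n. The function subset is hypothetical. Documents with a missing or empty value are dropped before the range test, so they never appear in either mode.

```python
import json


def subset(lines, field, lower=None, upper=None, exclusions=False):
    """Yield documents (one JSON document per line) inside, or outside, lower <= value < upper.

    Assumptions: field is a top-level key and values are compared as numbers.
    """
    for line in lines:
        if not line.strip():
            continue
        document = json.loads(line)
        raw = document.get(field)
        if raw is None or raw == '':
            continue  # discarded whether or not exclusions is set
        value = float(raw)
        inside = (lower is None or lower <= value) and (upper is None or value < upper)
        # exclusions inverts the selection: yield when inside differs from exclusions
        if inside != exclusions:
            yield document
```

The test `inside != exclusions` works as an exclusive OR. It keeps the matching documents normally and the non-matching ones when exclusions is set.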

SYNOPSIS

sample_subset.py [{ -i | -n | -s }] [-t] { [-e EQUAL] | [-l LOWER] [-u UPPER] } [-x] [-v] PATH

Options
--version show program's version number and exit
-h, --help show this help message and exit
-i, --iso8601 interpret the value as an ISO 8601 datetime
-n, --numeric interpret the value as a number
-s, --string interpret the value as a string
-t, --strict halt on type errors
-e EQUAL, --equal=EQUAL equal to
-l LOWER, --lower=LOWER lower bound
-u UPPER, --upper=UPPER upper bound
-x, --exclusions output exclusions instead of inclusions
-v, --verbose report narrative to stderr

EXAMPLES

csv_reader.py praxis_303.csv | sample_subset.py -v -i -l 2018-09-26T00:00:00Z -u 2018-09-27T00:00:00Z rec
csv_reader.py -v scs-bgx-431-ref-meteo-gases-2020-H1-slp.csv | sample_subset.py -v "ref.NO2 Processed Measurement (ppb)" | csv_writer.py -v scs-bgx-431-ref-meteo-gases-2020-H1-slp-no2.csv 

SEE ALSO

scs_analysis/csv_collator
scs_analysis/sample_nullify

RESOURCES

ISO 8601