sample_subset - bruno-beloff/scs_analysis GitHub Wiki
DESCRIPTION
The sample_subset utility is used to find a subset of documents whose value for a specified field lies either inside or outside one or two bounding values.
Input is a stream of JSON documents. Documents are written to stdout if they match the specification, and are discarded otherwise. Documents that do not have the specified field, or that have an empty field value, are also discarded. If a field value is present but cannot be cast to the correct type, then the sample_subset utility terminates.
The type of the field may be specified explicitly as either numeric or ISO 8601 datetime. If no specification is given, the value is interpreted as a string.
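The type handling described above can be pictured with a minimal Python sketch. This is not the utility's actual code: the helper name `cast_value` and its `kind` parameter are hypothetical, chosen only to illustrate how a raw field value might be cast before comparison, and how a failed cast surfaces as an error.

```python
from datetime import datetime


def cast_value(raw, kind="string"):
    """Hypothetical helper: cast a raw field value to the requested type.

    Raises ValueError if the value cannot be cast.
    """
    if kind == "numeric":
        return float(raw)

    if kind == "iso8601":
        # Older Python versions reject a trailing 'Z' in fromisoformat(),
        # so it is normalised to an explicit UTC offset first.
        return datetime.fromisoformat(str(raw).replace("Z", "+00:00"))

    # Default: treat the value as a string.
    return str(raw)
```

Casting both the field value and the bounds with the same helper keeps comparisons consistent: datetimes compare chronologically, numbers numerically, and strings lexicographically.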
Evaluation follows the rule:
lower bound <= value < upper bound
Both upper and lower bounds are optional. If both are present, then the lower bound value must be less than the upper bound value. If neither is present, then the sample_subset utility only filters out documents with missing fields or empty values. An alternative test is --equal.
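As an illustration of the evaluation rule, the following sketch implements the half-open interval test with optional bounds. The function names `check_bounds` and `in_bounds` are assumptions made for this example and are not taken from the utility's source.

```python
def check_bounds(lower=None, upper=None):
    """Hypothetical validation: if both bounds are given, lower must be less than upper."""
    if lower is not None and upper is not None and lower >= upper:
        raise ValueError("lower bound must be less than upper bound")


def in_bounds(value, lower=None, upper=None):
    """Return True if lower <= value < upper, ignoring any bound that is None."""
    if lower is not None and value < lower:
        return False

    if upper is not None and value >= upper:
        return False

    return True
```

Because the upper bound is exclusive, consecutive ranges such as one day followed by the next can be selected without any document appearing in both subsets.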
If the --exclusions flag is used, the sample_subset utility outputs only the documents that do not fit within the specification. Note that, in this case, documents with missing or empty fields are still discarded.
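The interaction between the exclusions option and missing or empty fields can be sketched as follows. This is an illustrative approximation only: the generator name `select`, its parameters, and the use of a flat top-level field lookup are assumptions, and the real utility resolves a PATH that may address nested fields.

```python
def select(documents, field, matches, exclusions=False):
    """Yield documents whose field value passes (or, with exclusions, fails) the test.

    `matches` is any callable that returns True for values inside the specification.
    Documents with a missing or empty field are discarded in both modes.
    """
    for doc in documents:
        value = doc.get(field)

        if value is None or value == "":
            continue  # discarded regardless of the exclusions setting

        if matches(value) != exclusions:
            yield doc


docs = [{"rec": 1, "val": 5}, {"rec": 2, "val": 15}, {"rec": 3}, {"rec": 4, "val": ""}]

inside = list(select(docs, "val", lambda v: 5 <= v < 15))
outside = list(select(docs, "val", lambda v: 5 <= v < 15, exclusions=True))
```

Here `inside` holds only record 1 and `outside` holds only record 2. Records 3 and 4 appear in neither list, so the two outputs together do not necessarily reconstruct the input.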
SYNOPSIS
sample_subset.py { -i | -n | -s } { [-e EQUAL] | [-l LOWER] [-u UPPER] } [-t] [-x] [-v] PATH
Option | Description |
---|---|
--version | show program's version number and exit |
-h, --help | show this help message and exit |
-i, --iso8601 | interpret the value as an ISO 8601 datetime |
-n, --numeric | interpret the value as a number |
-s, --string | interpret the value as a string |
-t, --strict | halt on type errors |
-e EQUAL, --equal=EQUAL | equal to |
-l LOWER, --lower=LOWER | lower bound |
-u UPPER, --upper=UPPER | upper bound |
-x, --exclusions | output exclusions instead of inclusions |
-v, --verbose | report narrative to stderr |
EXAMPLES
csv_reader.py praxis_303.csv | sample_subset.py -v -i -l 2018-09-26T00:00:00Z -u 2018-09-27T00:00:00Z rec
csv_reader.py -v scs-bgx-431-ref-meteo-gases-2020-H1-slp.csv | sample_subset.py -v "ref.NO2 Processed Measurement (ppb)" | csv_writer.py -v scs-bgx-431-ref-meteo-gases-2020-H1-slp-no2.csv
SEE ALSO
scs_analysis/csv_collator
scs_analysis/sample_nullify