csv_join - bruno-beloff/scs_analysis GitHub Wiki
docs > software repositories > scs_analysis > commands > file-based data management
DESCRIPTION
The csv_join utility performs an SQL join operation on a pair of CSV files. This is particularly useful where a comparison is to be made on timeline data from two devices.
All four join types are supported: inner, left outer, right outer and full outer. Inner joins are used when the output should only contain rows that match on their primary key.
Output is in the form of a sequence of JSON documents containing three fields: rec, left table contents (labelled for the left PREFIX) and right table contents (labelled for the right PREFIX). The left and right contents do not contain their primary key fields. For outer joins, content nodes may be null.
The --iso8601 flag is provided to indicate that the primary key should be interpreted as a ISO 8601 datetime. This is useful where data sets use alternate datetime formats such as 2019-02-22T01:00:00Z and 2019-02-22T01:00:00+00:00.
If the --verbose flag is used, a summary of the join operation is written to stderr.
Warning: if LEFT, RIGHT or FULL joins are used, the first document written to stdout may not contain all of the fields of the input CSV pair. In these cases, if the output is piped to the scs_analysis/csv_writer utility, then the csv_writer must be used in header-scan mode.
SYNOPSIS
csv_join.py [-t TYPE] -l PREFIX PK FILENAME -r PREFIX PK FILENAME [-i] [-v]
Options | |
---|---|
--version | show program's version number and exit |
-h, --help | show this help message and exit |
-t TYPE, --type=TYPE | { 'INNER' | 'LEFT' | 'RIGHT' | 'FULL' } (default 'INNER') |
-l LEFT, --left=LEFT | output path prefix, primary key and filename for left-hand set |
-r RIGHT, --right=RIGHT | output path prefix, primary key and filename for right-hand set |
-i, --iso8601 | interpret the primary key as an ISO 8601 datetime |
-v, --verbose | report narrative to stderr |
EXAMPLES
csv_join.py -i -l praxis rec praxis_301/praxis_301_2018-08.csv -r ref rec ref/ref_2018-08.csv -v
DOCUMENT EXAMPLE - INPUT
left:
{"rec": "2019-02-01T02:00:00Z", "val": {"NO2": {"weV": 0.297185, "cnc": 40.8, "aeV": 0.298467, "weC": 0.002271}}}
right:
{"rec": "2019-02-01T02:00:00Z", "val": {"NO2": {"status": "P", "units": "ugm-3", "dns": 34.0}}}
DOCUMENT EXAMPLE - OUTPUT
{"rec": "2019-02-01T02:00:00Z", "praxis": {"val": {"NO2": {"weV": 0.297185, "cnc": 40.8, "aeV": 0.298467, "weC": 0.002271}}}, "ref": {"val": {"NO2": {"status": "P", "units": "ugm-3", "dns": 34.0}}}}
SEE ALSO
scs_analysis/csv_reader
scs_analysis/csv_writer