csv_segmentor - bruno-beloff/scs_analysis GitHub Wiki
DESCRIPTION
The csv_segmentor utility is used to segment the input stream of JSON documents into CSV files whose rows have contiguous datetime values.
Contiguity is defined by the --max-interval flag. If the time interval between a document and the previous document is greater than this interval, then the current CSV file is closed and a new file is opened. File names (and sub-directories) are specified by the --file-prefix flag. The datetime of the first row of each CSV file is appended to the prefix.
The input documents must contain a field carrying an ISO 8601 datetime. If the field in a given document is empty or malformed, the document is ignored. If the field is not present in any document, the csv_segmentor utility terminates.
The csv_segmentor utility generates a report giving the specifications of each contiguous block. If no file prefix is given, then the CSV files are not generated, but the report is still produced.
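The contiguity rule described above can be illustrated with a minimal Python sketch. This is not the utility's actual implementation: the function names `parse_iso` and `segment` are hypothetical, the ISO path is assumed to be a simple top-level key, and a document that lacks the field is assumed to be fatal (modelled here by raising `KeyError`).

```python
from datetime import datetime, timedelta


def parse_iso(value):
    """Return a datetime, or None if the value is empty or malformed."""
    if not isinstance(value, str) or not value:
        return None
    try:
        # Older Python versions do not accept a trailing 'Z' in fromisoformat().
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None


def segment(documents, max_interval, iso_path="rec"):
    """Group documents into blocks whose datetimes are contiguous.

    Hypothetical helper: iso_path is treated as a plain top-level key.
    """
    blocks = []
    prev = None
    for doc in documents:
        if iso_path not in doc:
            # Assumption: a missing field stops processing altogether.
            raise KeyError(iso_path)
        rec = parse_iso(doc[iso_path])
        if rec is None:
            continue  # empty or malformed datetime: ignore the document
        if prev is None or rec - prev > max_interval:
            blocks.append([])  # interval exceeded: start a new block
        blocks[-1].append(doc)
        prev = rec
    return blocks


docs = [
    {"rec": "2019-01-04T10:04:41Z"},
    {"rec": "2019-01-04T10:04:51Z"},
    {"rec": ""},  # ignored
    {"rec": "2019-01-04T10:29:21Z"},  # 24 min 30 s after the previous document
]
blocks = segment(docs, timedelta(minutes=6))
print([len(block) for block in blocks])  # prints [2, 1]
```

In this sketch each block corresponds to one CSV file and one line of the report; writing the files themselves is left out.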
SYNOPSIS
csv_segmentor.py -m { [[DD-]HH:]MM[:SS] | :SS } [-i ISO] [-f FILE_PREFIX] [-v]
Option | Description |
---|---|
--version | show program's version number and exit |
-h, --help | show this help message and exit |
-m MAX_INTERVAL, --max-interval=MAX_INTERVAL | maximum permitted interval |
-i ISO, --iso-path=ISO | path for ISO 8601 datetime field (default 'rec') |
-f FILE_PREFIX, --file-prefix=FILE_PREFIX | file prefix for contiguous CSVs |
-v, --verbose | report narrative to stderr |
EXAMPLES
csv_reader.py -v scs-bgx-508-gases-2020-Q1.csv | csv_segmentor.py -m 06:00 -f segments/scs-bgx-508-gases-2020-Q1 -v | csv_writer.py -v segments/scs-bgx-508-gases-2020-Q1-report.csv
DOCUMENT EXAMPLE - REPORT OUTPUT
{"start": "2019-01-01T00:00:01Z", "end": "2019-01-04T10:04:51Z", "prev-interval": "", "max-interval": "00-00:00:11", "count": 29550}
{"start": "2019-01-04T10:29:21Z", "end": "2019-01-04T10:37:41Z", "prev-interval": "00-00:24:30", "max-interval": "00-00:00:10", "count": 51}
{"start": "2019-01-04T11:35:49Z", "end": "2019-01-04T11:41:19Z", "prev-interval": "00-00:58:07", "max-interval": "00-00:00:10", "count": 34}
FILES
Output file names are of the form: FILE-PREFIX_BLOCK-START-DATETIME.csv
SEE ALSO
scs_analysis/csv_collator
scs_analysis/csv_reader
scs_analysis/csv_writer