CSV Files - ge-high-assurance/RACK GitHub Wiki

CSV File Format

The Comma-Separated Values (CSV) format is one of the formats SemTK uses, both for ingesting data and for returning query responses.

This file format is described by RFC 4180 and is widely supported by tools such as Microsoft Excel and by libraries such as Python's csv module and the Apache Commons CSV Java library.

While RFC 4180 describes the format in detail, there are some highlights to be aware of when reading or writing raw CSV files:

  1. Fields containing line breaks (CRLF), double quotes, or commas should be enclosed in double-quotes.
  2. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.
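
As an illustration of the two rules above, here is a minimal sketch using Python's standard csv module. The column names and row values are made up for the example and are not taken from any RACK data model. The writer's default settings quote only the fields that need it and double any embedded double quote, and the reader reverses both steps.

```python
import csv
import io

# Hypothetical sample rows: one field holds a comma and double quotes,
# another holds a CRLF line break.
rows = [
    ["identifier", "description"],
    ["REQ-1", 'The system shall log "startup" events, then continue'],
    ["REQ-2", "first line\r\nsecond line"],
]

# newline="" stops Python from translating line endings, so the csv
# module controls them itself (its default dialect ends records with CRLF).
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()

# Reading the text back recovers the original field values, including
# the embedded quotes, comma, and line break.
parsed = list(csv.reader(io.StringIO(csv_text, newline="")))

print(csv_text)
print(parsed == rows)
```

When writing to a real file instead of an in-memory buffer, the same `newline=""` argument should be passed to `open()`, as the csv module documentation recommends.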

Text encoding

The RACK tools expect UTF-8 encoded content. The CSV and JSON files they use are all UTF-8 encoded.

In RACK v7, the CLI tool expected and produced UTF-8 without any leading signature (the UTF-8 byte order mark, or BOM). Starting with RACK v8, the CLI tool tolerates, but does not require, a leading signature such as the one generated by some Microsoft software.
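
For scripts that read such files outside the CLI, the following sketch shows one way to accept UTF-8 with or without the leading signature (the three bytes EF BB BF) in Python. It is not the RACK CLI's own implementation. The helper name `read_csv_bytes` and the sample data are hypothetical. The approach relies on Python's built-in `utf-8-sig` codec, which removes the signature if present and otherwise behaves like plain `utf-8`.

```python
import csv
import io

# The UTF-8 signature (byte order mark) that some software prepends.
UTF8_SIGNATURE = b"\xef\xbb\xbf"

# Hypothetical sample content, with and without the signature.
raw_without_signature = b"identifier,description\r\nREQ-1,example\r\n"
raw_with_signature = UTF8_SIGNATURE + raw_without_signature


def read_csv_bytes(raw):
    """Decode UTF-8 bytes, dropping a leading signature if present,
    and parse the result as CSV rows (hypothetical helper)."""
    text = raw.decode("utf-8-sig")
    return list(csv.reader(io.StringIO(text, newline="")))


rows_a = read_csv_bytes(raw_without_signature)
rows_b = read_csv_bytes(raw_with_signature)
print(rows_a == rows_b)

# Decoding with plain "utf-8" keeps the signature as U+FEFF, which would
# silently become part of the first column header.
naive_header = raw_with_signature.decode("utf-8").splitlines()[0]
print(naive_header.startswith("\ufeff"))
```

When opening a file by path, passing `encoding="utf-8-sig"` to `open()` for reading gives the same tolerance. For writing, plain `encoding="utf-8"` produces output without a signature.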