CSV Sink

The CSV Sink is a specialized File Sink, so File Sink options may also be configured on a CSV Sink. As with any File Sink, the path option must be configured. Please review the File Sink if you are not already familiar with it.

The CSV Sink does not support all SQL types, because the encoding semantics for certain data types are ambiguous (e.g., there is no single obvious way to represent binary data as a string). Currently, the CSV Sink supports the following data types:

  • byte
  • short
  • integer
  • long
  • float
  • double
  • boolean
  • decimal
  • timestamp
  • date
  • string

In addition to the data types above, the CSV Sink supports user-defined types whose underlying type is one of those data types. Users can handle unsupported data types as they see fit by encoding them into one of the supported data types.
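As an illustration of encoding an unsupported type into a supported one, the following Python sketch turns a binary value into a string and back. It is not part of the sink; Base64 is an assumed choice (hex or any other reversible text encoding would work as well), and the helper names are hypothetical.

```python
import base64

def binary_to_string(value: bytes) -> str:
    """Encode raw bytes as an ASCII Base64 string (an assumed, reversible encoding)."""
    return base64.b64encode(value).decode("ascii")

def string_to_binary(text: str) -> bytes:
    """Reverse the encoding when the CSV value is read back."""
    return base64.b64decode(text)

# Round trip: the string form can be written as an ordinary CSV field.
encoded = binary_to_string(b"\x00\xffCSV")
decoded = string_to_binary(encoded)
```

Whichever encoding is chosen, consumers of the CSV files need to know it in order to decode the column.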

TODO: link/comment on univocity

Options

codec

See compression.

comment

Specifies the character that marks a line comment when found at the beginning of a line of text. Although this option is passed down to the underlying CSV writer, it is not particularly useful in practice, since the CSV Sink only writes rows and never has to decide whether a line of text is a comment to be ignored.

Defaults to \u0000.

SAVE STREAM foo
TO CSV
OPTIONS(
  'comment'='#'
);

compression

The compression codec to use when generating CSV files. If compression is not specified, codec may be provided instead. Valid values include:

  • none
  • uncompressed
  • bzip2
  • deflate
  • gzip
  • lz4
  • snappy

Defaults to none.

SAVE STREAM foo
TO CSV
OPTIONS(
  'compression'='gzip'
);
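To give a feel for what a gzip-compressed CSV file amounts to, here is a minimal Python sketch using only the standard library. It does not reflect how the sink is implemented, and the helper names write_gzip_csv and read_gzip_csv are hypothetical.

```python
import csv
import gzip
import io

def write_gzip_csv(rows):
    """Return gzip-compressed CSV bytes for the given rows."""
    buffer = io.BytesIO()
    # Text mode ("wt") lets csv.writer write strings into the gzip stream.
    with gzip.open(buffer, mode="wt", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return buffer.getvalue()

def read_gzip_csv(data):
    """Decompress the bytes and parse them back into rows of strings."""
    with gzip.open(io.BytesIO(data), mode="rt", newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))

compressed = write_gzip_csv([["id", "name"], ["1", "foo"]])
rows = read_gzip_csv(compressed)
```

Any consumer of the output has to decompress the file with the matching codec before parsing it as CSV.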

dateFormat

Specifies how date columns should be formatted when rendered as a string in a CSV row. Refer to the Java Date and Time Patterns as needed.

Defaults to yyyy-MM-dd.

SAVE STREAM foo
TO CSV
OPTIONS(
  'dateFormat'='MM-dd-yyyy'
);
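The effect of a date pattern can be sketched in Python as follows. Python's strftime uses different directives than Java patterns, so this sketch assumes a small hand-written mapping that covers only the pattern letters used on this page; the function name format_date is hypothetical.

```python
from datetime import date

# Assumed mapping for the few Java pattern tokens used in this section.
JAVA_TO_STRFTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d"}

def format_date(value, java_pattern="yyyy-MM-dd"):
    """Format a date using a (very small) subset of Java date patterns."""
    fmt = java_pattern
    for java_token, py_token in JAVA_TO_STRFTIME.items():
        fmt = fmt.replace(java_token, py_token)
    return value.strftime(fmt)

default_text = format_date(date(2017, 3, 9))               # default pattern
custom_text = format_date(date(2017, 3, 9), "MM-dd-yyyy")  # pattern from the example above
```

With the default pattern the date renders as 2017-03-09; with MM-dd-yyyy it renders as 03-09-2017.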

delimiter

See sep.

escape

Specifies the escape character used for escaping quotes inside an already quoted value.

Defaults to \.

SAVE STREAM foo
TO CSV
OPTIONS(
  'escape'='?'
);
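The role of the escape character can be sketched in a few lines of Python. This is a hand-rolled illustration of the behavior described above, not the underlying CSV writer; the function name quote_value is hypothetical, and the handling of the escape character itself inside a value is deliberately left out.

```python
def quote_value(value, quote='"', escape="\\"):
    """Wrap a value in quotes, prefixing each embedded quote with the escape character."""
    return quote + value.replace(quote, escape + quote) + quote

with_default = quote_value('say "hi"')              # uses the default escape, \
with_custom = quote_value('say "hi"', escape="?")   # uses the escape from the example above
```

Here with_default is "say \"hi\"" and with_custom is "say ?"hi?"", which shows why readers of the file must be configured with the same escape character.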

escapeQuotes

Specifies whether fields that contain the quote character should be escaped by enclosing the entire value in quotes.

Defaults to true.

SAVE STREAM foo
TO CSV
OPTIONS(
  'escapeQuotes'='false'
);

header

Specifies whether column names should be written as the first line in the CSV file.

Defaults to false.

SAVE STREAM foo
TO CSV
OPTIONS(
  'header'='true'
);
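A small Python sketch of the difference the header option makes, using the standard csv module as a stand-in for the sink's writer. The column names and rows are made up for illustration, and render_csv is a hypothetical helper.

```python
import csv
import io

def render_csv(columns, rows, header=False):
    """Render rows as CSV text, optionally preceded by a line of column names."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    if header:
        writer.writerow(columns)
    writer.writerows(rows)
    return out.getvalue()

without_header = render_csv(["id", "name"], [[1, "a"], [2, "b"]])
with_header = render_csv(["id", "name"], [[1, "a"], [2, "b"]], header=True)
```

The only difference between the two results is the extra first line, id,name, in with_header.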

ignoreLeadingWhiteSpace

Specifies whether leading whitespace should be trimmed from values being written. For example, when true, the value:

    foo

would be written as foo (i.e., the leading whitespace has been removed).

Defaults to true.

SAVE STREAM foo
TO CSV
OPTIONS(
  'ignoreLeadingWhiteSpace'='false'
);

ignoreTrailingWhiteSpace

Specifies whether trailing whitespace should be trimmed from values being written. For example, when true, the value:

foo    

would be written as foo (i.e., the trailing whitespace has been removed).

Defaults to true.

SAVE STREAM foo
TO CSV
OPTIONS(
  'ignoreTrailingWhiteSpace'='false'
);
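The two whitespace options can be pictured with the following Python sketch. It assumes that "whitespace" means whatever Python's lstrip and rstrip remove, which may not match the underlying writer exactly; trim_value is a hypothetical helper.

```python
def trim_value(value, ignore_leading=True, ignore_trailing=True):
    """Trim leading and/or trailing whitespace from a value before it is written."""
    if ignore_leading:
        value = value.lstrip()
    if ignore_trailing:
        value = value.rstrip()
    return value

both_trimmed = trim_value("    foo    ")
keep_trailing = trim_value("    foo    ", ignore_trailing=False)
keep_leading = trim_value("    foo    ", ignore_leading=False)
```

Setting an option to false keeps the whitespace on that side of the value intact.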

nullValue

Specifies the value to write for fields whose value is null. For example, some users may prefer that null fields are written as the literal string null.

Defaults to `` (i.e., the empty string).

SAVE STREAM foo
TO CSV
OPTIONS(
  'nullValue'='null'
);
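As a rough picture of what nullValue changes, the Python sketch below substitutes a placeholder for None before joining the fields. It ignores quoting and escaping entirely, and render_null_row is a hypothetical helper.

```python
def render_null_row(row, null_value="", sep=","):
    """Join fields into one CSV line, writing null_value in place of None."""
    return sep.join(null_value if field is None else str(field) for field in row)

default_line = render_null_row([1, None, "x"])
literal_line = render_null_row([1, None, "x"], null_value="null")
```

With the default, a null field is indistinguishable from an empty string in the output (1,,x); with 'nullValue'='null' the line becomes 1,null,x.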

quote

Specifies which character should be used to quote fields. If an empty string is provided, the underlying quote character will be \u0000.

Defaults to ".

SAVE STREAM foo
TO CSV
OPTIONS(
  'quote'='\''
);

quoteAll

Specifies whether all fields should be quoted in the CSV row.

Defaults to false.

SAVE STREAM foo
TO CSV
OPTIONS(
  'quoteAll'='true'
);
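The difference between quoting only when needed and quoting every field can be shown with Python's standard csv module. This is an analogy, not the sink's writer; csv.QUOTE_ALL and csv.QUOTE_MINIMAL are Python's own modes, and render_quoted_row is a hypothetical helper.

```python
import csv
import io

def render_quoted_row(row, quote_all=False):
    """Render one CSV row, quoting every field when quote_all is set."""
    out = io.StringIO()
    quoting = csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL
    csv.writer(out, quoting=quoting, lineterminator="\n").writerow(row)
    return out.getvalue()

minimal = render_quoted_row(["a", "b,c", 1])
everything = render_quoted_row(["a", "b,c", 1], quote_all=True)
```

In minimal, only the field containing the separator is quoted; in everything, all three fields are quoted, including the number.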

sep

The separator to use between each field in the CSV row. If sep is not specified, delimiter may be provided instead.

Defaults to ,.

SAVE STREAM foo
TO CSV
OPTIONS(
  'sep'='|'
);
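A short Python sketch of writing rows with a pipe separator, again using the standard csv module as a stand-in. The quoting of a field that contains the separator is Python's behavior and is only assumed to resemble what the sink does; render_separated_row is a hypothetical helper.

```python
import csv
import io

def render_separated_row(row, sep=","):
    """Render one CSV row using the given field separator."""
    out = io.StringIO()
    csv.writer(out, delimiter=sep, lineterminator="\n").writerow(row)
    return out.getvalue()

piped = render_separated_row(["a", "b", "c"], sep="|")
needs_quotes = render_separated_row(["a", "b|c"], sep="|")
```

A separator that rarely occurs in the data keeps the output free of quoted fields, which some downstream tools handle more reliably.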

timestampFormat

Specifies how timestamp columns should be formatted when rendered as a string in a CSV row. Refer to the Java Date and Time Patterns as needed.

Defaults to yyyy-MM-dd'T'HH:mm:ss.SSSXXX.

SAVE STREAM foo
TO CSV
OPTIONS(
  'timestampFormat'='EEE, dd MMM yyyy HH:mm:ss Z'
);
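For orientation, the Python sketch below produces strings shaped like the default pattern and like the pattern in the example above. The strftime directives are assumed approximations of the Java pattern letters, day and month names depend on the locale, and a zero offset may render differently in Java than in Python, so a nonzero offset is used here. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def format_default_style(ts):
    """Approximate yyyy-MM-dd'T'HH:mm:ss.SSSXXX for a timestamp with a nonzero offset."""
    return ts.isoformat(timespec="milliseconds")

def format_rfc_style(ts):
    """Approximate EEE, dd MMM yyyy HH:mm:ss Z using strftime directives."""
    return ts.strftime("%a, %d %b %Y %H:%M:%S %z")

# A made-up timestamp at UTC-08:00.
ts = datetime(2017, 3, 9, 14, 5, 7, 123000, tzinfo=timezone(timedelta(hours=-8)))
default_text = format_default_style(ts)
rfc_text = format_rfc_style(ts)
```

The default-style result is 2017-03-09T14:05:07.123-08:00, while the second pattern drops the milliseconds and writes the offset without a colon.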