CSV Sink - rambabu-chamakuri/PSTL-DOC GitHub Wiki
The CSV Sink is a specialization of the File Sink, so any File Sink option may also be configured on a CSV Sink. As with every File Sink, the path option must be configured. Please review the File Sink documentation if you are not already familiar with it.
The CSV Sink does not support all SQL types, since encoding semantics for certain data types are ambiguous (e.g., how binary
should be represented as a string is an exercise for the reader). Currently, the CSV Sink supports the following data types:
- byte
- short
- integer
- long
- float
- double
- boolean
- decimal
- timestamp
- date
- string
In addition to the data types above, the CSV Sink supports user-defined types whose underlying type is one of the data types above. Users can handle unsupported data types as they see fit by encoding them into one of the supported data types before writing.
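As an illustration of coercing an unsupported type, the sketch below encodes binary data as a Base64 string, which fits the supported string type. Base64 is only one possible choice (hex would work equally well), and the class and method names are hypothetical; nothing here is part of the CSV Sink itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative only: one way to turn a binary value into a string before it
// reaches the CSV Sink. Class and method names are hypothetical.
final class BinaryColumnEncoding {

    // Encode raw bytes as Base64 text so the value can be written as a string.
    static String toBase64(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Decode on the consumer side to recover the original bytes.
    static byte[] fromBase64(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        String cell = toBase64(payload);
        System.out.println(cell); // prints aGVsbG8=
        System.out.println(new String(fromBase64(cell), StandardCharsets.UTF_8)); // prints hello
    }
}
```

Whichever encoding is chosen, consumers of the CSV files need to know it so they can reverse it.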
TODO: link/comment on univocity
Options
codec
See compression.
comment
Specifies the character that marks a line comment when it appears at the beginning of a line of text. Although this option is passed down to the underlying CSV writer, it is not particularly useful in practice: the CSV Sink only writes rows, so it never has to decide whether a line of text is a comment to be ignored.
Defaults to `\u0000`.
SAVE STREAM foo
TO CSV
OPTIONS(
'comment'='#'
);
compression
The compression codec to use when generating CSV files. If `compression` is not specified, `codec` may be provided instead. Valid values include:
- none
- uncompressed
- bzip2
- deflate
- gzip
- lz4
- snappy
Defaults to `none`.
SAVE STREAM foo
TO CSV
OPTIONS(
'compression'='gzip'
);
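For downstream consumers, the sketch below shows how gzip-compressed CSV content can be read back with the JDK's `GZIPInputStream`. It compresses a small CSV string in memory as a stand-in for a file written by the sink. The class name, the sample rows, and the UTF-8 encoding are assumptions made for illustration; the sink's actual file names and character encoding are not described here.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative only: reading gzip-compressed CSV text. Names are hypothetical.
final class GzipCsvReader {

    // Stand-in for a file produced with 'compression'='gzip':
    // gzip-compress some CSV text in memory (UTF-8 is an assumption).
    static byte[] gzip(String csvText) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(csvText.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decompress and split into lines, as a downstream consumer might.
    static List<String> readLines(byte[] gzipped) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gzipped)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] file = gzip("1,foo\n2,bar\n");
        System.out.println(readLines(file)); // prints [1,foo, 2,bar]
    }
}
```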
dateFormat
Specifies how date columns should be formatted when rendered as a string in a CSV row. Refer to the Java Date and Time Patterns as needed.
Defaults to `yyyy-MM-dd`.
SAVE STREAM foo
TO CSV
OPTIONS(
'dateFormat'='MM-dd-yyyy'
);
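To preview what a pattern produces before configuring it, the sketch below renders a sample date with `java.text.SimpleDateFormat`. It assumes the sink interprets patterns the way `SimpleDateFormat` does, as the reference to the Java date and time patterns suggests. The formatter class the sink actually uses, and its time zone, are not stated here, so UTC and `Locale.US` are assumptions, and the class name is hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Illustrative only: previews how a dateFormat pattern renders a date.
final class DateFormatPreview {

    // Render the given instant with the given pattern, in UTC (an assumption).
    static String render(String pattern, Date value) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(value);
    }

    public static void main(String[] args) {
        Date sample = new Date(1500000000000L); // 2017-07-14 02:40:00 UTC
        System.out.println(render("yyyy-MM-dd", sample)); // prints 2017-07-14
        System.out.println(render("MM-dd-yyyy", sample)); // prints 07-14-2017
    }
}
```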
delimiter
See sep.
escape
Specifies the escape character used for escaping quotes inside an already quoted value.
Defaults to `\`.
SAVE STREAM foo
TO CSV
OPTIONS(
'escape'='?'
);
escapeQuotes
Specifies whether fields that contain the quote character should be escaped by enclosing the entire value in quotes.
Defaults to `true`.
SAVE STREAM foo
TO CSV
OPTIONS(
'escapeQuotes'='false'
);
header
Specifies whether column names should be written as the first line in the CSV file.
Defaults to `false`.
SAVE STREAM foo
TO CSV
OPTIONS(
'header'='true'
);
ignoreLeadingWhiteSpace
Specifies whether leading whitespace should be removed from values as they are written. For example, when `true`, a value consisting of two spaces followed by `foo` is written as `foo` (i.e., the leading whitespace has been removed).
Defaults to `true`.
SAVE STREAM foo
TO CSV
OPTIONS(
'ignoreLeadingWhiteSpace'='false'
);
ignoreTrailingWhiteSpace
Specifies whether trailing whitespace should be removed from values as they are written. For example, when `true`, a value consisting of `foo` followed by two spaces is written as `foo` (i.e., the trailing whitespace has been removed).
Defaults to `true`.
SAVE STREAM foo
TO CSV
OPTIONS(
'ignoreTrailingWhiteSpace'='false'
);
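The effect of the two whitespace options can be modeled with a small function. This is a simplified sketch of the behavior described above, not the underlying CSV writer's implementation. The class and method names are hypothetical, and the exact set of characters the writer treats as whitespace is an assumption (here, whatever the regex class `\s` matches).

```java
// Illustrative model of ignoreLeadingWhiteSpace / ignoreTrailingWhiteSpace.
final class WhitespaceTrimPreview {

    // Returns the value as it would be written under the given option settings.
    static String prepare(String value, boolean ignoreLeading, boolean ignoreTrailing) {
        String result = value;
        if (ignoreLeading) {
            result = result.replaceAll("^\\s+", ""); // drop leading whitespace
        }
        if (ignoreTrailing) {
            result = result.replaceAll("\\s+$", ""); // drop trailing whitespace
        }
        return result;
    }

    public static void main(String[] args) {
        String raw = "  foo  ";
        // Brackets make the remaining whitespace visible.
        System.out.println("[" + prepare(raw, true, true) + "]");   // prints [foo]
        System.out.println("[" + prepare(raw, false, true) + "]");  // prints [  foo]
        System.out.println("[" + prepare(raw, true, false) + "]");  // prints [foo  ]
        System.out.println("[" + prepare(raw, false, false) + "]"); // prints [  foo  ]
    }
}
```

Setting both options to `false` is the choice to make when surrounding whitespace is significant and must survive in the output.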
nullValue
Specifies the value to write for fields whose value is `null`. For example, some users may prefer that `null` fields be written as the literal string `null`.
Defaults to the empty string.
SAVE STREAM foo
TO CSV
OPTIONS(
'nullValue'='null'
);
quote
Specifies which character should be used to quote fields. If an empty string is provided, the underlying quote character will be `\u0000`.
Defaults to `"`.
SAVE STREAM foo
TO CSV
OPTIONS(
'quote'='\''
);
quoteAll
Specifies whether all fields should be quoted in the CSV row.
Defaults to `false`.
SAVE STREAM foo
TO CSV
OPTIONS(
'quoteAll'='true'
);
sep
The separator to use between each field in the CSV row. If `sep` is not specified, `delimiter` may be provided instead.
Defaults to `,`.
SAVE STREAM foo
TO CSV
OPTIONS(
'sep'='|'
);
timestampFormat
Specifies how timestamp columns should be formatted when rendered as a string in a CSV row. Refer to the Java Date and Time Patterns as needed.
Defaults to `yyyy-MM-dd'T'HH:mm:ss.SSSXXX`.
SAVE STREAM foo
TO CSV
OPTIONS(
'timestampFormat'='EEE, dd MMM yyyy HH:mm:ss Z'
);
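The sketch below previews both the default timestamp pattern and the custom pattern from the example above, using `java.text.SimpleDateFormat`. As with date patterns, it assumes `SimpleDateFormat` semantics. The time zone the sink applies is not stated here, so the zone is passed explicitly, and `Locale.US` is assumed for day and month names. The class name is hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Illustrative only: previews how a timestampFormat pattern renders an instant.
final class TimestampFormatPreview {

    // Render epoch milliseconds with the given pattern in the given time zone.
    static String render(String pattern, long epochMillis, String zoneId) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setTimeZone(TimeZone.getTimeZone(zoneId));
        return format.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long sample = 1500000000123L; // 2017-07-14 02:40:00.123 UTC

        // Default pattern: ISO 8601 style with milliseconds and zone offset.
        System.out.println(render("yyyy-MM-dd'T'HH:mm:ss.SSSXXX", sample, "UTC"));
        // prints 2017-07-14T02:40:00.123Z
        System.out.println(render("yyyy-MM-dd'T'HH:mm:ss.SSSXXX", sample, "America/New_York"));
        // prints 2017-07-13T22:40:00.123-04:00

        // Custom pattern from the example above.
        System.out.println(render("EEE, dd MMM yyyy HH:mm:ss Z", sample, "UTC"));
        // prints Fri, 14 Jul 2017 02:40:00 +0000
    }
}
```

Note that the custom pattern omits milliseconds, so sub-second precision is lost in the written value.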