CSV Spark connector - CSharplie/ploosh GitHub Wiki
This connector reads CSV files using Spark.
See the Spark documentation on the CSV data source for details on the options listed below.
This connector does not require a connection.
| Name | Mandatory | Default | Description |
|---|---|---|---|
| path | yes | | Path to the CSV |
| delimiter | no | , | Column delimiter |
| header | no | True | Use the first row as header |
| inferSchema | no | False | Infers the input schema automatically from data |
| multiline | no | False | Allow a single record to span multiple lines, for example when a quoted value contains a line break |
| quote | no | '"' | Character used to denote the start and end of a quoted item |
| encoding | no | "UTF-8" | Character encoding used to decode the CSV files |
| lineSep | no | "\n" | Character used to denote a line break |
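
To make the `quote` and `multiline` options more concrete, here is a small illustration that uses Python's standard `csv` module rather than Spark or ploosh itself. The sample data is made up. It shows a record whose quoted value contains both the delimiter and a line break. With Spark, a file like this would normally need `multiline: True`, because otherwise each physical line is treated as its own record.

```python
import csv
import io

# Made-up sample data: the second record has a quoted value that contains
# a comma (the delimiter) and a line break.
raw = (
    "id,name,notes\n"
    '1,Alice,"short note"\n'
    '2,Bob,"first line\nsecond line, with a comma"\n'
)

# newline="" keeps line breaks untouched so the csv module can decide
# which ones are inside quotes and which ones end a record.
reader = csv.reader(io.StringIO(raw, newline=""), delimiter=",", quotechar='"')
rows = list(reader)

header, records = rows[0], rows[1:]
print(header)          # column names taken from the first row
print(len(records))    # number of logical records, not physical lines
print(repr(records[1][2]))
```

The file has four physical lines after the header, but only two logical records, which is the distinction the `multiline` option controls.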

    Example CSV Spark:
      source:
        type: csv_spark
        path: data/employees/*.csv
        multiline: False
        inferSchema: False
        encoding: "UTF-8"
      expected:
        type: sql_spark
        query: |
          select *
          from employees
          where hire_date < "2000-01-01"
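
As a rough sketch of how the `source` block above could translate into a Spark read, the following Python merges the documented defaults with the values from the example and passes them to `DataFrameReader.options(...).csv(...)`. This is an assumption about the mapping, not ploosh's actual implementation. The helper names `build_reader_options` and `read_source` are hypothetical. Only `read_source` needs a running `SparkSession`, and the rest runs with the standard library alone.

```python
from typing import Any, Dict, Tuple

# Defaults copied from the parameter table on this page.
DEFAULTS: Dict[str, Any] = {
    "delimiter": ",",
    "header": True,
    "inferSchema": False,
    "multiline": False,
    "quote": '"',
    "encoding": "UTF-8",
    "lineSep": "\n",
}


def build_reader_options(source: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: split a `source` block into (path, reader options)."""
    if "path" not in source:
        raise ValueError("'path' is mandatory for the csv_spark connector")
    options = dict(DEFAULTS)
    # Keep only documented parameters; keys such as `type` are not reader options.
    options.update({k: v for k, v in source.items() if k in DEFAULTS})
    return source["path"], options


def read_source(spark, source: Dict[str, Any]):
    """Hypothetical helper: read the CSV with an existing SparkSession.

    Spark's CSV reader accepts `delimiter` as an alias of `sep` and treats
    option names case-insensitively, so the names can be passed as they are.
    """
    path, options = build_reader_options(source)
    return spark.read.options(**options).csv(path)


# The `source` block from the example, written as a Python dict.
source = {
    "type": "csv_spark",
    "path": "data/employees/*.csv",
    "multiline": False,
    "inferSchema": False,
    "encoding": "UTF-8",
}

path, options = build_reader_options(source)
print(path)
print(options)
```

With `inferSchema` left at `False`, Spark reads every column as a string, so `hire_date` would be compared as text unless it is cast on the Spark side.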