CSV Spark connector - CSharplie/ploosh GitHub Wiki

This connector is used to read CSV files using Spark.

⚠️ A Spark connector can be used only with another Spark connector. It is not possible to combine a Spark connector with a non-Spark connector.

See the Spark documentation for more information.

Connection configuration

No connection is required by this connector.

Configuration

Test case configuration

| Name | Mandatory | Default | Description |
|------|-----------|---------|-------------|
| path | yes | | Path to the CSV file |
| delimiter | no | , | Column delimiter |
| header | no | true | Use the first row as header |
| inferSchema | no | False | Infer the input schema automatically from the data |
| multiline | no | False | Parse one record, which may span multiple lines, per file |
| quote | no | '"' | Character used to denote the start and end of a quoted item |
| encoding | no | "UTF-8" | Character encoding used to read the file |
| lineSep | no | "\n" | Character used to denote a line break |

Example

Example CSV Spark:
  source:
    type: csv_spark
    path: data/employees/*.csv
    multiline: False
    inferSchema: False
    encoding: "UTF-8"
  expected:
    type: sql_spark
    query: |
      select *
      from employees
      where hire_date < "2000-01-01"
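
The optional settings from the configuration table can be combined in the same way. The sketch below is illustrative only: the test case name and both file paths are hypothetical, and it assumes each option behaves like the Spark CSV reader option of the same name. Both sides use the `csv_spark` type, so the rule that a Spark connector must be paired with another Spark connector is respected.

```yaml
# Hypothetical test case: the name and both paths are placeholders.
Example CSV Spark with custom options:
  source:
    type: csv_spark
    path: data/exports/employees_legacy.csv   # hypothetical path
    delimiter: ";"                            # semicolon-separated columns
    header: true                              # first row holds the column names
    inferSchema: true                         # infer column types from the data
    encoding: "ISO-8859-1"                    # file is not UTF-8 encoded
  expected:
    type: csv_spark
    path: data/reference/employees.csv        # hypothetical path
    inferSchema: true
```

Any option left out, such as `delimiter` on the expected side, falls back to the default listed in the table.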