Ingestion

Table of Contents

Overall

  1. Are the column names referred to in the Ingestion module case-sensitive?

Source Section:

  • Column mapping for JSON/XML and delimited file
  • Column names in grok/regexp parser for Text file
  • SQL and filters for Hive, Delta and JDBC

Schema Section:

  • Column name
  • Validate SQL and Transform SQL

Target Section:

  • Partition columns
  2. Are the table names / file names referred to in the Ingestion module case-sensitive?

Source Section:

  • Table name / SQL for JDBC
  • File name pattern for file sources

Schema Section:

  • Table names in sub-queries in Validate SQL and Transform SQL

Target Section:

  • Table name for Hive/JDBC/Delta
  • File name for Hive/JDBC/Delta

Reject Section:

  • Table name for Hive/JDBC/Delta
  • File name for Hive/JDBC/Delta

Source

Common

File

Common

JSON

Validation and Transformation

Common

  • Is the validation threshold for JDBC, when using parallelism, applied at the total pull level or per partition?

Answer: It is applied at the total "pull level". The entire JDBC feed is treated as one data frame, even though the data is read from the JDBC source via multiple executors as per the config, so the threshold is evaluated against the complete pull rather than per partition.
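A minimal PySpark-style sketch of this behavior, assuming a hypothetical source table, validation rule, and threshold value (the JDBC options below are standard Spark options; the threshold check itself is illustrative, not Guzzle's actual implementation):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("jdbc-threshold-sketch").getOrCreate()

# Parallel JDBC read: partitionColumn/numPartitions split the pull across
# executors, but the result is still a single DataFrame (one "pull").
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/sales")   # hypothetical source
      .option("dbtable", "public.orders")                      # hypothetical table
      .option("partitionColumn", "order_id")
      .option("lowerBound", "1")
      .option("upperBound", "1000000")
      .option("numPartitions", "8")
      .load())

# Hypothetical validation rule: amount must be non-negative.
failed = df.filter(F.col("amount") < 0).count()
total = df.count()

# The threshold is compared against counts for the whole pull, not per partition.
threshold_pct = 5.0  # illustrative value
if total > 0 and (failed / total) * 100 > threshold_pct:
    raise RuntimeError(f"Validation failed: {failed}/{total} records rejected")
```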

  • Why don't we support a zero failure threshold? Usually someone wants to reject the data if there is even one failure.

Answer: Yes, we do support a zero failure threshold. It works if you specify it using the editor.
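As a sketch of the semantics only (the function and counts below are hypothetical, not Guzzle's code), a zero threshold means a single failed record is enough to reject the pull:

```python
def pull_passes(failed_records: int, threshold: int = 0) -> bool:
    """Hypothetical check: the pull passes only if failures do not exceed the threshold.

    With threshold = 0, even one failed record fails the whole pull.
    """
    return failed_records <= threshold

print(pull_passes(0))      # True  - no failures, pull accepted
print(pull_passes(1))      # False - a single failure rejects the pull
print(pull_passes(3, 5))   # True  - within a non-zero threshold
```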

  • What happens when the rejection section is not specified, but the "Schema and Validation" section has validation rules defined and there are records failing validation?

Answer: It will write the valid records to the target and ignore the invalid records.
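A minimal sketch of this split, assuming a hypothetical validation rule and output paths (column names, paths, and the helper function are illustrative, not Guzzle configuration):

```python
from typing import Optional
from pyspark.sql import DataFrame, functions as F

def write_with_optional_rejects(df: DataFrame,
                                target_path: str,
                                reject_path: Optional[str] = None) -> None:
    """Split records on a hypothetical rule and write them out.

    Valid records always go to the target; invalid records are only
    persisted when a reject location is configured, otherwise they are dropped.
    """
    rule = F.col("amount") >= 0                      # hypothetical validation rule
    valid = df.filter(rule)
    invalid = df.filter(~rule)

    valid.write.mode("append").parquet(target_path)  # valid rows always land in the target
    if reject_path is not None:
        invalid.write.mode("append").parquet(reject_path)
    # If no reject section is configured, invalid rows are simply ignored.
```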
