Ingestion vs Data Processing - ja-guzzle/guzzle_docs GitHub Wiki
While there are some similarities between Ingestion and Data Processing, this doc clarifies the differences between the two.
"Bringing data in" is the primary concern of the Ingestion module. It focuses on bringing data into the data lake (or target platform). This layer does not deal with complex processing logic.
"Transforming data from one form to another" is the key purpose of the Data Processing module. This transformation can cut across multiple tables (via joins, subqueries, etc.), multiple records (aggregation/grouping), or multiple columns (multi-column CASE statements), and it can even link up source and target data to update or insert source data into the target table. Data Processing is the module that supports intra-ETL from staging to foundation to analytics data tables.
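To make the cross-row and table-to-table cases concrete, here is a minimal sketch using SQLite in Python. The table and column names (`stg_orders`, `fnd_customer_totals`) are hypothetical and not part of Guzzle; the point is that a single SQL statement can aggregate across rows while moving data from a staging table to a foundation table.

```python
import sqlite3

# Hypothetical staging and foundation tables; names are illustrative only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE stg_orders (order_id INT, customer_id INT, amount REAL);
CREATE TABLE fnd_customer_totals (customer_id INT, total_amount REAL);
INSERT INTO stg_orders VALUES (1, 10, 25.0), (2, 10, 75.0), (3, 20, 40.0);
""")

# Cross-row transformation (GROUP BY) and table-to-table movement in one step.
con.execute("""
INSERT INTO fnd_customer_totals (customer_id, total_amount)
SELECT customer_id, SUM(amount)
FROM stg_orders
GROUP BY customer_id
""")

rows = con.execute(
    "SELECT customer_id, total_amount FROM fnd_customer_totals ORDER BY customer_id"
).fetchall()
print(rows)
# → [(10, 100.0), (20, 40.0)]
```

A cross-table variant would simply add a JOIN to another staging table in the SELECT.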
The table below lists the concerns/functionality that each of these modules supports:
Sr. No. | Concern | Ingestion | Data Processing |
---|---|---|---|
1 | Append to target | Yes | Yes |
2 | Truncate/insert in target | Yes | Yes |
3 | Merge | No | Yes |
4 | Files as source | Yes (a primary use case) | No (it always expects tables or table-like sources, e.g. external tables) |
5 | Cross-column transformation | No | Yes, via SQL |
6 | Cross-row transformation | No | Yes, via SQL |
7 | Cross-table transformation | No | Yes, via SQL |
8 | SQL as source | Mostly no, except for outbound file generation | Yes |
9 | Partition handling | Yes (limited) | Yes (extensive) |
10 | Auto ID generation | No | Yes |
11 | Auto column mapping | Yes | Yes |
12 | Table-to-table data movement | Yes (but Data Processing is preferred) | Yes |
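The "Merge" row deserves a quick illustration, since it is the clearest dividing line between the two modules. Below is a generic SQLite sketch of merge semantics (update matching target rows, insert the rest), written as a portable UPDATE-then-INSERT pair; the `src`/`tgt` tables and key column are assumptions for illustration, not Guzzle configuration.

```python
import sqlite3

# Illustrative merge (upsert) from a source table into a target table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE src (id INT PRIMARY KEY, val TEXT);
CREATE TABLE tgt (id INT PRIMARY KEY, val TEXT);
INSERT INTO src VALUES (1, 'new'), (2, 'added');
INSERT INTO tgt VALUES (1, 'old');
""")

# Step 1: update target rows whose key matches a source row.
con.execute("""
UPDATE tgt SET val = (SELECT val FROM src WHERE src.id = tgt.id)
WHERE id IN (SELECT id FROM src)
""")

# Step 2: insert source rows not yet present in the target.
con.execute("""
INSERT INTO tgt (id, val)
SELECT id, val FROM src
WHERE id NOT IN (SELECT id FROM tgt)
""")

merged = con.execute("SELECT id, val FROM tgt ORDER BY id").fetchall()
print(merged)
# → [(1, 'new'), (2, 'added')]
```

Many engines express the same thing in one statement (e.g. `MERGE INTO` or `INSERT ... ON CONFLICT`); the two-step form is used here only because it runs on any SQLite version.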