# Ingestion vs Data Processing

While there are some similarities between Ingestion and Data Processing, this doc clarifies the differences between the two.

## Ingestion

"Bringing data in" is the the primary concern of Ingestion module. It is focused to bring data inside the data lake (or target platform). This layer does not deal with complex processing logic.

## Data Processing

"Transforming data from one form to other" is the key purpose of Data Processing module. And this transformation can usually cut across multiple tables (via join/ subquery etc), multiple records (aggregate/ grouping), multiple columns (multi column case statements) or even link up source and target data to do update or insert of source data into target table. Data Processing is the module which supports intra-ETL from staging to foundation to analytics data tables.

## Comparison

The table below lists the concerns/functionality that each module supports:

| Sr. No. | Concern | Ingestion | Data Processing |
|---|---|---|---|
| 1 | Append in target | Yes | Yes |
| 2 | Truncate/insert in target | Yes | Yes |
| 3 | Merge | No | Yes |
| 4 | Files as source | Yes (very much meant for it) | No (it always expects tables or table-like sources, e.g. external tables) |
| 5 | Cross-column transformation | No | Yes, via SQL |
| 6 | Cross-row transformation | No | Yes, via SQL |
| 7 | Cross-table transformation | No | Yes, via SQL |
| 8 | SQL as source | Mostly no, except for outbound file generation | Yes |
| 9 | Partition handling | Yes (limited) | Yes (extensive) |
| 10 | Auto ID generation | No | Yes |
| 11 | Auto column mapping | Yes | Yes |
| 12 | Table-to-table data movement | Yes (but Data Processing preferred) | Yes |
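Since merge is the one load pattern only Data Processing supports, the sketch below shows one way an upsert can be expressed in PySpark: target rows without an incoming counterpart are kept, and all source rows are added. Table names and the `customer_id` key are hypothetical; Guzzle configures this behaviour rather than requiring hand-written code, and platforms with a native `MERGE INTO` statement would typically use that instead.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("merge-sketch").getOrCreate()

# Assumes both tables share the same schema and are keyed by customer_id.
source = spark.table("staging.customers")
target = spark.table("foundation.customers")

# Keep target rows that have no incoming counterpart, then add all source rows.
unmatched_target = target.join(source, on="customer_id", how="left_anti")
merged = unmatched_target.unionByName(source)

# Write to a separate table so we are not overwriting a table we are reading from.
merged.write.mode("overwrite").saveAsTable("foundation.customers_merged")
```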