Ingestion vs Data Processing - ja-guzzle/guzzle_docs GitHub Wiki
While there are some similarities between Ingestion and Data Processing, this doc clarifies the differences between the two.
"Bringing data in" is the primary concern of the Ingestion module. It focuses on bringing data into the data lake (or target platform). This layer does not deal with complex processing logic.
"Transforming data from one form to another" is the key purpose of the Data Processing module. This transformation can cut across multiple tables (via joins, subqueries, etc.), multiple records (aggregation/grouping), or multiple columns (multi-column CASE statements), and it can even link up source and target data to update or insert source data into the target table. Data Processing is the module that supports intra-ETL from staging to foundation to analytics data tables.
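To make the cross-row and table-to-table cases concrete, here is a minimal sketch using SQLite in Python. The table and column names (`stg_orders`, `fnd_customer_totals`) are hypothetical and not part of Guzzle; the point is that a single SQL statement can aggregate across rows while moving data from a staging table to a foundation table.

```python
import sqlite3

# Hypothetical staging and foundation tables; names are illustrative only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE stg_orders (order_id INT, customer_id INT, amount REAL);
CREATE TABLE fnd_customer_totals (customer_id INT, total_amount REAL);
INSERT INTO stg_orders VALUES (1, 10, 25.0), (2, 10, 75.0), (3, 20, 40.0);
""")

# Cross-row transformation (GROUP BY) and table-to-table movement in one step.
con.execute("""
INSERT INTO fnd_customer_totals (customer_id, total_amount)
SELECT customer_id, SUM(amount)
FROM stg_orders
GROUP BY customer_id
""")

rows = con.execute(
    "SELECT customer_id, total_amount FROM fnd_customer_totals ORDER BY customer_id"
).fetchall()
print(rows)
# → [(10, 100.0), (20, 40.0)]
```

A cross-table variant would simply add a JOIN to another staging table in the SELECT.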
The table below lists the concerns/functionality that each of these modules supports:
Sr. No. | Concern | Ingestion | Data Processing |
---|---|---|---|
1 | Append to target | Yes | Yes |
2 | Truncate/insert in target | Yes | Yes |
3 | Merge | No | Yes |
4 | Files as source | Yes (a primary use case) | No (it always expects tables or table-like sources, e.g. external tables) |
5 | Cross-column transformation | No | Yes, via SQL |
6 | Cross-row transformation | No | Yes, via SQL |
7 | Cross-table transformation | No | Yes, via SQL |
8 | SQL as source | Mostly no, except for outbound file generation | Yes |
9 | Partition handling | Yes (limited) | Yes (extensive) |
10 | Auto ID generation | No | Yes |
11 | Auto column mapping | Yes | Yes |
12 | Table-to-table data movement | Yes (but Data Processing is preferred) | Yes |
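The "Merge" row deserves a quick illustration, since it is the clearest dividing line between the two modules. Below is a generic SQLite sketch of merge semantics (update matching target rows, insert the rest), written as a portable UPDATE-then-INSERT pair; the `src`/`tgt` tables and key column are assumptions for illustration, not Guzzle configuration.

```python
import sqlite3

# Illustrative merge (upsert) from a source table into a target table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE src (id INT PRIMARY KEY, val TEXT);
CREATE TABLE tgt (id INT PRIMARY KEY, val TEXT);
INSERT INTO src VALUES (1, 'new'), (2, 'added');
INSERT INTO tgt VALUES (1, 'old');
""")

# Step 1: update target rows whose key matches a source row.
con.execute("""
UPDATE tgt SET val = (SELECT val FROM src WHERE src.id = tgt.id)
WHERE id IN (SELECT id FROM src)
""")

# Step 2: insert source rows not yet present in the target.
con.execute("""
INSERT INTO tgt (id, val)
SELECT id, val FROM src
WHERE id NOT IN (SELECT id FROM tgt)
""")

merged = con.execute("SELECT id, val FROM tgt ORDER BY id").fetchall()
print(merged)
# → [(1, 'new'), (2, 'added')]
```

Many engines express the same thing in one statement (e.g. `MERGE INTO` or `INSERT ... ON CONFLICT`); the two-step form is used here only because it runs on any SQLite version.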