Guzzle Core - ja-guzzle/guzzle_docs GitHub Wiki
These are a series of modules that achieve specific workflows/tasks for data integration. While they leverage the services/context from Common services, they are meant to be fairly independent and can be run standalone. Native modules are loosely coupled, and all the context is passed to a module as a series of parameters (you can think of it as passing a hash map of key-value pairs).
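A minimal sketch of this calling convention, assuming Python and a hypothetical in-process registry: each module is a function that receives a plain dict of parameters and returns a result dict, with no shared state between modules. The names `module`, `run_module`, `run_ingestion` and the parameter keys `source`, `target`, `batch_id` are illustrative assumptions, not Guzzle's actual API.

```python
from typing import Any, Callable, Dict

# Hypothetical: the context passed to a native module is a plain key-value map.
Context = Dict[str, Any]

# Hypothetical registry mapping module names to standalone entry points.
MODULES: Dict[str, Callable[[Context], Context]] = {}


def module(name: str):
    """Register a function as a standalone module under the given name."""
    def register(fn: Callable[[Context], Context]) -> Callable[[Context], Context]:
        MODULES[name] = fn
        return fn
    return register


@module("ingestion")
def run_ingestion(ctx: Context) -> Context:
    # The module reads everything it needs from ctx and holds no global state.
    required = ("source", "target", "batch_id")
    missing = [k for k in required if k not in ctx]
    if missing:
        raise KeyError(f"missing parameters: {missing}")
    return {"status": "SUCCESS", "batch_id": ctx["batch_id"], "target": ctx["target"]}


def run_module(name: str, params: Context) -> Context:
    """Invoke a registered module standalone, passing only its parameter map."""
    return MODULES[name](dict(params))  # copy so the caller's map is not mutated


if __name__ == "__main__":
    print(run_module("ingestion", {"source": "orders.csv", "target": "stg.orders", "batch_id": 42}))
```

Because the only input is the parameter map, the same module can be called by an orchestrator or run on its own from a script or test.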
- Ingests data from files and relational databases in batch mode, and from Kafka in real-time mode
- Performs schema validation, control checks and file format checks
- Allows configuring the target partition scheme and incremental extraction criteria
- Handles staleness for late-arriving files
- Supports end-of-day/end-of-month handling, and overwrite and append modes on the target
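To illustrate what schema validation and a control check can look like for a delimited file, here is a minimal sketch. It assumes a hypothetical file layout (a CSV header, data rows, then a `TRAILER,<n>` record carrying the expected row count) and a hypothetical schema; Guzzle's own file formats and check rules may differ.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical expected schema: column name -> converter used to type-check values.
EXPECTED_SCHEMA = {"order_id": int, "amount": float, "order_date": str}


def validate_file(text: str, schema: Dict[str, type]) -> Tuple[List[dict], List[str]]:
    """Schema check on the header, type check on each row, and a control-count check.

    Assumes a hypothetical layout: a CSV header, data rows, then a trailer
    line 'TRAILER,<n>' where <n> is the expected number of data rows.
    """
    lines = text.strip().splitlines()
    if len(lines) < 2:
        return [], ["file must contain at least a header and a trailer"]
    *body, trailer = lines
    reader = csv.reader(io.StringIO("\n".join(body)))
    header = next(reader)
    if header != list(schema):
        return [], [f"schema mismatch: expected {list(schema)}, got {header}"]
    rows: List[dict] = []
    errors: List[str] = []
    # Line numbers are 1-based; the header is line 1.
    for line_no, values in enumerate(reader, start=2):
        if len(values) != len(schema):
            errors.append(f"line {line_no}: expected {len(schema)} fields, got {len(values)}")
            continue
        try:
            rows.append({col: conv(v) for (col, conv), v in zip(schema.items(), values)})
        except ValueError as exc:
            errors.append(f"line {line_no}: {exc}")
    # Control check: the trailer count must equal the number of data rows in the file.
    data_row_count = len(body) - 1
    tag, _, count = trailer.partition(",")
    if tag != "TRAILER" or not count.isdigit():
        errors.append("missing or malformed trailer record")
    elif int(count) != data_row_count:
        errors.append(f"control check failed: trailer says {count}, file has {data_row_count} rows")
    return rows, errors


if __name__ == "__main__":
    sample = "order_id,amount,order_date\n1,10.5,2024-01-31\n2,abc,2024-01-31\nTRAILER,2\n"
    print(validate_file(sample, EXPECTED_SCHEMA))
```

Row-level type errors are collected rather than raised, so a single run reports every problem in the file alongside the control-count result.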
- A generic data loading framework that allows defining the transformation and loading rules using declarative configuration
- Data processing rules are defined as SQL
- Enforces consistent implementation of standards and design patterns
- Prevents rewriting repetitive ETL code and avoids the manual errors that come with it
- Allows performance and other relevant global parameters to be controlled centrally
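The idea of a declarative job, where the config carries the SQL rule and the load mode and a generic runner does the rest, can be sketched as follows. This assumes Python with an in-memory SQLite database as a stand-in engine; the config keys `target_table`, `load_mode` and `sql` are hypothetical and do not reflect Guzzle's real configuration format.

```python
import sqlite3
from typing import Any, Dict

# Hypothetical declarative config: a SQL processing rule plus loading options.
JOB_CONFIG: Dict[str, Any] = {
    "target_table": "daily_sales",
    "load_mode": "overwrite",  # or "append"
    "sql": """
        SELECT order_date, SUM(amount) AS total_amount, COUNT(*) AS order_count
        FROM orders
        GROUP BY order_date
    """,
}


def run_job(conn: sqlite3.Connection, config: Dict[str, Any]) -> int:
    """Generic loader: apply the configured SQL and load the result into the target.

    Table names and SQL are interpolated directly, so the config must be trusted.
    """
    target = config["target_table"]
    mode = config.get("load_mode", "append")
    if mode not in ("overwrite", "append"):
        raise ValueError(f"unsupported load_mode: {mode}")
    # Create the target with the SQL's result columns if it does not exist yet.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {target} AS SELECT * FROM ({config['sql']}) WHERE 0")
    if mode == "overwrite":
        conn.execute(f"DELETE FROM {target}")
    cur = conn.execute(f"INSERT INTO {target} {config['sql']}")
    conn.commit()
    return cur.rowcount  # number of rows loaded


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(1, "2024-01-30", 10.0), (2, "2024-01-30", 5.0), (3, "2024-01-31", 7.5)])
    print(run_job(conn, JOB_CONFIG))
```

Each new job only needs a new config entry; the create, overwrite/append and commit logic lives in one place, which is the point of the bullets above.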
- Generic module to housekeep data
- Allows configuring housekeeping based on date columns as well as other columns
- Allows configuring retention periods over multiple time windows (e.g. xxx rolling days, yy rolling month-ends)
- Data falling outside the retention window can be purged or moved to an alternate location
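A combined retention policy such as "xxx rolling days plus yy rolling month-ends" can be computed as a set of dates to keep; everything else is a candidate for purge or archival. This is a minimal sketch assuming data is partitioned by a single date column; the function names and parameters are hypothetical.

```python
import calendar
from datetime import date, timedelta
from typing import Iterable, List, Set, Tuple


def month_end(d: date) -> date:
    """Last calendar day of d's month."""
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])


def dates_to_retain(as_of: date, rolling_days: int, rolling_month_ends: int) -> Set[date]:
    """The last `rolling_days` days up to `as_of`, plus the last
    `rolling_month_ends` month-end dates on or before `as_of`."""
    keep = {as_of - timedelta(days=i) for i in range(rolling_days)}
    # Start from the most recent month-end that is not in the future.
    me = month_end(as_of)
    if me > as_of:
        me = as_of.replace(day=1) - timedelta(days=1)
    for _ in range(rolling_month_ends):
        keep.add(me)
        me = me.replace(day=1) - timedelta(days=1)  # step back to the previous month-end
    return keep


def split_for_housekeeping(partition_dates: Iterable[date], as_of: date,
                           rolling_days: int, rolling_month_ends: int) -> Tuple[List[date], List[date]]:
    """Split partition dates into (retained, expired); expired ones would be purged or moved."""
    keep = dates_to_retain(as_of, rolling_days, rolling_month_ends)
    retained: List[date] = []
    expired: List[date] = []
    for d in sorted(partition_dates):
        (retained if d in keep else expired).append(d)
    return retained, expired


if __name__ == "__main__":
    parts = [date(2023, 12, 31), date(2024, 1, 31), date(2024, 2, 15),
             date(2024, 2, 29), date(2024, 3, 7), date(2024, 3, 8), date(2024, 3, 10)]
    print(split_for_housekeeping(parts, date(2024, 3, 10), rolling_days=3, rolling_month_ends=2))
```

Whether expired partitions are deleted or copied elsewhere is left to the caller, mirroring the purge-or-move option above.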
- Performs Data Quality (DQ) validation on specified columns and tables
- Logs records that fail the constraint checks, along with summary statistics
- Validation rules apply to structured data and can currently be specified as SQL
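Since DQ rules are expressed as SQL, one way to evaluate them is to treat each rule as a predicate that every valid row must satisfy and select the rows where it does not hold. The sketch assumes Python with SQLite as a stand-in engine and a hypothetical rule format (`name`, `table`, `check`); it is not Guzzle's DQ implementation.

```python
import sqlite3
from typing import Any, Dict, List

# Hypothetical DQ rules: each names a table and a SQL predicate valid rows must satisfy.
DQ_RULES: List[Dict[str, str]] = [
    {"name": "amount_non_negative", "table": "orders", "check": "amount >= 0"},
    {"name": "customer_present", "table": "orders", "check": "customer_id IS NOT NULL"},
]


def run_dq(conn: sqlite3.Connection, rules: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Evaluate each rule, returning the failing rows and simple statistics per rule."""
    results = []
    for rule in rules:
        table, check = rule["table"], rule["check"]
        total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        # A row fails when the predicate is false or unknown (NULL).
        failed = conn.execute(
            f"SELECT * FROM {table} WHERE NOT COALESCE(({check}), 0)"
        ).fetchall()
        results.append({
            "rule": rule["name"],
            "total": total,
            "failed": len(failed),
            "failed_rows": failed,  # would be logged to an error table in practice
        })
    return results


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(1, 100, 10.0), (2, None, 5.0), (3, 101, -1.0), (4, 102, None)])
    for r in run_dq(conn, DQ_RULES):
        print(r["rule"], r["failed"], "of", r["total"])
```

Treating a NULL predicate result as a failure is a design choice made here; a real rule set might want to distinguish the two cases.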
- Reconciliation (recon) framework for technical reconciliation between source and target datasets
- Performs count, hash and sum checks
- Maintains a detailed list of records (PK values/rowids) with reconciliation gaps
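The count, sum and hash checks complement each other, because two datasets can agree on count and sum while differing row by row. The minimal sketch below assumes both datasets fit in memory as lists of dicts keyed by a single primary key; the function names and the per-row SHA-256 hashing are illustrative choices, not necessarily how Guzzle computes its hashes.

```python
import hashlib
import math
from typing import Any, Dict, List, Sequence


def row_hash(row: Dict[str, Any], columns: Sequence[str]) -> str:
    """Stable hash of a row's values for the compared columns."""
    payload = "\x1f".join(repr(row[c]) for c in columns)  # unit separator between values
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def reconcile(source: List[Dict[str, Any]], target: List[Dict[str, Any]], pk: str,
              columns: Sequence[str], sum_column: str) -> Dict[str, Any]:
    """Count, sum and per-row hash checks, plus the PK values that have gaps."""
    src = {r[pk]: r for r in source}
    tgt = {r[pk]: r for r in target}
    missing_in_target = sorted(src.keys() - tgt.keys())
    extra_in_target = sorted(tgt.keys() - src.keys())
    mismatched = sorted(
        k for k in src.keys() & tgt.keys()
        if row_hash(src[k], columns) != row_hash(tgt[k], columns)
    )
    return {
        "count_match": len(source) == len(target),
        # isclose tolerates float rounding differences from summation order.
        "sum_match": math.isclose(sum(r[sum_column] for r in source),
                                  sum(r[sum_column] for r in target)),
        "hash_match": not (missing_in_target or extra_in_target or mismatched),
        "gaps": {"missing_in_target": missing_in_target,
                 "extra_in_target": extra_in_target,
                 "mismatched": mismatched},
    }


if __name__ == "__main__":
    source = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}, {"id": 3, "amt": 30}]
    target = [{"id": 1, "amt": 10}, {"id": 2, "amt": 25}, {"id": 4, "amt": 25}]
    print(reconcile(source, target, pk="id", columns=["id", "amt"], sum_column="amt"))
```

In this usage example the count and sum checks both pass, while the hash check and the gap list expose the changed, missing and extra keys.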
These are external frameworks and tools supported by Guzzle:
- ETL or ELT tools like ODI and Informatica can be integrated with Guzzle.
- Data prep tools like Paxata, Dataiku, Trifacta and Datameer can be orchestrated and hooked in as external modules.
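One generic way to hook an external tool in as a module is a thin adapter that launches it, hands over the same parameter map native modules receive, and maps its exit code to a job status. This is a hypothetical sketch in Python: it assumes the tool can be started as a command, accepts JSON parameters on stdin, and may print JSON results. Real tools such as ODI or Informatica have their own launch mechanisms, so a stand-in Python command is used in the example instead of any real CLI.

```python
import json
import subprocess
import sys
from typing import Any, Dict, List


def run_external_module(command: List[str], params: Dict[str, Any],
                        timeout: int = 3600) -> Dict[str, Any]:
    """Hypothetical adapter: launch an external tool, pass the parameter map as JSON
    on stdin, and translate its exit code (and optional JSON stdout) into a status."""
    proc = subprocess.run(
        command,
        input=json.dumps(params),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    status = "SUCCESS" if proc.returncode == 0 else "FAILED"
    try:
        details = json.loads(proc.stdout) if proc.stdout.strip() else {}
    except json.JSONDecodeError:
        details = {"raw_output": proc.stdout}  # keep non-JSON output for logging
    return {"status": status, "exit_code": proc.returncode,
            "details": details, "stderr": proc.stderr}


if __name__ == "__main__":
    # Stand-in for an external tool: echoes back a value from its parameters.
    fake_tool = [sys.executable, "-c",
                 "import json, sys; p = json.load(sys.stdin); print(json.dumps({'rows': p['rows']}))"]
    print(run_external_module(fake_tool, {"rows": 5}))
```

Keeping the adapter generic means the orchestrator treats external tools and native modules the same way: parameters in, status out.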