CI CD in Guzzle - ja-guzzle/guzzle

Motivation

We want to allow users not only to build data pipelines faster using Guzzle's different job types (Ingestion, Processing, etc.) but also to effectively integrate changes from multiple streams, test and regression-test them, and then deploy them.

Background

Guzzle provides tight integration with Git
Guzzle bundles a test automation framework/library
A typical data project has complex packaging, deployment and data loading requirements. We want to focus initially on creating a Guzzle package that can be promoted from one environment to another
The DevOps stack, change management processes and operations processes vary from customer to customer. However, we want to prescribe one blueprint for a given stack, typically for Guzzle's cloud customers on Azure, with the assumption that parts of it should be repeatable for on-premise deployments

References

This one talks about the challenges of CI/CD for data projects: https://medium.com/90seconds/continuous-integration-and-deployment-for-data-pipelines-at-90-seconds-53bf10521ea7
We should look at how this is supported for other stacks, such as ADF
We should look at inheriting relevant best practices from how CI/CD is done in the non-data world, for example apps built using Java
Some relevant discussion, though not a whole lot: https://www.reddit.com/r/devops/comments/88c5ec/cicd_for_etl_and_dw/
This link is not applicable, as they are using Jenkins to orchestrate the jobs: https://techblog.livongo.com/jenkins-etl/ (ast
A good read on the stack that Stitch uses. Stitch, like Fivetran, provides data integration and replication from cloud apps as a service: https://dzone.com/articles/the-tools-we-used-to-build-our-etl-pipeline-platfo
https://www.concentra.co.uk/resources/articles/continuous-integration-in-data-warehouse-development/
This is more of a marketing showcase, but a real project is discussed: https://www.infosys.com/IT-services/validation-solutions/white-papers/Documents/seven-step-framework-CICD-ETL-testing.pdf

Coverage

Let's come up with a to-be process that can be followed as part of Guzzle deployments at medium and large enterprises that want to automate deployment

Handling the deployment of incremental or full packages from Env A to Env B. As far as possible, the packages are generated in Git by a privileged user, with a tag created in the repo
Test automation as part of CI pipelines
Validation and checks as part of CD pipeline
Support for manual workflows for both CI and CD pipelines
Ability to handle custom scripts, both those which are re-runnable (for example, creating stored procedures) and those which are not re-runnable (such as create/alter table, or one-time data loads into config and data tables)
Cleanup/rollback scripts: to restore the Guzzle env, with all the "default" and "instance" reset to old values. For custom scripts, both re-runnable and non-re-runnable, we can suggest a simple backup and restore of the DB (or an explicit script bundled by the development team)
Generating the package and staging it for subsequent hand-off
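
The distinction above between re-runnable and non-re-runnable custom scripts can be sketched roughly as follows. This is only an illustration under stated assumptions, not how Guzzle implements it: the Script and Ledger types, the script file names, and the idea of keeping a per-environment record of applied one-time scripts are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Script:
    name: str         # hypothetical script file name inside the package
    rerunnable: bool  # True for scripts that are safe to run on every deployment


@dataclass
class Ledger:
    """Assumed per-environment record of one-time scripts already applied."""
    applied: set = field(default_factory=set)


def plan_scripts(package, ledger):
    """Return the scripts that should run for this deployment, in package order."""
    to_run = []
    for script in package:
        # Re-runnable scripts always run; one-time scripts run only if not yet applied.
        if script.rerunnable or script.name not in ledger.applied:
            to_run.append(script)
    return to_run


def record_run(scripts, ledger):
    """Mark one-time scripts as applied so that later deployments skip them."""
    for script in scripts:
        if not script.rerunnable:
            ledger.applied.add(script.name)


# Hypothetical package contents, for illustration only.
package = [
    Script("001_create_tables.sql", rerunnable=False),
    Script("002_load_config.sql", rerunnable=False),
    Script("sp_refresh_summary.sql", rerunnable=True),
]
ledger = Ledger()

first = plan_scripts(package, ledger)   # first deployment to the environment
record_run(first, ledger)
second = plan_scripts(package, ledger)  # redeployment of the same package

print([s.name for s in first])
print([s.name for s in second])
```

In this sketch the first deployment runs all three scripts, while a redeployment of the same package runs only the re-runnable stored procedure script. Rollback is deliberately left out; as noted above, a DB backup and restore or an explicit script from the development team would cover it.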
