GIT Integration - ja-guzzle/guzzle_docs GitHub Wiki
Table of Contents
- Background
- Data-ops in Action
- Versioning
- Integrating Guzzle with Github
- Registering the OAuth application in Github
- Creating the repository
- Integrating Git in Guzzle
- Versioning in Guzzle
- Overview
- Git actions from Guzzle
- How to use Gitflow in Guzzle.
- Overview
- How to work with Gitflow in Guzzle and Github
- Create release package from Github
- Conclusion
Background
Data-ops in Action
Versioning
- Versioning of job definitions is the most important element of the innovation pipeline
- Guzzle has built-in native support for Git to enable the wide range of git-flows that an organization may desire
- Since all job definitions in Guzzle are captured as plain, transparent yml files, they can easily be versioned, reviewed, and merged when there is concurrent development
- As with application software development, Git integration for data engineering job configs should only be ENABLED in non-production environments. The purpose is to let developers make config changes in separate feature or release branches, test them, and push them to the final master branch and on to subsequent environments such as production.
Integrating Guzzle with Github
Integrating a Git repository with your Guzzle installation entails the following:
Registering the OAuth application in Github
- For every Guzzle instance (or Guzzle installation) you need to register an OAuth application in GitHub
- Enter all the required details, namely:
Application name: guzzlemp4-dev
Home Page: https://guzzlemp4.southeastasia.cloudapp.azure.com:8082/
Redirect URL: https://guzzlemp4.southeastasia.cloudapp.azure.com:8082/integration/git
- Once the application is registered, GitHub provides the Client ID and Client Secret
Note: The OAuth application can be created under an individual account or an organization
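To make the role of the Client ID and Redirect URL concrete, here is a minimal sketch of how an authorization link for a GitHub OAuth application is typically assembled. The endpoint and query parameter names are GitHub's standard OAuth web flow; the `repo` scope and the helper name `build_authorize_url` are assumptions for illustration, not something taken from Guzzle itself.

```python
import secrets
from urllib.parse import urlencode

# GitHub's standard OAuth web-flow authorization endpoint.
GITHUB_AUTHORIZE_URL = "https://github.com/login/oauth/authorize"


def build_authorize_url(client_id, redirect_uri, scope="repo", state=None):
    """Return the URL a user visits to authorize the OAuth application.

    The "repo" scope is an assumption: it is the GitHub scope that covers
    public and private repositories. Guzzle may request a different scope.
    """
    if state is None:
        # Random value echoed back by GitHub, used to detect forged callbacks.
        state = secrets.token_urlsafe(16)
    query = urlencode(
        {
            "client_id": client_id,
            "redirect_uri": redirect_uri,
            "scope": scope,
            "state": state,
        }
    )
    return f"{GITHUB_AUTHORIZE_URL}?{query}"


if __name__ == "__main__":
    # "my-client-id" is a placeholder for the Client ID issued by GitHub.
    print(
        build_authorize_url(
            "my-client-id",
            "https://guzzlemp4.southeastasia.cloudapp.azure.com:8082/integration/git",
        )
    )
```

The redirect URI passed here has to match the one registered for the application, which is why the same value is entered both in GitHub and later in Guzzle.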
Creating the repository
- You can create a new repository in GitHub or use an existing one
- The repository can be created under an individual's account or an organization
- Guzzle supports both private and public repositories
- To create a repository, log in to your GitHub account, go to Repositories and click New. Ensure the repository is initialized.
- Also ensure that the user account used to integrate Git with Guzzle has a "Name" defined in its public profile
**Note:** This repository does not have to be under the same user or organization as the OAuth application. The OAuth application and the repository are independent components of the setup
Integrating Git in Guzzle
- Log in to Guzzle with a user ID that has the Admin role and go to Git settings
- Enter the Client ID, Client Secret and Redirect URI and click on "Enable Git"
- This redirects you to GitHub to log in and authorize the OAuth application (guzzlemp4-dev) to access the public and private repositories in your account (personal repositories) or those owned by your organization. Permit this access
- After the OAuth flow (authorization), you are directed back to Guzzle. Enter the repository owner (this could be an individual's account or an organization) and tab out. This refreshes the list of repositories accessible under that owner
- After you select the repository, Guzzle prompts you to select the collaboration branches to be used. Once done, click on Enable GIT
- Once Git is enabled, Guzzle shows a Git settings option at the top right of the Job Config screen
- Guzzle also commits all the existing configs to the GitHub repository, and you should see the following folders showing up in the repo. Note that Guzzle only clones the files in $GUZZLE_HOME/conf/default (which does not include Spark and physical endpoints):
- Once GIT is enabled, users can view the configs in Default mode, but all edits are enforced via Git mode. This ensures that all configs get versioned. In Git mode, every config change made in Guzzle is committed to a locally cloned Git repo (a local clone is made on the Guzzle VM for every user session) and then pushed to the server (GitHub)
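The commit-then-push behaviour of Git mode can be pictured as the equivalent sequence of plain git commands. The sketch below only builds the command lines (it does not run them). The function name, the clone directory, and the idea that Guzzle shells out to git at all are assumptions for illustration; only the git subcommands and flags themselves are standard.

```python
def git_mode_save_commands(clone_dir, branch, message, author_name, author_email):
    """Return the git commands equivalent to saving one config change in Git mode.

    This is an illustrative sketch: how Guzzle performs these steps
    internally is not documented here.
    """
    author = f"{author_name} <{author_email}>"
    return [
        # Stage every changed yml file in the per-session local clone.
        ["git", "-C", clone_dir, "add", "--all"],
        # Record the change under the logged-in user's identity.
        ["git", "-C", clone_dir, "commit", "-m", message, f"--author={author}"],
        # Publish the commit to the server (GitHub).
        ["git", "-C", clone_dir, "push", "origin", branch],
    ]


if __name__ == "__main__":
    # All argument values below are hypothetical.
    for cmd in git_mode_save_commands(
        "/tmp/guzzle-session-clone",
        "feature/new-job",
        "Update job config",
        "Jane Doe",
        "jane@example.com",
    ):
        print(" ".join(cmd))
```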
Versioning in Guzzle
Overview
- Every user who logs in to Guzzle uses their personal GitHub account to let Guzzle work on the Git repo configured in the Guzzle environment.
- The user must have at least contributor permission on the repository configured in Guzzle's GIT settings to be able to save and push changes.
- The user is asked to authorize the OAuth app defined above to access the repositories that the user has access to. All configs saved by this Guzzle user are committed and pushed to GitHub using the GitHub account used in the authorization flow
**Note:** As part of the authorization flow, an access token is generated and stored in the Guzzle repo for the user.
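For readers unfamiliar with how such an access token comes about, the following sketch builds (without sending) the request that exchanges the temporary authorization code for a token in GitHub's OAuth web flow. The endpoint and field names are GitHub's; the function name and all sample values are hypothetical, and how Guzzle stores the resulting token is not shown.

```python
from urllib.parse import urlencode
from urllib.request import Request

# GitHub's standard OAuth web-flow token endpoint.
GITHUB_TOKEN_URL = "https://github.com/login/oauth/access_token"


def build_token_request(client_id, client_secret, code, redirect_uri):
    """Build the POST request that trades an authorization code for a token.

    The request is only constructed here; sending it (for example with
    urllib.request.urlopen) returns a JSON document containing the token.
    """
    body = urlencode(
        {
            "client_id": client_id,
            "client_secret": client_secret,
            "code": code,
            "redirect_uri": redirect_uri,
        }
    ).encode("ascii")
    return Request(
        GITHUB_TOKEN_URL,
        data=body,
        # Ask GitHub to answer in JSON instead of form-encoded text.
        headers={"Accept": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; the real code arrives on the redirect URL.
    req = build_token_request(
        "my-client-id",
        "my-client-secret",
        "code-from-redirect",
        "https://guzzlemp4.southeastasia.cloudapp.azure.com:8082/integration/git",
    )
    print(req.get_method(), req.full_url)
```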
Git actions from Guzzle
- All changes made by a user are saved into the Guzzle repo. The commits are tracked under the user's own user ID
- Users can create feature branches from within Guzzle and make changes in these branches
- Users can also raise a pull request from Guzzle to merge a feature branch into master or the collaboration branch configured in Guzzle
This takes the user to GitHub, where they can review the commits and submit the pull request
- Users can also pull the latest repository state before making any changes, to ensure they see the latest changes present on GitHub
- If multiple users working on the same branch make conflicting changes to the same job yml, Guzzle shows a conflict dialog
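The hand-off to GitHub for a pull request can be illustrated with GitHub's compare URL, which opens the pull request form for a given base and head branch. This is a sketch of that URL scheme only; whether Guzzle builds exactly this link is an assumption, and the owner, repository, and branch names are hypothetical.

```python
from urllib.parse import quote


def pull_request_url(owner, repo, head, base="develop"):
    """Return the GitHub URL that opens a new pull request from head into base.

    "develop" is used as the default base on the assumption that it is the
    collaboration branch; the base can still be changed in the GitHub UI.
    """
    # quote() keeps "/" so branch names such as "feature/x" stay readable.
    return (
        f"https://github.com/{owner}/{repo}/compare/"
        f"{quote(base)}...{quote(head)}?quick_pull=1"
    )


if __name__ == "__main__":
    print(pull_request_url("acme", "guzzle-configs", "feature/new-job"))
```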
How to use Gitflow in Guzzle.
Overview
- Gitflow is a widely adopted workflow for collaborative development of large software, code review, and branch management. Gitflow is described here. It is foolproof and should be applicable to most projects of any size
- Data projects can simply use it as-is, with the exception of not using release branches (explained below)
- The master branch is only updated through pull requests (instead of direct commits)
- The team lead has access to merge into the develop branch after code review, or is allowed to commit directly into the develop branch
- The master branch is updated once enhancements or hot-fixes are deployed to production (in future, such deployments may happen through a CD pipeline, which would then make the commit)
- All changes are made either in a feature branch created from "develop" (for enhancements) or directly in the develop branch. Hotfix branches are created on a need basis to make production fixes. Using feature branches is recommended where possible.
- develop and master are long-lived branches; hotfix and feature branches are short-lived. Release branches are required if testing cycles are long and the "develop" branch has to move on to the next major release
- In most projects there is no parallel design and build of releases, so testing (SIT/UAT) fixes are made directly in the develop branch for the current release until the release goes to production
- The develop branch shall be tagged for every release into a test environment (first SIT, and eventually QA and Prod once the respective testing is successful)
- The hot-fix uses the same CD pipeline as the main release, on the assumption that there is NO ongoing release being tested in SIT/QA
- The structure of the Git repo mirrors the target directory structure on the Guzzle VM, to keep deployment as simple as unzipping the tar-gz
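The branch roles described above can be summarized as a small lookup table. The sketch below is purely illustrative: the `BranchRule` type, the table, and the prefix-based naming convention (for example `feature/...` or `hotfix2`) are assumptions, not part of Guzzle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BranchRule:
    base: Optional[str]  # branch it is created from; None for long-lived branches
    long_lived: bool


# Roles as described in the overview above.
BRANCH_RULES = {
    "master": BranchRule(base=None, long_lived=True),
    "develop": BranchRule(base=None, long_lived=True),
    "feature": BranchRule(base="develop", long_lived=False),
    "hotfix": BranchRule(base="master", long_lived=False),
}


def rule_for(branch_name: str) -> BranchRule:
    """Look up the rule for a branch by its name prefix (assumed convention)."""
    for prefix, rule in BRANCH_RULES.items():
        if branch_name.startswith(prefix):
            return rule
    raise ValueError(f"no Gitflow rule for branch {branch_name!r}")


if __name__ == "__main__":
    for name in ("master", "develop", "feature/customer-load", "hotfix2"):
        print(name, rule_for(name))
```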
How to work with Gitflow in Guzzle and Github
- Make "develop" the collaboration branch in Guzzle's GIT setup
- Restrict the master and develop branches so that only designated people can push to them (in GitHub)
- The master branch should be configured to enforce pull request review before merging. The same can be done for the develop branch if we want to enforce feature branches (in GitHub)
- Feature branches off develop can be created directly from Guzzle once "develop" is set up as the collaboration branch in Guzzle
- Hotfix branches can be created from the master branch, again from Guzzle. Ensure the current branch is "master". This is typically done when we want to hotfix production code, which is what the master branch carries (remember that the master branch always represents production code, not the current release)
- A pull request can be raised for any branch from Guzzle (by activating that branch and clicking on Setting). The target branch always defaults to develop and can be changed in the GitHub UI:
  - feature->develop: to merge the changes from a feature branch into the main release branch, which is "develop"
  - hotfix->master: to merge the hotfix changes into master once the code is deployed
  - hotfix->develop: to merge the hotfix code into the current develop branch so that the fix is part of the current release codeline
  - develop->master: upon releasing to production the release/enhancement that was built and tested in develop
- Any branch can be deleted from Guzzle, especially short-lived branches such as hotfix and feature. Make sure the branch to be deleted is not active (in the example below, develop is the active branch, but it can be any branch other than hotfix2, which is the one to be deleted). Wait a few seconds for Guzzle to show the notification that the branch is deleted
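The four pull request routes listed above lend themselves to a simple check, for instance in a review checklist or a CI step. This is a sketch under the assumption that branches are recognized by name prefix; the table and function names are hypothetical.

```python
# Source branch kind -> branches it may be merged into, per the routes above.
PR_ROUTES = {
    "feature": ["develop"],
    "hotfix": ["master", "develop"],
    "develop": ["master"],
}


def allowed_pr_targets(source_branch):
    """Return the target branches a pull request from source_branch may use."""
    for prefix, targets in PR_ROUTES.items():
        if source_branch.startswith(prefix):
            return list(targets)
    # master (and unknown branches) are never the source of a pull request here.
    return []


def is_allowed_pr(source_branch, target_branch):
    """True if source -> target is one of the documented routes."""
    return target_branch in allowed_pr_targets(source_branch)


if __name__ == "__main__":
    print(is_allowed_pr("hotfix2", "master"))
    print(is_allowed_pr("feature/new-job", "master"))
```

Because Guzzle defaults the target to develop, a check like this is mainly useful for catching a hotfix or develop pull request whose target was not switched to master in the GitHub UI.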
Create release package from Github
- Release package can be created by using Github to download delta config changes between current master branch and develop branch in
- The zip file can then be promoted to production (or the next higher environment) and unzipped inside $GUZZLE_HOME/conf/default
- While the migration steps above move any datastore defined in non-prod to prod, its connection information has to be configured manually in the production environment.
- Like the connection information mentioned above, compute configurations in Guzzle are assumed to be independent between prod and non-prod, and hence have to be defined manually
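As an alternative illustration of the delta-package idea, the sketch below zips only a given list of changed files from a local checkout. It assumes the list of changed paths is obtained separately, for example with `git diff --name-only master develop`; the function name and paths are hypothetical, and this is not the GitHub download procedure described above.

```python
import zipfile
from pathlib import Path


def build_release_package(repo_root, changed_paths, package_path):
    """Zip the changed config files, keeping their repo-relative paths.

    Paths that no longer exist (files deleted on develop) are skipped, so
    deletions still have to be handled separately on the target environment.
    Returns the list of paths actually written to the package.
    """
    repo_root = Path(repo_root)
    added = []
    with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED) as package:
        for rel_path in sorted(set(changed_paths)):
            source = repo_root / rel_path
            if source.is_file():
                # arcname preserves the directory layout expected under
                # $GUZZLE_HOME/conf/default when the package is unzipped.
                package.write(source, arcname=rel_path)
                added.append(rel_path)
    return added
```

Keeping repo-relative paths inside the archive is what allows the package to be unzipped directly into the target directory.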
Conclusion
- Guzzle's native integration with GitHub allows developers to seamlessly version configs as they make changes from the Guzzle UI
- The ability to work with specific branches created either in GitHub or within Guzzle, and to submit pull requests, makes it easy to achieve the git workflows an organization may desire