Rethink Batch Design in context of ADF - ja-guzzle/guzzle_docs GitHub Wiki

Background

A typical data integration project has to deal with the following concerns around batch and data integration. Below are the areas where we see what ADF offers contending with what Guzzle offers.

  1. **Connectivity** (any-to-any data movement) - specifically on-premise to cloud (crucial), plus other Azure sources such as O365
  2. Orchestration - dependency management: honoring context (dates, loops, catch-up); parallel runs and throttling; dependencies (including error handling); partial runs and resume (occasionally needed)
  3. **Scheduling** - event-based or time-triggered: an ADF trigger through a shell pipeline calling the Guzzle batch, or Control-M / Automic / AutoSys / crontab / Windows batch
  4. Runtime/audit monitoring - we provide a monitoring UI to see what is running
  5. Notification

The items in bold are candidates for ADF while the rest fall to Guzzle - assuming Guzzle manages orchestration and hence runtime audit and monitoring. Notification remains a shared topic: the schedules run in ADF, while the details of what actually ran remain with Guzzle.

Guzzle orchestration capabilities

  1. Considerable depth has been built into Guzzle's orchestration capabilities
  2. The job group construct gives a lot of flexibility: run jobs in parallel with a configurable degree of parallelism, and auto-generate dependencies from SQL or from explicitly specified source and target datasets
  3. Along with job groups, batches bring further capabilities: business dates, context parameters, stages, catch-up, and dependency management across batches
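The auto-dependency idea in point 2 can be sketched roughly: job B depends on job A when B reads a dataset that A writes. A minimal illustration, with invented job and dataset names - this is not the actual Guzzle config format:

```python
# Illustrative sketch of auto-generated dependencies: job B depends on
# job A when B reads a dataset that A writes. Job and dataset names are
# invented; the real Guzzle configuration differs.

jobs = {
    "load_customer": {"sources": ["landing.customer"], "targets": ["stg.customer"]},
    "conform_customer": {"sources": ["stg.customer"], "targets": ["fnd.customer"]},
    "churn_calc": {"sources": ["fnd.customer"], "targets": ["calc.churn"]},
}

def derive_dependencies(jobs):
    # Map each dataset to the job that writes it, then link readers to writers.
    writers = {t: name for name, job in jobs.items() for t in job["targets"]}
    return {
        name: sorted({writers[s] for s in job["sources"] if s in writers})
        for name, job in jobs.items()
    }
```

Here `derive_dependencies(jobs)` would report that `conform_customer` depends on `load_customer`, and `churn_calc` on `conform_customer`, purely from the declared datasets.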

Proposed Approach

Typical flow:

  1. On-premise -> landing -> SRI -> FND -> CALC/USECASE -> reporting cache -> PBIX files (if any)
  2. Custom jobs can exist at any of these stages (ADF pipelines to on-premise, Databricks notebooks, shell scripts, Java programs)

Scheduling using ADF

  • We have one master pipeline per scheduled trigger - think of it as Main() in C
  • It has a few activities that call the Guzzle API and simulate a synchronous call via a loop, waiting for completion. It initializes and triggers the Guzzle stages. Submitting returns a "run id", which is then used to poll status
  • Status is FAILED if any of the stages/contexts/dates failed; the "run id" can be used to resolve which stages ran
  • It calls Guzzle's Batch Run API and passes the context and stage list (all stages by default) (to be added in Guzzle)
  • It generates a notification on completion of the master pipeline
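The submit-then-poll loop described above can be sketched as follows. The function names, status strings, and overall API shape are assumptions for illustration; the stub at the bottom stands in for the real HTTP calls to Guzzle:

```python
import time

def run_stage_sync(submit, get_status, poll_interval=30, timeout=4 * 3600):
    """Submit a Guzzle batch run, then poll until a terminal status.

    `submit()` returns a run id; `get_status(run_id)` returns a status
    string such as "RUNNING", "SUCCEEDED" or "FAILED". These names and
    the shape of the API are assumptions, not Guzzle's actual contract.
    """
    run_id = submit()
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = get_status(run_id)
        if status in ("SUCCEEDED", "FAILED"):
            return run_id, status
        time.sleep(poll_interval)
    return run_id, "TIMEOUT"

# Stubbed example standing in for the real API calls:
statuses = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
run_id, final = run_stage_sync(lambda: "run-001",
                               lambda rid: next(statuses),
                               poll_interval=0)
```

In ADF the same loop is built from an Until activity wrapping a Web activity plus a Wait activity; the sketch just makes the control flow explicit.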

Other items

  1. Running logs are important - decide whether this is Log4j-style behavior writing to blob, or the DB (e.g. when a normal notebook simply writes its output to a file in blob)
  2. One-time setup needs to include stages and context columns; this needs to be emphasized in the documentation

Why we need stages

  1. First, it is complex to manage table-level dependencies; it is better to manage dependencies at the stage level
  2. Second, you can't do straight-through processing, since you will have some FND tables and some STG tables in flight

Calling external jobs in Guzzle from job groups

  • All non-Guzzle-native jobs are handled through this. We support four endpoints, with UI changes to match: JDBC (for stored procedures), shell, ADF pipeline, and Databricks. The Databricks endpoint is similar to the local shell endpoint - no password or credentials; it simply runs from the compute environment, so the account used by the compute is leveraged to access notebooks
  • External jobs should support synchronous execution (start/stop/status)
  • Logs for these jobs go into the running log file - as much as the stub can capture. We add links to retrieve further logs from ADF/Databricks if someone wants to know more; we do not pull them into Guzzle
  • Killing jobs should be supported for all external job types, to the best of each endpoint's ability
  • Runtime audit: start/stop/status is captured by Guzzle, assuming Guzzle is orchestrating or calling the job
  • There is deeper audit (we get work-unit/monitoring (runtime audits) for ADF, similar to ADF sync)
  • Lineage support remains limited to what Guzzle supports for the endpoint (for example, Guzzle does not support lineage for cloud-app APIs - someone can land them as files in ADLS and link the lineage to those files in external jobs)
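One way to picture the external-endpoint contract above (start/status/kill, plus log links for ADF/Databricks) is as a small interface. Everything here is a hypothetical sketch - the class and method names are not Guzzle's actual API:

```python
from abc import ABC, abstractmethod
from typing import Optional

class ExternalJobEndpoint(ABC):
    """Hypothetical contract each endpoint type (JDBC stored procedure,
    shell, ADF pipeline, Databricks notebook) would implement so the
    orchestrator can run it synchronously, capture start/stop/status in
    the runtime audit, and kill it on request."""

    @abstractmethod
    def start(self, job_config: dict) -> str:
        """Submit the job; return an external run id."""

    @abstractmethod
    def status(self, run_id: str) -> str:
        """Return RUNNING, SUCCEEDED or FAILED."""

    @abstractmethod
    def kill(self, run_id: str) -> None:
        """Best-effort cancellation."""

    def log_link(self, run_id: str) -> Optional[str]:
        """Deep link into ADF/Databricks logs; the stub captures only
        summary logs locally. Default: no link."""
        return None

class DummyShellEndpoint(ExternalJobEndpoint):
    """Toy in-memory endpoint used here only to show the contract."""
    def __init__(self):
        self._runs = {}
    def start(self, job_config):
        run_id = f"run-{len(self._runs) + 1}"
        self._runs[run_id] = "SUCCEEDED"  # pretend it ran instantly
        return run_id
    def status(self, run_id):
        return self._runs[run_id]
    def kill(self, run_id):
        self._runs[run_id] = "FAILED"
```

A real ADF or Databricks implementation would back these methods with the respective REST APIs; the point is that all four endpoint types expose the same sync surface to the job group.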

Other pattern - ADF as the master orchestrator

  1. This is NOT recommended unless the project has little complexity and wants to use Guzzle only for selective purposes
  2. ADF invokes Guzzle job groups and manages the rest of the jobs in ADF
  3. ADF then handles everything listed in #Background

Other Topics

Why context in Guzzle

  1. Business date: only relevant when you have an end-of-business/end-of-day snapshot; downstream processing goes by days, ageing computation is done per date, etc.
  2. Rename "system" to "batch_name" and make it the only context column provided by default. One can extend it if need be. Jobs are called for a specific batch. We lose the classic hierarchical batch resolution we currently support
  3. Look at how the current construct of batches will fall into place
  4. Complex dependencies between stages are important - e.g. where you converge data from multiple systems, either joining them side by side or stacking them up and then aggregating
  5. Other uses of context columns - they are leveraged by:
  • All the audit tables
  • All the recon/DQ checks
  • All the housekeeping, which may very much leverage them

Job Group and Auto dependency

  1. Today, dependencies are scoped within a job group
  2. We always run with partial = false, so that a run goes as far as it can - either through auto-dependency or sequentially (depending on how jobs are run)
  3. The next stage won't run until the current one clears
  4. We can set stages to always run with resume=Y (by setting that parameter in the ADF API calls) so that a rerun picks up from the table/job that failed (not the record that failed :) )
  5. If someone really wants to force a run, they have to force-run it with resume=N
  6. Play with parallelism to run things faster
  7. Auto-dependencies in Guzzle are still in the testing phase
  8. The expectation is that someone will want to know everything a job group will run - which means we need to pre-create all the jobs that need to run, with their order (dependencies) - so persist the DAG into job_info so that we can plot the graph in the UI
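The resume=Y semantics above can be sketched as a minimal model: rerun every job that failed last time, plus everything downstream of it, while already-succeeded jobs with no failed upstream are skipped. Job names, statuses, and the edge structure below are illustrative, not actual Guzzle schema:

```python
# Sketch of resume=Y: rerun failed jobs and everything downstream of
# them; jobs that succeeded and have no failed upstream are skipped.
# All names are hypothetical.

def jobs_to_rerun(downstream, last_status):
    """downstream: {job: [jobs that depend on it]};
    last_status: {job: "SUCCEEDED" | "FAILED"}."""
    pending = [job for job, s in last_status.items() if s == "FAILED"]
    rerun = set()
    while pending:
        job = pending.pop()
        if job in rerun:
            continue
        rerun.add(job)
        pending.extend(downstream.get(job, []))  # failures propagate down
    return rerun

downstream = {"stg_customer": ["fnd_customer"], "fnd_customer": ["calc_churn"]}
last = {"stg_customer": "SUCCEEDED", "fnd_customer": "FAILED", "calc_churn": "FAILED"}
```

With these inputs, `stg_customer` is skipped and only `fnd_customer` and `calc_churn` rerun - which is the behavior point 4 describes. The same walk over the persisted DAG in job_info could drive the UI graph in point 8.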