Guzzle Scheduler - ja-guzzle/guzzle_docs GitHub Wiki
https://github.com/ja-guzzle/guzzle_common/-/issues/205
- We move the skip_allowed setting to the schedule level: the entire schedule is skipped or allowed depending on whether the previous cycle is complete. The behavior is the same whether the schedule is configured as parallel or serial: all runnables of the previous cycle must complete before the next cycle starts. Whether the current cycle is still running is tracked purely from the cycles actually running in Spring Boot (it should not use job info or other tables).
- A new setting, allow_skip, is honored only for parallel: false. It decides whether a serial schedule proceeds to the next runnable if the current runnable fails. The status of whether the runnable should behave the same
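The schedule-level rule in the first bullet can be sketched as an in-memory tracker, in line with the requirement to rely only on the cycles actually running in the process rather than job info tables. This is a minimal Python sketch, not Guzzle's implementation: the class `CycleTracker`, its methods, and the behavior when `skip_allowed` is false (start the new cycle alongside the running one, as the `allow_skip` comment in the example file describes) are all assumptions.

```python
import threading


class CycleTracker:
    """Tracks schedule cycles running in this process (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = {}  # schedule name -> number of cycles in flight

    def try_start_cycle(self, schedule, skip_allowed):
        """Return True if a new cycle starts, False if this trigger is skipped.

        A cycle counts as running until all of its runnables have completed,
        whether the schedule is parallel or serial.
        """
        with self._lock:
            in_flight = self._running.get(schedule, 0)
            if in_flight and skip_allowed:
                return False  # previous cycle not complete: skip this trigger
            # Assumption: with skip_allowed false, the new cycle runs alongside.
            self._running[schedule] = in_flight + 1
            return True

    def finish_cycle(self, schedule):
        """Call once every runnable of a cycle has completed."""
        with self._lock:
            self._running[schedule] -= 1
            if self._running[schedule] == 0:
                del self._running[schedule]
```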
Users can define multiple schedules in the $GUZZLE_HOME/conf/instance/schedules directory.
```yaml
# The file resides in guzzle_home/conf/instance/schedule
# Schedules are refreshed: each schedule file is reloaded from its latest available version, the new schedule takes effect immediately, and the runnables start triggering as per the new schedule.
# While schedules are refreshed, any cycle already running for a given schedule continues without impact.
# A record is created for each schedule, and each runnable points to it as its parent instance id.
# For a stage, this can overwrite the batch id (its natural parent), which may impact the monitoring UI and any external reports built on Guzzle tables.
version: 1
schedule:
  #type: cron
  type: daily
  #type: daysofweek
  parallel: true ## Whether the runnables run in sequence or in parallel (if parallel, all of them start at once unless a QR prevents it)
  allow_skip: true ## If the current cycle (parallel or serial) is still running when the next cycle is due: whether to skip the next cycle or trigger it in parallel
  partial_run: true ## Serial only: whether to continue after a failure in a previous runnable or stop this schedule instance. Can be omitted for parallel schedules. To be checked: whether this setting is optional
  properties:
    #daysofweek: 1,2,3
    trigger_time: 17:48
    #trigger_every: 1
    #cron: "0 */1 * * * *"
parameters:
  system: CUSTOMER
  param2: value2
  #business_date: ${guzzle.scheduler.orderdate} ## Timestamp; can be manipulated using Groovy
runnables:
  - id: runnable-id1 ## To be checked: the significance of this item
    name: jg1
    type: job_group ## Always at runnable level; change type to the new naming convention
    environment: test ## Always at runnable level
    spark_environment: local1 ## Always at runnable level
    parameters:
      system: ## Empty parameters will be prevented by the UI; the backend may allow them
      location: SG
      #odate: ${guzzle.scheduler.orderdate} ## Timestamp
      #sdate: ${guzzle.scheduler.systemdate} ## Timestamp
      src_table: table1
      tgt_table: table2
    spark_properties: ## Only works for YARN, and again at runnable level
      sp1: spv1
```
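The refresh behavior described in the comments at the top of the example file above (reload each schedule file, apply the new schedule immediately, leave running cycles untouched) can be sketched as follows. It is a hypothetical illustration: `Runnable`, `Schedule`, `refresh`, the simplified `trigger` string, and the `hourly_sales` schedule are invented for the example. The key point it shows is that a running cycle keeps its own reference to the schedule it started with, so swapping the in-memory entry does not affect it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Runnable:
    id: str
    name: str
    type: str


@dataclass(frozen=True)
class Schedule:
    name: str
    trigger: str           # simplified stand-in for schedule.type + properties
    runnables: tuple = ()  # tuple of Runnable, in configured order


def refresh(in_memory, loaded):
    """Replace the in-memory schedules with the freshly loaded ones.

    Both arguments map schedule name (file name without .yml) to Schedule.
    Cycles already running hold their own Schedule reference, so they are not
    affected by the swap. Returns the added, removed and changed names.
    """
    added = sorted(loaded.keys() - in_memory.keys())
    removed = sorted(in_memory.keys() - loaded.keys())
    changed = sorted(name for name in loaded.keys() & in_memory.keys()
                     if loaded[name] != in_memory[name])
    in_memory.clear()
    in_memory.update(loaded)
    return added, removed, changed


if __name__ == "__main__":
    jg1 = Runnable("runnable-id1", "jg1", "job_group")
    schedules = {"daily_customers": Schedule("daily_customers", "daily 17:48", (jg1,))}
    running_cycle = schedules["daily_customers"]  # a cycle that is in flight
    print(refresh(schedules, {
        "daily_customers": Schedule("daily_customers", "daily 18:00", (jg1,)),
        "hourly_sales": Schedule("hourly_sales", "daily every 1h"),
    }))  # (['hourly_sales'], [], ['daily_customers'])
    print(running_cycle.trigger)  # still daily 17:48
```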
The name of the schedule file is the name of the schedule; e.g., create the file "daily_customers.yml" for a schedule named "daily_customers".
A schedule config file looks like the following:
```yaml
version: 1
schedule:
  type: daily/daysofweek/cron
  parallel: true
  properties:
    ...
parameters:
  system: CUSTOMER
  param2: value2
  business_date: ${guzzle.scheduler.systemdate}
  custom_date: ${guzzle.scheduler.systemdate[0..3] + guzzle.scheduler.systemdate[5..6] + guzzle.scheduler.systemdate[8..9]}
runnables:
  - id: 383d9cd8-fc37-4f94-8f32-95cfcc63a62d
    name: job1
    type: job
    environment: dev
    spark_environment: local_spark
    quantity_resource: qr1
    concurrent: false
    parameters:
      location: SG
      param3: ${system}_${location}
    spark_properties:
      ...
  - id: 3fec2f9b-b3a0-4f51-b47b-191ed0e0e091
    name: job_group1
    type: job_group
    environment: test
    spark_environment: hdp_cluster
    quantity_resource: qr2
    parameters:
      location: IN
      param2: new_value2
```
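The example implies that runnable-level `parameters` extend the schedule-level ones (job_group1 redefines `param2`) and that `${name}` refers to another parameter (`param3: ${system}_${location}`). Below is a minimal Python sketch of that resolution, assuming runnable-level values take precedence and that `guzzle.scheduler.*` values are substituted elsewhere. `resolve_parameters` and `custom_date` are hypothetical helpers; Guzzle's actual Groovy evaluation is not reproduced, and `custom_date` only mirrors the slicing in the `custom_date` expression, assuming `systemdate` starts with a yyyy-MM-dd date.

```python
import re

# Matches ${name} references to other parameters. Dotted built-ins such as
# ${guzzle.scheduler.systemdate} do not match \w+ and are left untouched.
REFERENCE = re.compile(r"\$\{(\w+)\}")


def resolve_parameters(schedule_params, runnable_params):
    """Merge schedule- and runnable-level parameters, then expand ${name}."""
    merged = {**schedule_params, **runnable_params}  # runnable level wins (assumed)

    def expand(value, seen):
        if not isinstance(value, str):
            return value  # e.g. an empty parameter, which YAML loads as None

        def substitute(match):
            key = match.group(1)
            if key not in merged:
                return match.group(0)  # unknown name: keep the reference as-is
            if key in seen:
                raise ValueError(f"circular parameter reference: {key}")
            return str(expand(merged[key], seen | {key}))

        return REFERENCE.sub(substitute, value)

    return {key: expand(value, {key}) for key, value in merged.items()}


def custom_date(systemdate):
    """Groovy ranges such as [0..3] include both ends, so they map to [0:4]."""
    return systemdate[0:4] + systemdate[5:7] + systemdate[8:10]


if __name__ == "__main__":
    schedule_params = {"system": "CUSTOMER", "param2": "value2"}
    print(resolve_parameters(schedule_params, {"location": "SG", "param3": "${system}_${location}"}))
    print(custom_date("2024-03-15"))  # 20240315
```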
To schedule daily at 01:00 and 13:00:

```yaml
schedule:
  type: daily
  properties:
    trigger_time: 01:00,13:00
```
To schedule every 2 hours:

```yaml
schedule:
  type: daily
  properties:
    trigger_every: 2
```
To schedule at 01:00 every Monday and Thursday:

```yaml
schedule:
  type: daysofweek
  properties:
    daysofweek: 1,4
    trigger_time: 01:00
```
To schedule with a custom cron expression, e.g. at 9:30 am every Monday, Tuesday, Wednesday, Thursday, and Friday:

```yaml
schedule:
  type: cron
  properties:
    cron: 0 30 9 ? * MON-FRI
```
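One way to read the `daily` and `daysofweek` properties is as a "next trigger time" calculation, sketched below in Python. The interpretation is an assumption: `trigger_every` is taken as every N hours counted from 00:00 and restarting each day, and `daysofweek` numbering is taken as 1 = Monday through 7 = Sunday, consistent with `1,4` meaning Monday and Thursday. `next_trigger` is a hypothetical helper; in the API project the actual triggering could be delegated to the Spring task scheduler, and `cron` expressions are not handled here.

```python
from datetime import datetime, time, timedelta


def next_trigger(now, schedule_type, properties):
    """Return the first trigger time strictly after `now` (a naive datetime)."""
    if schedule_type == "cron":
        raise NotImplementedError("cron schedules are not covered by this sketch")

    midnight = datetime.combine(now.date(), time())

    if schedule_type == "daily" and "trigger_every" in properties:
        # Assumption: every N hours from 00:00, restarting at each midnight.
        step = timedelta(hours=int(properties["trigger_every"]))
        k = (now - midnight) // step + 1
        return min(midnight + k * step, midnight + timedelta(days=1))

    # Comma-separated HH:MM values, e.g. "01:00,13:00".
    times = sorted(time.fromisoformat(t.strip())
                   for t in str(properties["trigger_time"]).split(","))
    if schedule_type == "daysofweek":
        # Assumption: 1 = Monday ... 7 = Sunday, as in date.isoweekday().
        days = {int(d) for d in str(properties["daysofweek"]).split(",")}
    else:
        days = set(range(1, 8))  # daily: every day of the week

    for offset in range(8):  # today plus one full week ahead
        day = now.date() + timedelta(days=offset)
        if day.isoweekday() in days:
            for t in times:
                candidate = datetime.combine(day, t)
                if candidate > now:
                    return candidate
    raise ValueError("no matching trigger time")
```

For example, from Monday 2024-01-01 12:00, the `daysofweek: 1,4` / `trigger_time: 01:00` schedule next fires on Thursday 2024-01-04 at 01:00.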
The config file $GUZZLE_HOME/conf/schedules/quantity_resource.yml defines quantity resources (QRs):

```yaml
version: 1
quantity_resources:
  qr1: 10
  qr2: 5
```
A few implementation notes:
- Implement the scheduler features in the Guzzle API project.
- We can use the Spring task scheduler to schedule jobs at a given frequency. See https://www.baeldung.com/spring-task-scheduler for more information.
- If the schedule of a job is updated through the UI, the change should take effect immediately, by comparing the in-memory references of the schedules using the id of the runnable item.
- There should be a scheduled task, triggered at a regular interval (defined by application.syncSchedule.jobScheduler in the application.yml of the API project), that reads the schedule files and the quantity_resource.yml file and updates the in-memory references of the schedules using the id of the runnable item. There should also be an API that triggers this task immediately (for example, see the /api/sync API).
- If the value of a QR is increased and there are runnable items waiting on that QR, those items can start immediately if there is sufficient QR capacity.
- If the value of a QR is decreased, new runnable items run according to the new QR capacity.
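The two QR notes amount to a counting limit whose capacity can change while items are running. Python's `threading.Semaphore` cannot shrink, so this sketch uses a `threading.Condition` instead. `QuantityResource` and its methods are hypothetical names, and the API project would presumably use Java equivalents. Raising the capacity wakes waiting runnable items at once; lowering it never interrupts items already running and only restricts new ones.

```python
import threading


class QuantityResource:
    """A counting limit whose capacity can change at runtime (hypothetical)."""

    def __init__(self, capacity):
        self._cond = threading.Condition()
        self._capacity = capacity
        self._in_use = 0

    def acquire(self, timeout=None):
        """Wait for a free slot; return False if `timeout` seconds pass first."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._in_use < self._capacity, timeout):
                return False
            self._in_use += 1
            return True

    def release(self):
        """Free a slot when a runnable item finishes."""
        with self._cond:
            self._in_use -= 1
            self._cond.notify()

    def set_capacity(self, capacity):
        """Apply a new value read from quantity_resource.yml."""
        with self._cond:
            self._capacity = capacity
            # Waiting items re-check the limit, so an increase takes effect at once;
            # items already running are never interrupted by a decrease.
            self._cond.notify_all()

    @property
    def in_use(self):
        with self._cond:
            return self._in_use


if __name__ == "__main__":
    qr1 = QuantityResource(1)
    qr1.acquire()  # one runnable item holds the only slot
    waiter = threading.Thread(
        target=lambda: print("waiting item started:", qr1.acquire(timeout=5)))
    waiter.start()
    qr1.set_capacity(2)  # wakes the waiting item without any release
    waiter.join()
```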
If parallel=false at the schedule level and not all runnable items of the previous schedule instance have finished, the current schedule trigger is ignored.
In the call, we removed this requirement: concurrent running of the same runnable item now depends only on the allow_skip flag. As a result, the following situation can occur.
We have job j1, which takes 1 hour 30 minutes to complete, and job j2, which takes 1 hour. Both jobs are configured in one hourly schedule with parallel = false, and allow_skip is true for both j1 and j2. Then:
- 00:00 schedule instance 1 - job j1 starts
- 01:00 schedule instance 2 - job j1 skips as the previous one is still running
- 01:00 schedule instance 2 - job j2 starts
- 01:30 schedule instance 1 - job j2 skips as job j2 from schedule instance 2 is still running
The earlier assumption was to avoid this situation. Please let us know what the behavior should be.
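The timeline above can be reproduced with a small discrete-event simulation, which may help when comparing candidate behaviors. Assumptions: both instances run serially (parallel = false), allow_skip is true for every runnable, a skipped runnable is passed over instantly, and a runnable counts as running during the half-open interval [start, end), so one that ends at 01:30 no longer blocks a start at 01:30. `simulate` is a hypothetical helper that models only the two triggers from the example.

```python
import heapq


def simulate(runnables, durations, triggers):
    """Return (minute, instance, runnable, action) tuples in event order."""
    intervals = []  # (runnable, start, end) for every runnable that started
    log = []
    # Each event says: at `minute`, schedule instance `n` tries runnable `idx`.
    events = [(minute, n, 0) for n, minute in enumerate(triggers, start=1)]
    heapq.heapify(events)
    while events:
        minute, n, idx = heapq.heappop(events)
        if idx == len(runnables):
            continue  # this schedule instance has no runnables left
        name = runnables[idx]
        busy = any(r == name and start <= minute < end
                   for r, start, end in intervals)
        if busy:
            # allow_skip: another instance of this runnable is still running
            log.append((minute, n, name, "skips"))
            heapq.heappush(events, (minute, n, idx + 1))
        else:
            end = minute + durations[name]
            intervals.append((name, minute, end))
            log.append((minute, n, name, "starts"))
            heapq.heappush(events, (end, n, idx + 1))  # serial: next one waits
    return log


if __name__ == "__main__":
    for minute, n, name, action in simulate(["j1", "j2"], {"j1": 90, "j2": 60}, [0, 60]):
        print(f"{minute // 60:02d}:{minute % 60:02d} schedule instance {n} - job {name} {action}")
```

Adding a third trigger at minute 120 would start j1 again, since instance 1's j1 finished at 01:30.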