Workflow scheduling - reanahub/reana GitHub Wiki
This page describes how workflow scheduling works in the REANA platform.
Introduction
In general, workflows are scheduled in the following way:
- User requests reana-server to start a workflow;
- reana-server calculates workflow priority and complexity based on the configured scheduling strategy;
- reana-server publishes a message to the workflow-submission queue with workflow details, priority, and complexity;
- workflow scheduler picks up the message from the queue and checks if the workflow can be scheduled;
- If the workflow can be scheduled, workflow scheduler sends a request to reana-workflow-controller to start the workflow;
- If the workflow cannot be scheduled, workflow scheduler can either:
  - publish a message to the workflow-submission queue to try again;
  - fail the workflow.
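The flow above can be sketched as a small decision function. This is a minimal illustration, not the scheduler's actual code: the Submission class, the decide function, the memory-based check, and the max_retries limit are all hypothetical names and rules chosen for the example. The page itself only states that the scheduler checks whether the workflow can be scheduled.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    START = "start"      # ask reana-workflow-controller to start the workflow
    REQUEUE = "requeue"  # publish the message to workflow-submission again
    FAIL = "fail"        # give up and fail the workflow


@dataclass
class Submission:
    """Hypothetical in-memory view of a workflow-submission message."""
    workflow_id_or_name: str
    priority: int
    min_job_memory: int
    retry_count: int = 0


def decide(submission, available_memory, max_retries=3):
    """Assumed rule: start if the smallest job fits, else retry, else fail."""
    if submission.min_job_memory <= available_memory:
        return Decision.START
    if submission.retry_count < max_retries:
        # Count the retry before the message would be requeued.
        submission.retry_count += 1
        return Decision.REQUEUE
    return Decision.FAIL
```

A submission that does not fit is requeued until the assumed retry limit is reached, after which it fails.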
In the following sections, we will go deeper into the details of each step.
Tip: The Architecture page provides an overview diagram of the REANA platform that can be helpful when reading this page.
workflow-submission queue
This is a queue that is used to submit workflows to the scheduler.
- publishes to the queue: reana-server
- consumes from the queue: workflow scheduler
Message schema:
{
"$id": "reana/workflow-submission-message.schema.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "workflow-submission message",
"description": "Describes workflow submission message for scheduler",
"type": "object",
"properties": {
"user": {
"description": "The unique UUID identifier for a user",
"type": "string"
},
"workflow_id_or_name": {
"description": "The unique UUID identifier or name for a workflow",
"type": "string"
},
"priority": {
"description": "Priority number of the workflow",
"type": "integer"
},
"min_job_memory": {
"description": "Minimum memory requested by a job of the workflow",
"type": "integer"
},
"parameters": {
"type": "object"
},
"retry_count": {
"description": "Number of times the workflow submission was retried",
"type": "integer"
}
},
"required": ["user", "workflow_id_or_name", "priority", "min_job_memory"]
}
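As an illustration of the schema, the following sketch builds one message and checks its required fields and types by hand. The field values (the all-zero UUID, the workflow name, the numbers) and the check_message helper are made up for this example; a full JSON Schema validator would be the more complete option.

```python
REQUIRED = ("user", "workflow_id_or_name", "priority", "min_job_memory")
FIELD_TYPES = {
    "user": str,
    "workflow_id_or_name": str,
    "priority": int,
    "min_job_memory": int,
    "parameters": dict,
    "retry_count": int,
}


def check_message(message):
    """Return a list of problems; an empty list means the fields look valid."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED
        if name not in message
    ]
    for name, expected in FIELD_TYPES.items():
        if name not in message:
            continue
        value = message[name]
        # bool is a subclass of int in Python, but a JSON Schema
        # "integer" does not accept booleans, so reject them explicitly.
        is_bool_as_int = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_int:
            problems.append(f"wrong type for field: {name}")
    return problems


# Hypothetical example message; all values are placeholders.
example = {
    "user": "00000000-0000-0000-0000-000000000000",
    "workflow_id_or_name": "myanalysis",
    "priority": 10,
    "min_job_memory": 4294967296,
    "parameters": {},
    "retry_count": 0,
}
print(check_message(example))
```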
This is a priority queue.
Messages with a higher integer in the priority field should be consumed first.
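The consumption order can be illustrated with an in-memory stand-in for the queue. This sketch uses Python's heapq module, not the real message broker; the SubmissionQueue class and the first-come ordering among equal priorities are assumptions of the example.

```python
import heapq
import itertools


class SubmissionQueue:
    """Toy priority queue: highest priority integer is consumed first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def publish(self, message):
        # heapq is a min-heap, so the priority is negated to pop the
        # largest value first. The counter breaks ties by arrival order
        # and keeps the dict messages from ever being compared.
        entry = (-message["priority"], next(self._counter), message)
        heapq.heappush(self._heap, entry)

    def consume(self):
        return heapq.heappop(self._heap)[2]


queue = SubmissionQueue()
queue.publish({"workflow_id_or_name": "low", "priority": 1})
queue.publish({"workflow_id_or_name": "high", "priority": 5})
queue.publish({"workflow_id_or_name": "mid", "priority": 3})

order = [queue.consume()["workflow_id_or_name"] for _ in range(3)]
print(order)  # ['high', 'mid', 'low']
```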
Scheduling strategies
Currently, REANA supports two scheduling strategies:
- fifo, first-in first-out strategy, starting workflows as they come;
- balanced, a weighted strategy taking into account existing multi-user workloads and the complexity of incoming workflows.
Workflow complexity
Workflow complexity is an internal concept we use in REANA to help
decide which workflow to schedule when the balanced strategy is used.
It expresses how many jobs the workflow would like to start, and how much memory each individual job would consume.
Symbolically, a workflow complexity value looks like [(4, 4G), (3, 2G)], meaning that when the given workflow starts, it would like to launch 4
jobs of 4 GB RAM each and 3 jobs of 2 GB RAM each.
The workflow complexity numbers for a given workflow can be obtained by parsing
the workflow DAG specification and studying how many jobs will be started in
parallel upon launch and how much kubernetes_memory_limit each job asks for.
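To make the symbolic value concrete, here is a minimal sketch that derives a few summary numbers from a complexity list. The helper names are hypothetical and are not taken from reana-server, and the example assumes that "4G" is read as 4 GiB expressed in bytes.

```python
GIB = 1024 ** 3  # assumption: "G" in the symbolic form means GiB

# [(number_of_jobs, memory_per_job_in_bytes), ...]
complexity = [(4, 4 * GIB), (3, 2 * GIB)]


def total_jobs(complexity):
    """Number of jobs the workflow would like to start upon launch."""
    return sum(jobs for jobs, _ in complexity)


def total_memory(complexity):
    """Memory needed if all initial jobs run at the same time."""
    return sum(jobs * memory for jobs, memory in complexity)


def min_job_memory(complexity):
    """Memory of the smallest job, i.e. the least needed to start anything."""
    return min(memory for _, memory in complexity)


print(total_jobs(complexity))             # 7
print(total_memory(complexity) // GIB)    # 22
print(min_job_memory(complexity) // GIB)  # 2
```

Here 4 jobs of 4 GiB plus 3 jobs of 2 GiB give 7 jobs and 16 + 6 = 22 GiB in total, while the smallest job needs 2 GiB.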
How to work with workflow complexity outside of a REANA cluster?
Although the workflow complexity logic belongs to reana-server,
it can be tested without a running cluster by importing the appropriate
functions in a Python shell:
$ mkvirtualenv foo
$ pip install ../reana-client ../reana-server ipython
$ cd ../reana-demo-root6-roofit
$ ipython
and then in the Python REPL:
In [1]: from reana_client.utils import load_reana_spec
In [2]: from reana_server.complexity import estimate_complexity
In [3]: reana_yaml = load_reana_spec('./reana.yaml')
==> Verifying REANA specification file... ./reana.yaml
-> SUCCESS: Valid REANA specification file.
==> Verifying REANA specification parameters...
-> SUCCESS: REANA specification parameters appear valid.
==> Verifying workflow parameters and commands...
-> SUCCESS: Workflow parameters and commands appear valid.
==> Verifying dangerous workflow operations...
-> SUCCESS: Workflow operations appear valid.
In [4]: from reana_server import complexity as reana_server_complexity
In [5]: reana_server_complexity.REANA_COMPLEXITY_JOBS_MEMORY_LIMIT = '4Gi'
In [6]: estimate_complexity('serial', reana_yaml)
Out[6]: [(1, 4294967296.0)]