Workflow creation and assignment definition - dmwm/WMCore GitHub Wiki

As of the HG1705 CMSWEB cycle, we've tightened the request spec validation for both creation and assignment. In summary:

  • there is a specific set of parameters that must be provided during request creation (optional=False). There are also optional parameters that have a default value in ReqMgr;
  • there is a specific set of parameters that must be provided during request assignment (assign_optional=False). There are also optional parameters that have a default value in ReqMgr;
  • unknown arguments (not defined in the spec files) are no longer accepted or silently removed: a request containing them will fail validation. Moreover, argument names are case-sensitive;
  • there are a few arguments that are allowed both at creation and at assignment. Besides those, you must provide only creation or only assignment parameters, according to the operation you want to accomplish;
  • inner Task/Step dictionaries (for TaskChain/StepChain) are also validated, and they have their own argument definitions.
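The rules above can be sketched as a small check. This is only an illustration, not the ReqMgr validation code: the function name `validate_args`, the two-argument spec and its default value are made up for the example, and the sketch assumes an argument counts as optional when its flag is absent.

```python
def validate_args(request_args, spec, optional_key="optional"):
    """Return a list of validation errors for request_args against spec.

    spec maps an argument name to its attribute dictionary.
    optional_key is "optional" for creation, "assign_optional" for assignment.
    """
    errors = []
    # Unknown arguments are rejected; the lookup is case-sensitive.
    for name in request_args:
        if name not in spec:
            errors.append("unknown argument: %s" % name)
    # Arguments whose optional flag is False must be provided.
    # Assumption: a missing flag is treated as optional.
    for name, attrs in spec.items():
        if not attrs.get(optional_key, True) and name not in request_args:
            errors.append("missing mandatory argument: %s" % name)
    return errors


# Hypothetical spec, for illustration only.
spec = {
    "RequestType": {"optional": False},
    "Memory": {"optional": True, "default": 2300},
}

print(validate_args({"RequestType": "ReReco"}, spec))
print(validate_args({"requesttype": "ReReco"}, spec))
```

The second call reports two problems: `requesttype` is unknown because names are case-sensitive, and the mandatory `RequestType` is therefore missing.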

Request type dependencies

The definition of these arguments follows a hierarchy where StdBase contains the base definition for all workflows. See 'getWorkloadCreateArgs' for the global creation arguments, 'getWorkloadAssignArgs' for the assignment arguments and 'getChainCreateArgs' for both the Task and Step argument definitions. All the other request types inherit from StdBase and can override the base/global argument definitions with specific arguments or data types (as well as add new arguments); see each of them according to the request type you want to create. In addition, data processing requests (requests that always have an input dataset) inherit the argument definitions from DataProcessing, which also has StdBase as a superclass. This flowchart should make these dependencies clear:

[Figure: Dependency of requests]

So, for instance, to create a ReReco request, you need to:

  1. get the StdBase create args
  2. update them with the DataProcessing create args
  3. and finally update those with the ReReco create args. This gives you the final list of allowed arguments, their default values, their expected data types, their validation functions and so on.
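The three steps above amount to layering dictionaries, with later layers overriding earlier ones. A minimal sketch follows, assuming each layer is a plain dictionary of argument definitions; the helper `merge_create_args` and all argument values are hypothetical, and the real definitions come from the 'getWorkloadCreateArgs' methods of each spec class.

```python
def merge_create_args(*layers):
    """Merge argument definitions; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        for name, attrs in layer.items():
            # Copy so the input layers are never modified.
            merged[name] = dict(attrs)
    return merged


# Made-up definitions standing in for the three spec classes.
std_base_args = {
    "RequestType": {"optional": False},
    "Memory": {"optional": True, "default": 2300.0},
}
data_processing_args = {
    "InputDataset": {"optional": False},
}
rereco_args = {
    "Memory": {"optional": True, "default": 3000.0},
}

final_args = merge_create_args(std_base_args, data_processing_args, rereco_args)
print(sorted(final_args))
print(final_args["Memory"]["default"])
```

Here the ReReco layer replaces the whole `Memory` definition from the base layer, while `RequestType` and `InputDataset` are carried through unchanged.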

Arguments dictionary specification

We've built a json file for each of the supported request types, where one can find the final list of supported arguments and their definition. A few remarks must be made though:

  • The WMSpec files contain the authoritative argument definition. These json templates will be kept up-to-date on a best-effort basis.
  • Some spec files contain floating arguments (arguments with an integer suffix that starts at 1 and goes up to X), e.g. for TaskChain, StepChain and ReReco. See their respective sections for details.
  • Several example request templates (taken from real configurations/workflows) can be found here: EXAMPLES

For clarity purposes, we've kept only the most important attributes for each argument. Their meaning is:

  • default: in case the argument is not provided, it gets the "default" value assigned to it;
  • optional: if optional is set to True, the requestor doesn't need to provide it. Otherwise (False), it's mandatory and request creation will fail if the requestor doesn't send it in the dictionary;
  • type: the data type expected for the value of that argument. WMCore defines a few additional data types, for example "strToBool", which maps either a boolean or a string to a boolean, and "makeNonEmptyList", which maps a string or list to a list object. Most of these arguments have a validation function to make sure their value complies with some basic rules.
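To show how the three attributes interact, here is a simplified sketch. The functions `str_to_bool`, `make_non_empty_list` and `resolve_argument` are stand-ins written for this example; they are not the WMCore implementations and may differ from them in the inputs they accept.

```python
def str_to_bool(value):
    """Stand-in for a type that maps a boolean or string to a boolean."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.lower() in ("true", "false"):
        return value.lower() == "true"
    raise ValueError("cannot convert %r to a boolean" % (value,))


def make_non_empty_list(value):
    """Stand-in for a type that maps a string or list to a non-empty list."""
    if isinstance(value, str):
        value = [value]
    if not isinstance(value, list) or not value:
        raise ValueError("expected a non-empty list, got %r" % (value,))
    return value


def resolve_argument(name, request_args, definition):
    """Return the final value of one argument, applying type and default."""
    if name in request_args:
        convert = definition.get("type")
        raw = request_args[name]
        return convert(raw) if convert is not None else raw
    # Assumption: a missing "optional" flag is treated as optional.
    if definition.get("optional", True):
        return definition.get("default")
    raise ValueError("mandatory argument %s was not provided" % name)


# Hypothetical argument definition, for illustration only.
flag_definition = {"type": str_to_bool, "optional": True, "default": False}

print(resolve_argument("SomeFlag", {"SomeFlag": "True"}, flag_definition))
print(resolve_argument("SomeFlag", {}, flag_definition))
```

The first call converts the string through the type function; the second falls back to the default because the argument is optional and was not provided.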

Creating a ReReco request

ReReco requests are data processing requests (they process input data), so they rely on StdBase + DataProcessing + ReReco create args. The final dictionary of supported arguments can be found here: ReReco_createSpec.json

Note, however, that ReReco requests can have "floating" arguments defining a Skim task. Since a skim task is not mandatory, a ReReco request does not need to have any of those arguments, or it can have N sets of them depending on how many skim tasks one wants to run. These arguments are best seen in the code itself, in getSkimArguments(), where one can see that only 3 arguments are mandatory (IF a Skim task is defined in the request). The '#N' is meant to be replaced by an integer. A real example can be seen in this request example: ReRecoSkim.json.
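The '#N' substitution can be illustrated with a short helper. This is a sketch only: `expand_floating_args` is a hypothetical function, and the three template names are assumptions used for illustration, so check getSkimArguments() for the authoritative names.

```python
def expand_floating_args(templates, count):
    """Replace '#N' in each template name with 1..count, task by task."""
    names = []
    for index in range(1, count + 1):
        for template in templates:
            names.append(template.replace("#N", str(index)))
    return names


# Assumed template names; the real ones are defined in getSkimArguments().
skim_templates = ["SkimName#N", "SkimInput#N", "Skim#NConfigCacheID"]

# A request with two skim tasks would carry these argument names.
print(expand_floating_args(skim_templates, 2))

# A request without skim tasks carries none of them.
print(expand_floating_args(skim_templates, 0))
```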

Creating a DQMHarvest request

DQMHarvest requests are considered data processing requests (they process input data), so they rely on StdBase + DataProcessing + DQMHarvest create args. The final dictionary of supported arguments can be found here: DQMHarvest_createSpec.json

Creating a StoreResults request

StoreResults requests are meant to be created only by CompOps (Production & Reprocessing team). Despite being a data processing request, StoreResults still relies on StdBase + StoreResults create args only (this needs to be updated). In any case, the final dictionary of supported arguments can be found here: StoreResults_createSpec.json

Creating a StepChain request

StepChain can be either a MonteCarlo or a data processing request (usually it is both together). Since we can chain an indefinite number of Steps into a StepChain request, its creation is a bit trickier and one needs to consider both the create arguments AND the inner step create argument definitions. It relies on the StdBase + StepChain argument definition.

The final top level (global) dictionary of supported arguments can be found here: StepChain_createSpec.json. Bear in mind that this request type also contains the floating arguments (Step1, Step2, Step3,...). You'll find only Step1 in this json, but more steps (each ending with a sequential integer number) can be added to your request.

The argument definition for a chain (Step) dictionary can have different mandatory arguments, depending on:

  1. whether it's a first step (Step1) or a generator step (no input dataset), see this argument specification StepChain_create_Step_GeneratorSpec.json
  2. or whether it's a subsequent step (Step2 or further) or a processing step (with input dataset), see this argument specification StepChain_create_Step_ProcessingSpec.json
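Because the inner dictionaries are numbered sequentially, a client can gather them in order before checking each one against the matching step spec. The sketch below assumes the numbering must be contiguous starting at 1; `collect_chain_items` is a hypothetical helper and the `StepName` values are illustrative.

```python
import re


def collect_chain_items(request_args, prefix):
    """Return the inner dictionaries prefix1, prefix2, ... in order.

    Raises ValueError if a numbered key exists beyond a gap in the sequence.
    """
    items = []
    index = 1
    while "%s%d" % (prefix, index) in request_args:
        items.append(request_args["%s%d" % (prefix, index)])
        index += 1
    # Any remaining numbered key means the sequence is not contiguous.
    pattern = re.compile(r"^%s(\d+)$" % re.escape(prefix))
    numbered = [key for key in request_args if pattern.match(key)]
    if len(numbered) != len(items):
        raise ValueError("%s keys must be numbered 1..N without gaps" % prefix)
    return items


# Illustrative request fragment, not a complete StepChain request.
request = {
    "RequestType": "StepChain",
    "Step1": {"StepName": "GEN"},
    "Step2": {"StepName": "RECO"},
}

for step in collect_chain_items(request, "Step"):
    print(step["StepName"])
```

The same helper works for TaskChain requests by passing "Task" as the prefix.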

Creating a TaskChain request

TaskChain can be either a MonteCarlo or a data processing request (usually it is both together). Since we can chain an indefinite number of Tasks into a TaskChain request, its creation is a bit trickier and one needs to consider both the create arguments AND the inner task create argument definitions. It relies on the StdBase + TaskChain argument definition.

The final top level (global) dictionary of supported arguments can be found here: TaskChain_createSpec.json. Bear in mind that this request type also contains the floating arguments (Task1, Task2, Task3,...). You'll find only Task1 in this json, but more tasks (each ending with a sequential integer number) can be added to your request.

The argument definition for a chain (Task) dictionary can have different mandatory arguments, depending on:

  1. whether it's a first task (Task1) or a generator task (no input dataset), see this argument specification TaskChain_create_Task_GeneratorSpec.json
  2. or whether it's a subsequent task (Task2 or further) or a processing task (with input dataset), see this argument specification TaskChain_create_Task_ProcessingSpec.json
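One way to read the two cases above is to pick the spec by whether the task declares an input. The sketch below takes that reading as an assumption; `pick_task_spec` is a hypothetical helper, the key names `InputDataset` and `InputTask` are assumed for illustration, and the dataset path is a made-up placeholder.

```python
def pick_task_spec(task, generator_spec, processing_spec):
    """Choose which argument spec applies to an inner task dictionary.

    Assumption: a task with no declared input is a generator task.
    """
    has_input = bool(task.get("InputDataset")) or bool(task.get("InputTask"))
    return processing_spec if has_input else generator_spec


# Placeholders standing in for the two json specifications.
generator_spec = {"name": "generator"}
processing_spec = {"name": "processing"}

first_task = {"TaskName": "GenSim"}
second_task = {"TaskName": "Reco", "InputTask": "GenSim"}
data_task = {"TaskName": "ReReco", "InputDataset": "/Primary/Processed/TIER"}

for task in (first_task, second_task, data_task):
    print(task["TaskName"], pick_task_spec(task, generator_spec, processing_spec)["name"])
```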

In case you want to see where this lives in the code, the task definition follows StdBase.getChainCreateArgs() + TaskChain.getChainCreateArgs(), where the TaskChain definition can add or override any argument from StdBase.

Creating a Resubmission request

Resubmission requests are meant to be created only by CompOps (Production & Reprocessing team). Resubmission is a wildcard, meaning it can be any of the request types above. It has its own argument definition on top of the original request type that's being "resubmitted". These additional arguments can be found here: Resubmission.getWorkloadCreateArgs()