Job task type characterization based on cmsDriver command line arguments - dmwm/WMCore GitHub Wiki

Introduction

Back in 2017, a set of physics task types was defined in order to be tracked by our condor monitoring tools. The categories were:

physics task types: ["GENSIM", "GEN", "DIGI", "RECO", "DIGIRECO", "MINIAOD"]

They were defined at the monit level in the Github Monit Commit, and the current code can be seen in:

https://github.com/dmwm/cms-htcondor-es/blob/3aac5110fe4a195b23e704d6ce5ddbdf5ec2b30a/src/htcondor_es/convert_to_json.py#L1019-L1059

The logic above is based on the WMAgent_RequestName and WMAgent_SubTaskName classad attributes. This logic has not aged well, and we have found a series of issues:

  • Misidentification of types: Since the logic relies on string parsing, it becomes outdated whenever the parsed strings are updated.
  • New types not recognized: New types like NANOAOD have not been added to this logic.
  • No stepChain support: The workflow system is able to run many physics steps within the same job in stepChain workflows. Hence, a single job could be "GEN,SIM,DIGI,RECO,MINIAOD,NANOAOD", but this is not supported by the logic above.

Hence the motivation to use a different source to characterize these physics task types: the root source of the physics types produced lies in the cmsDriver command line arguments.

This wiki page documents how to characterize the physics task type of a job, based on the cmsDriver command line arguments used to produce that production/processing job.

cmsDriver command line arguments structure

With each cmssw request, a configFile with the pset is uploaded to the ReqMgr2 service.

For example: https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/b5800d7a387fa74d27c957e2ddbd4853/configFile

This config file includes some metadata in comments, like the following:

# Auto generated configuration file
# using: 
# Revision: 1.19 
# Source: /local/reps/CMSSW/CMSSW/Configuration/Applications/python/ConfigBuilder.py,v 
# with command line options: --python_filename PPS-Run3Winter22SIM-00018_1_cfg.py --eventcontent RAWSIM --customise Configuration/DataProcessing/Utils.addMonitoring --datatier GEN-SIM --fileout file:PPS-Run3Winter22SIM-00018.root --conditions 122X_mcRun3_2021_realistic_v9 --beamspot Run3RoundOptics25ns13TeVLowSigmaZ --step SIM --geometry DB:Extended --filein file:PPS-Run3Winter22pLHEGEN-00004.root --era Run3 --no_exec --mc -n 1511
import FWCore.ParameterSet.Config as cms

Characterization of physics task types based on the cmsDriver arguments

We can therefore get the command line options above by reading the first few lines in each configFile and use these parameters to characterize the physics task type.
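As an illustration of reading those options, here is a minimal Python sketch. It assumes the header always contains a comment line starting with "# with command line options:", as in the example above. The helper names extract_cmsdriver_args and get_option are hypothetical and are not part of WMCore.

```python
import shlex

# Marker taken from the example header above; assumed stable across configFiles.
MARKER = "# with command line options:"


def extract_cmsdriver_args(config_text, max_lines=20):
    """Return the cmsDriver arguments from the header comments as a list of tokens.

    Only the first `max_lines` lines are inspected (hypothetical limit).
    An empty list is returned when the marker line is not found.
    """
    for line in config_text.splitlines()[:max_lines]:
        line = line.strip()
        if line.startswith(MARKER):
            return shlex.split(line[len(MARKER):])
    return []


def get_option(tokens, name):
    """Return the value given to option `name`, or None if it is absent or has no value."""
    for i, tok in enumerate(tokens):
        if tok.startswith(name + "="):
            return tok.split("=", 1)[1]
        if tok == name and i + 1 < len(tokens) and not tokens[i + 1].startswith("--"):
            return tokens[i + 1]
    return None
```

With the header shown above, get_option(extract_cmsdriver_args(text), "--step") would return "SIM", and flags without a value, such as --mc, can be tested with a simple membership check on the token list.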

After testing the different parameters, the basic classification and mapping can be done as follows (step argument -> physics type name):

  • GEN -> GEN
  • SIM -> SIM
  • DIGI -> DIGI
  • RECO -> RECO
  • PAT -> MINIAOD
  • NANO -> NANOAOD

That is, if we find --step GEN,somethingelse,SIM,HLT, then the resulting physics type is simply GEN,SIM.
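The mapping above can be sketched in Python as follows. This is a minimal illustration, not the WMCore implementation: the function name physics_types_from_step is hypothetical, and stripping a ":qualifier" suffix from each step entry (as in HLT:something) is an assumption about how such entries should be handled.

```python
# Mapping from the --step entries to the physics type names listed above.
STEP_TO_PHYSICS_TYPE = {
    "GEN": "GEN",
    "SIM": "SIM",
    "DIGI": "DIGI",
    "RECO": "RECO",
    "PAT": "MINIAOD",
    "NANO": "NANOAOD",
}


def physics_types_from_step(step_arg):
    """Return the physics types found in a --step value, in order of appearance.

    Entries that are not in the mapping (e.g. HLT) are ignored. Matching is
    exact, so an entry such as RAW2DIGI is not mistaken for DIGI.
    """
    types = []
    for entry in step_arg.split(","):
        # Assumption: drop an optional ":qualifier" suffix before matching.
        name = entry.split(":", 1)[0].strip()
        ptype = STEP_TO_PHYSICS_TYPE.get(name)
        if ptype and ptype not in types:
            types.append(ptype)
    return types


print(",".join(physics_types_from_step("GEN,somethingelse,SIM,HLT")))  # GEN,SIM
```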

In addition, we can characterize whether there is a pileup configuration in the DIGI step. This is based on the comments in: https://github.com/cms-sw/cmssw/issues/42587#issuecomment-1697977062

We split the DIGI step into 3 sub-categories:

  • DIGI_nopileup: There are no --pileup or --pileup_input arguments, and no DATAMIX in the --step argument.

  • DIGI_premix: There is DATAMIX in the --step argument, together with a --datamix argument and a --pileup_input argument.

  • DIGI_classicalmix: There is no --datamix argument and no DATAMIX in the --step argument, but both --pileup and --pileup_input are present.
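The three sub-categories can be expressed as a small decision function. The sketch below is hypothetical and rests on two assumptions that are labeled in the code: classical mixing is taken to mean that --pileup and --pileup_input are present without any DATAMIX step or --datamix argument, and combinations that match none of the three rules return None because this page does not specify them. The returned names follow the list of supported physics types (DIGI_nopileup, DIGI_premix, DIGI_classicalmix).

```python
def classify_digi(options, steps):
    """Classify the DIGI step from cmsDriver arguments.

    options: set of option names present on the command line, e.g. {"--pileup"}
    steps:   list of entries from the --step argument, e.g. ["DIGI", "DATAMIX"]
    """
    has_pileup = "--pileup" in options
    has_pileup_input = "--pileup_input" in options
    has_datamix_opt = "--datamix" in options
    has_datamix_step = "DATAMIX" in steps

    # Premixing: DATAMIX step plus --datamix and --pileup_input.
    if has_datamix_step and has_datamix_opt and has_pileup_input:
        return "DIGI_premix"
    # Assumption: classical mixing has no DATAMIX step and no --datamix option.
    if has_pileup and has_pileup_input and not (has_datamix_step or has_datamix_opt):
        return "DIGI_classicalmix"
    # No pileup: none of --pileup, --pileup_input or a DATAMIX step.
    if not (has_pileup or has_pileup_input or has_datamix_step):
        return "DIGI_nopileup"
    # Assumption: anything else is left unclassified.
    return None
```

For example, classify_digi({"--datamix", "--pileup_input"}, ["DIGI", "DATAMIX", "L1"]) falls into the premixing branch.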

In addition, the --data and --mc arguments define whether the job processes data or MC. If neither is present, it is MC by default.

Note that for StepChain workflows, we would read all the configFiles associated with the job and join all the relevant physics types, thus providing a comma-separated string with all the physics types.

Related issues: 11711

List of physics types supported

These physics types are supported in WMCore through a condor job classad called: CMS_extendedJobType.

The following list shows the currently supported physics types that we report with the jobs:

Physics types:

  • GEN
  • SIM
  • DIGI_nopileup
  • DIGI_premix
  • DIGI_classicalmix
  • RECO
  • MINIAOD
  • NANOAOD
  • UNKNOWN

If a specific Pset does not contain any of the relevant physics steps, it is tagged as UNKNOWN. Keep in mind that taskChain workflows handle one Pset configuration per job, while stepChain workflows can handle several Pset configurations per job. Hence, a combination of the physics types above is possible. For example, for taskChain jobs, GEN,SIM would be a common combination for a single Pset configuration.

For stepChain jobs, one could have several steps per job, each with its own Pset configuration. The physics types of each step are recorded in the configuration in an attribute called stepPhysicsType, and the CMS_extendedJobType classad joins the stepPhysicsType attributes of all steps with commas. For example:

A stepChain workflow job with 3 steps where:

  • step1.stepPhysicsType = GEN,SIM
  • step2.stepPhysicsType = DIGI_nopileup
  • step3.stepPhysicsType = RECO,MINIAOD,NANOAOD

would create the following condor job ad CMS_extendedJobType = "GEN,SIM,DIGI_nopileup,RECO,MINIAOD,NANOAOD".
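The join described above can be sketched in a few lines of Python. The function name build_extended_job_type is hypothetical, and skipping empty values is an assumption. The real classad is built inside WMCore, so this only illustrates the resulting string.

```python
def build_extended_job_type(step_physics_types):
    """Join the per-step stepPhysicsType values into a single classad string.

    step_physics_types: values in step order, e.g. ["GEN,SIM", "DIGI_nopileup"].
    Assumption: empty or missing values are skipped.
    """
    return ",".join(value for value in step_physics_types if value)


steps = ["GEN,SIM", "DIGI_nopileup", "RECO,MINIAOD,NANOAOD"]
print(build_extended_job_type(steps))
# GEN,SIM,DIGI_nopileup,RECO,MINIAOD,NANOAOD
```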

Jobs executing non-physics tasks (e.g. Merge, LogCollect, etc.) will just show CMS_extendedJobType = undefined.