List of PSet tweaks applied by WMAgent during job runtime - dmwm/WMCore GitHub Wiki

Overview

PSet tweaking is tightly coupled across WMCore, WMControl and CMSSW. This complicates both the support of multiple Python versions (in CMSSW and WMCore) and the job configuration (PSet) tweaks applied at runtime.

This page investigates what exactly WMAgent tweaks during job runtime and where the information comes from: the workflow description, the WMAgent job description, the site and/or storage, the data to be accessed, etc.

PSet tweaking:

The following scripts are the main ones used in the PSet tweaking:

https://github.com/dmwm/WMCore/blob/master/src/python/PSetTweaks/PSetTweak.py

https://github.com/dmwm/WMCore/blob/master/src/python/PSetTweaks/WMTweak.py

The standalone script in development to replace the tweaking part can be seen below: https://gist.github.com/davidlange6/3b0cb365aac669a714d9f288b0bf420d

Job runtime details

  1. WMAgent executes the following script per job: https://github.com/dmwm/WMCore/blob/master/etc/submit.sh

  2. There, the Job object is unpacked via the Unpacker.py runtime script.

  • This creates the job directory, unpacks the sandbox and sets up the environment so the job can be called using the Startup runtime script.
  • Sandboxes are built via Taskmaker, which calls SandBoxCreator.
  3. SandBoxCreator invokes the CMSSWFetcher plugin, which fetches the config files and PSet tweaks for the step.
  • PSetTweaks are retrieved from configCache as JSON.
  • This line reads a JSON dictionary file and creates a PSetTweak object, which is saved as another JSON file.
  4. Startup.py bootstraps the job and executes it.
  5. For a CMSSW executor, prescripts are added here and looked up here; SetupCMSSWPset is the only one added by this method. This is where the PSet tweaking is called.
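
The JSON round trip described for the CMSSWFetcher step can be pictured with a minimal sketch. The `SimpleTweak` class, its methods and the file names are hypothetical stand-ins, not the real `PSetTweak` API. The only part taken from the description above is the flow: a JSON dictionary is read, turned into a tweak object, and written out as another JSON file.

```python
import json
import os
import tempfile


class SimpleTweak:
    """Hypothetical stand-in for a PSetTweak-like container (not the WMCore class)."""

    def __init__(self):
        self.parameters = {}

    def add_parameter(self, dotted_name, value):
        # Store each setting under its full dotted path,
        # e.g. "process.GlobalTag.globaltag".
        self.parameters[dotted_name] = value

    def persist(self, path):
        with open(path, "w") as handle:
            json.dump(self.parameters, handle, indent=2, sort_keys=True)


def flatten(nested, prefix="process"):
    """Yield (dotted_name, value) pairs from a nested dictionary."""
    for key, value in nested.items():
        name = "%s.%s" % (prefix, key)
        if isinstance(value, dict):
            for item in flatten(value, name):
                yield item
        else:
            yield name, value


def tweak_from_json(in_path, out_path):
    """Read a JSON dictionary, build a tweak object, save it as another JSON file."""
    with open(in_path) as handle:
        nested = json.load(handle)
    tweak = SimpleTweak()
    for name, value in flatten(nested):
        tweak.add_parameter(name, value)
    tweak.persist(out_path)
    return tweak


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, "config_cache_tweaks.json")  # made-up file name
    target = os.path.join(workdir, "pset_tweak.json")           # made-up file name
    with open(source, "w") as handle:
        json.dump({"GlobalTag": {"globaltag": "EXAMPLE_TAG"}}, handle)
    print(tweak_from_json(source, target).parameters)
```

Flattening to dotted names is a design choice of this sketch; it makes the saved file easy to diff against the parameter lists further down this page.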

More detailed information on the Scram environment and the environment used while running the CMSSW executable:

https://github.com/dmwm/WMCore/wiki/Notes-about-environment-variables-passed-to-the-Scram-environment-or-modified-when-running-the-CMSSW-executable

SetupCMSSWPset prescript

  • Each CMSSW step (cmsRun1, cmsRun2, etc.) calls SetupCMSSWPset once. The command looks like this:
2020-07-15 00:03:37,065:INFO:Scram:    Invoking command: /cvmfs/cms.cern.ch/slc6_amd64_gcc491/cms/cmssw/CMSSW_7_4_0/external/slc6_amd64_gcc491/bin/python2.7 -m WMCore.WMRuntime.ScriptInvoke WMTaskSpace.cmsRun1 SetupCMSSWPset

  • Note that the command above calls SetupCMSSWPset via ScriptInvoke, which loads the job step object. This is what we would need to change in WMCore.
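
For experimenting with this invocation outside the agent, the command line can be assembled programmatically. The helper names below are illustrative and not part of WMCore; the module path and argument order are copied from the log line above.

```python
import sys


def build_script_invoke_command(step_name, script_name, python_exe=sys.executable):
    """Return the argument list for running a runtime script through ScriptInvoke.

    The module path and argument order mirror the log line above; the helper
    itself is a hypothetical convenience, not WMCore code.
    """
    return [
        python_exe,
        "-m", "WMCore.WMRuntime.ScriptInvoke",
        "WMTaskSpace.%s" % step_name,
        script_name,
    ]


def commands_for_steps(step_names, script_name="SetupCMSSWPset"):
    """One invocation per CMSSW step, since each step calls the prescript once."""
    return [build_script_invoke_command(name, script_name) for name in step_names]


if __name__ == "__main__":
    for command in commands_for_steps(["cmsRun1", "cmsRun2"]):
        print(" ".join(command))
```

Running one of these commands still requires the unpacked job sandbox and a Python interpreter that can import WMCore, so the sketch only builds the command and does not execute it.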

WMTweak.applyTweak needs the following input parameters:

  • self.process, which is the stepSection object
  • A PSetTweak object
  • A fixupDict dictionary, to make sure PSets and configuration values exist.
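
To make these three inputs concrete, here is a minimal sketch of what applying a tweak amounts to. It is not the real WMTweak.applyTweak: the process is a plain `SimpleNamespace`, the tweak is a flat dictionary of dotted names, and `fixup_global_tag` is a hypothetical fixup that creates a missing section before a value is written into it.

```python
from types import SimpleNamespace


def fixup_global_tag(process):
    """Hypothetical fixup: make sure process.GlobalTag.globaltag exists."""
    if not hasattr(process, "GlobalTag"):
        process.GlobalTag = SimpleNamespace()
    if not hasattr(process.GlobalTag, "globaltag"):
        process.GlobalTag.globaltag = ""


def apply_tweak(process, tweak, fixups=None):
    """Apply {dotted_name: value} settings to a process-like object.

    `fixups` maps a dotted parameter name to a callable that prepares the
    process so the parameter's parent section exists.
    """
    fixups = fixups or {}
    for dotted_name, value in tweak.items():
        fixup = fixups.get(dotted_name)
        if fixup is not None:
            fixup(process)
        parts = dotted_name.split(".")
        if parts[0] == "process":
            parts = parts[1:]
        node = process
        for part in parts[:-1]:
            # Raises AttributeError when no fixup created the section.
            node = getattr(node, part)
        setattr(node, parts[-1], value)
    return process


if __name__ == "__main__":
    process = SimpleNamespace()
    apply_tweak(
        process,
        {"process.GlobalTag.globaltag": "EXAMPLE_TAG"},
        {"process.GlobalTag.globaltag": fixup_global_tag},
    )
    print(process.GlobalTag.globaltag)
```

Failing loudly on a missing section, rather than silently creating it, is a choice made for this sketch so that the role of the fixup dictionary is visible.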

The PSet tweaks are defined at 4 different stages:

  1. Task PSet tweaks

    • Input: StepSection of FwjReport object
    • Parameters modified:
      - process.GlobalTag.globaltag
      - process.GlobalTag.DBParameters.transactionId
      
  2. Job PSet tweaks

    • Input: WMBS Job object
    • Parameters modified:
      - process.source.firstLuminosityBlock
      - process.source.fileNames
      - process.source.secondaryFileNames
      - process.source.firstEvent
      - process.maxEvents.input
      - process.source.skipEvents
      - process.source.lumisToProcess
      
  3. Output module PSet tweaks

    • Input parameters:
      • Module output, obtained from here
      • WMBS Job object
    • Parameters modified:
      - process.<outMod>.fileName
      - process.<outMod>.logicalFileName
    
    • Where outMod is the output module, e.g.: .input.outputModule = 'AODSIMoutput'
  4. Random generator tweak

    • Input parameters: None (but it uses the getBaggage() method of the WMBS job object)
    • Parameters modified:
      - process.RandomNumberGeneratorService.<randService>.initialSeed
    
    • Where randService comes from: RandomNumberGeneratorService.x_internal_name
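
The job, output module and random seed stages can be summarised as three small functions that each return a dictionary of dotted parameter names. This is a sketch only: the layout of the `job` dictionary, the mask key names and the file naming scheme are assumptions made for illustration and do not reflect the real WMBS Job object. The parameter names on the left-hand side are the ones listed above.

```python
def job_tweak(job):
    """Job-level settings from a job description (hypothetical dict layout)."""
    tweak = {
        "process.source.fileNames": [f["lfn"] for f in job["input_files"]],
        "process.source.secondaryFileNames": [
            parent for f in job["input_files"] for parent in f.get("parents", [])
        ],
    }
    # Optional settings are only emitted when the (assumed) mask provides them.
    optional = {
        "first_lumi": "process.source.firstLuminosityBlock",
        "first_event": "process.source.firstEvent",
        "max_events": "process.maxEvents.input",
        "skip_events": "process.source.skipEvents",
        "lumis": "process.source.lumisToProcess",
    }
    mask = job.get("mask", {})
    for key, parameter in optional.items():
        if mask.get(key) is not None:
            tweak[parameter] = mask[key]
    return tweak


def output_module_tweak(module_name, lfn_base, job_name):
    """File names for one output module (naming scheme is illustrative)."""
    return {
        "process.%s.fileName" % module_name: "%s.root" % module_name,
        "process.%s.logicalFileName" % module_name:
            "%s/%s/%s.root" % (lfn_base, job_name, module_name),
    }


def seed_tweak(seeds):
    """Initial seeds keyed by random service name."""
    return {
        "process.RandomNumberGeneratorService.%s.initialSeed" % service: seed
        for service, seed in seeds.items()
    }


if __name__ == "__main__":
    example_job = {
        "input_files": [{"lfn": "/store/example/a.root", "parents": []}],
        "mask": {"first_event": 1, "max_events": 100},
    }
    print(job_tweak(example_job))
    print(output_module_tweak("AODSIMoutput", "/store/unmerged/example", "job-0"))
    print(seed_tweak({"generator": 12345}))
```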

Generation of Pset tweaks

Refer to the following presentation for details about the generation of these tweaks and how they are fetched in WMConfigCache:

https://indico.cern.ch/event/946436/contributions/3977022/attachments/2087948/3507926/Upload%20to%20ReqMgr2%202020%2008%2014.pdf

PSet tweak examples and job environment to reproduce the whole pset tweak stages procedure

The following repository:

https://gitlab.cern.ch/khurtado/psettweakmigration/

provides the following:

  1. A set of PSet tweaks and input files (inside the test directory) for the following stages:
  • Task level, Job level, Output module level

It also provides the output files produced by the current PSet tweak methods. They can be used with a standalone replacement script, as long as Python can import the CMSSW release modules (details are in the README file in the repository).

  2. A job sandbox environment

This allows the user to run a production job, generate and apply the Pset tweaks interactively. It can be used to modify SetupCMSSWPset with a different applyTweak method, for example.

Note that files will be generated for cmsRun1 to cmsRun6, but jobTweaks will only be applied to cmsRun1, as the subsequent steps are handled via chained processing instead.

Please read the README file for more details on how to use these files.

Parameters modified outside PSet tweaks

When chainedProcessing is enabled, a chained-processing modification is performed instead of the job-level tweaking.

After that, pileup is checked:

The following parameters are also modified directly in the Step object (self.process) without creating/applying a PSet Tweak object and are applied after the Output module and Random seeding stages

Finally, there is some logic that modifies some other parameters in the configuration (skipBadFiles, eventsPerLumi, maxSecondsUntilRampdown, overrideCatalog).

Notes on the code

It seems psetTweak is always None in the configuration. It's never set. However, with the way CMSSWFetcher writes the configuration into the sandbox, a "None" filename is used to write the configuration. E.g.:

https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMSpec/Steps/Fetchers/CMSSWFetcher.py#L60

./WMSandbox/TOP-RunIISummer19UL16wmLHEGEN-00075_0/cmsRun1/None

Nonetheless, the following code in SetupCMSSWStep:

correctly evaluates PSetTweak to None, so the applyPSetTweak method is never called. That said, if PSetTweak were not None, this would fail, because the applyPSetTweak method is not defined anywhere.

In summary, I'm not sure what these lines of code are for:

https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMRuntime/Scripts/SetupCMSSWPset.py#L770-L772

but they are probably legacy and can be taken out. The tweaks seem to be created not directly from the config retrieved from ConfigCache, but rather from the WMWorkload WMBS Job objects. This needs further investigation.

Issue: https://github.com/dmwm/WMCore/issues/9878
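
The "None" file name described in these notes is easy to reproduce in isolation. The sketch below assumes the mechanism is simply that an unset value is formatted into a path; the actual code in CMSSWFetcher may differ, and both function names are hypothetical.

```python
import os


def config_path(step_dir, pset_tweak_name):
    """Join a step directory with a tweak file name without checking for None."""
    # Formatting None as a string yields the literal text "None".
    return os.path.join(step_dir, "%s" % pset_tweak_name)


def safe_config_path(step_dir, pset_tweak_name):
    """Same join, but refuse to build a path from an unset name."""
    if pset_tweak_name is None:
        return None
    return os.path.join(step_dir, pset_tweak_name)


if __name__ == "__main__":
    print(config_path("cmsRun1", None))
    print(safe_config_path("cmsRun1", None))
```

A guard like the one in `safe_config_path` would make the unset case explicit instead of leaving a file called "None" in the sandbox.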

Methods in SetupCMSSW that require SCRAM

Optional? The following methods don't strictly need to be moved out of the WMCore SetupCMSSWPset script, but moving them would avoid having to keep track of which CMSSW releases support a given piece of functionality. We would need to implement a way to pass some parameters from, e.g., the jobBag object though.

Methods in SetupCMSSW that require reading from the Pset (besides the one above)

These initialize some fields with default values: they read the PSet to see whether the fields exist and add values to subfields.

Alternatively, we could build the dictionary anyway and add an option that leaves a field/subfield untouched if it already exists with some value.
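
The "do not touch if it already exists" idea could look like the following sketch. It works on plain nested dictionaries rather than a real CMSSW process object, treats "exists" as "the key is present", and the `overwrite` option is a hypothetical name.

```python
def apply_defaults(target, defaults, overwrite=False):
    """Merge nested defaults into target, keeping existing values by default."""
    for key, default in defaults.items():
        if isinstance(default, dict):
            current = target.get(key)
            if not isinstance(current, dict):
                if key in target and not overwrite:
                    # A non-dict value is already set: leave it alone.
                    continue
                current = target[key] = {}
            apply_defaults(current, default, overwrite)
        elif overwrite or key not in target:
            target[key] = default
    return target


if __name__ == "__main__":
    config = {"fieldA": {"sub1": "user value"}}
    defaults = {"fieldA": {"sub1": "default", "sub2": 0}, "fieldB": {"sub1": -1}}
    print(apply_defaults(config, defaults))
```

With `overwrite=False` the defaults dictionary can always be built in full, without reading the PSet first to decide which entries to include.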

All the optionals from above too:

Update (11-13-20)

  • Need to test createProcess, handleCERNMergeSettings
  • Will update interactive suite with more example use-cases (e.g.: chainSteps)
  • TODO: Replace TweakMaker with tweak_maker_lite and make sure they pass through the WM testbed properly.
  • TODO: After changes are committed, test python 3