FJR metrics data flow to WMArchive - dmwm/WMCore GitHub Wiki

This wiki describes the data flow of FJR (framework job report) metrics to WMArchive.

As a worked example, we use the addition of WMCMSSWSubprocess metrics to the WMCore FJR. The FJR is created through the following steps:

  • The WMCMSSWSubprocess metrics are defined in the WMCore/FwkJobReport/Report.py module. They may also be declared in the WMCore/Services/WMArchive/DataMap.py module if default values are needed.
  • The CMSSW executor reads the CMSSW XML framework job report file and creates a WMCore FJR object.
  • This object is later used by JobStateMachine (via the JobAccountant component), which performs the following actions:
      • It creates the FJR JSON document in the local CouchDB.
      • At each job state transition, it reads this document from CouchDB and updates the relevant parts, e.g. the step information.
  • Finally, the JobArchiver component runs ArchiveDataPoller, which reads the documents from the local CouchDB and sends them to the WMArchive service.
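
The life cycle of the FJR document described above can be illustrated with a minimal sketch. It is not WMCore code: `LocalStore` is an in-memory stand-in for the local CouchDB, `STEP_DEFAULTS` plays the role that default values in DataMap.py might play, and the metric names (`exampleTime`, `exampleCount`), the step name `cmsRun1`, and all function names are hypothetical.

```python
import copy
import json

# Hypothetical defaults, standing in for values DataMap.py might declare.
STEP_DEFAULTS = {"WMCMSSWSubprocess": {"exampleTime": 0.0, "exampleCount": 0}}


class LocalStore:
    """In-memory stand-in for the local CouchDB (not the CMSCouch API)."""

    def __init__(self):
        self.docs = {}

    def save(self, doc_id, doc):
        self.docs[doc_id] = copy.deepcopy(doc)

    def load(self, doc_id):
        return copy.deepcopy(self.docs[doc_id])


def create_fjr_doc(store, job_id, step_name, metrics):
    """Create the FJR JSON document, filling in defaults for missing metrics."""
    subprocess_metrics = dict(STEP_DEFAULTS["WMCMSSWSubprocess"])
    subprocess_metrics.update(metrics)
    doc = {
        "jobid": job_id,
        "state": "new",
        "steps": {step_name: {"WMCMSSWSubprocess": subprocess_metrics}},
    }
    store.save(job_id, doc)


def change_state(store, job_id, new_state, step_updates=None):
    """Read the document back, update the relevant parts, and save it again."""
    doc = store.load(job_id)
    doc["state"] = new_state
    for step_name, update in (step_updates or {}).items():
        doc["steps"].setdefault(step_name, {}).update(update)
    store.save(job_id, doc)


def archive(store, send):
    """Read every stored document and hand it to a sender callable as JSON."""
    for doc in store.docs.values():
        send(json.dumps(doc, sort_keys=True))


# Usage: one job goes through creation, one state transition, and archiving.
store = LocalStore()
create_fjr_doc(store, "job-1", "cmsRun1", {"exampleTime": 12.5})
change_state(store, "job-1", "complete", {"cmsRun1": {"status": 0}})
sent = []
archive(store, sent.append)
```

In this sketch the metrics sit inside the step section of the document, so they survive every state transition and reach the archive sender unchanged.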

The concrete call sequence in the WMCore codebase is as follows:

WMComponent/JobAccountant/JobAccountantPoller.py calls self.accountantWorker(jobsSlice)
WMComponent/JobAccountant/AccountantWorker.py calls self.stateChanger.propagate(self.listOfJobsToSave, "success", "complete")
WMCore/JobStateMachine/ChangeState.py calls self.recordInCouch(jobs, newstate, oldstate, updatesummary)
WMCore/JobStateMachine/ChangeState.py calls self.fwjrdatabase.commit(callback=discardConflictingDocument)
WMCore/Database/CMSCouch.py commits the FJR to the local CouchDB

Please note: this process only updates WMCMSSWSubprocess metrics in WMArchive; it cannot be used to propagate WMTiming metrics. The latter are created by the submit_py3.py script, which reads the pkl file, adds these metrics to it, and writes the pkl file back. Since WMCore does not read pkl report files, these metrics are still missing from the local CouchDB and from WMArchive. For that reason, this PR removes these metrics from the step section of DataMap.py: they do not belong there and should be part of the top-level JSON. In my view, to add these metrics we must read the final pkl report in ArchiveDataPoller (from the JobArchiver component) and add them to the FWJR we send to WMArchive.
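
The proposal in the note above could look roughly like the following sketch. It rests on strong assumptions: the pkl report is treated as a plain pickled dict with a top-level `WMTiming` key, whereas the real report is a pickled WMCore object that would have to be loaded through the Report class. The function name `merge_timing_from_pkl` and the metric name `exampleDuration` are hypothetical, and nothing here is existing ArchiveDataPoller code.

```python
import os
import pickle
import tempfile


def merge_timing_from_pkl(fwjr_doc, pkl_path, key="WMTiming"):
    """Return a copy of fwjr_doc with the timing metrics added at top level.

    Assumes the pkl file holds a plain dict; only load pkl files from a
    trusted source, since unpickling can execute arbitrary code.
    """
    with open(pkl_path, "rb") as handle:
        report = pickle.load(handle)
    merged = dict(fwjr_doc)
    timing = report.get(key)
    if timing is not None:
        # Top level of the JSON document, not inside the step section.
        merged[key] = dict(timing)
    return merged


# Usage: write a toy pkl report, then merge it into a toy FWJR document.
with tempfile.TemporaryDirectory() as tmpdir:
    pkl_path = os.path.join(tmpdir, "Report.pkl")
    with open(pkl_path, "wb") as handle:
        pickle.dump({"WMTiming": {"exampleDuration": 42}}, handle)

    fwjr = {"jobid": "job-1", "steps": {}}
    merged = merge_timing_from_pkl(fwjr, pkl_path)
```

The merge returns a new document and leaves the one read from CouchDB untouched, so the timing metrics are added only to what is sent to WMArchive.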