StepChain Parentage - dmwm/WMCore GitHub Wiki

StepChain parentage problem:

Because files generated in different steps of a StepChain can be merged asynchronously, it is hard to determine the parentage between merged files. For example, the first step generates GENSIM files and the second step takes those GENSIM files and creates DIGI files. The GENSIM files from step 1 are the input to one merge job, and the DIGI files are the input to a separate merge job. These merge jobs run asynchronously, so no direct parentage is recorded between the outputs of the two merge jobs. To solve this parentage problem:

  1. Parentage between the two datasets has to be defined (recorded in DBS).
  2. The lumi information of the files in these two datasets can then be used to determine the parentage relation between files.
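The lumi-based matching in point 2 can be sketched as follows. This is a minimal illustration, not the WMCore or DBS implementation: it assumes each file is described by the set of (run, lumi) pairs it contains, and it treats a child file as descended from every parent file that shares at least one pair with it. The function name, the dictionary layout, and the file names are all hypothetical.

```python
from collections import defaultdict


def resolve_file_parentage(parent_files, child_files):
    """Map each child file to the parent files that share a (run, lumi) with it.

    Both arguments are dicts of the (hypothetical) form
    {file_name: {(run, lumi), ...}}.
    """
    # Build an index from each (run, lumi) pair to the parent files holding it.
    lumi_to_parents = defaultdict(set)
    for name, run_lumis in parent_files.items():
        for run_lumi in run_lumis:
            lumi_to_parents[run_lumi].add(name)

    # For every child file, collect the parents of all of its (run, lumi) pairs.
    parentage = {}
    for name, run_lumis in child_files.items():
        parents = set()
        for run_lumi in run_lumis:
            parents |= lumi_to_parents.get(run_lumi, set())
        parentage[name] = sorted(parents)
    return parentage


# Hypothetical merged files: the two merge jobs grouped the lumis differently.
gensim = {
    "gensim_merged_1.root": {(1, 1), (1, 2)},
    "gensim_merged_2.root": {(1, 3), (1, 4)},
}
digi = {
    "digi_merged_1.root": {(1, 2), (1, 3)},
    "digi_merged_2.root": {(1, 4)},
}
parentage = resolve_file_parentage(gensim, digi)
```

Because the two merge jobs may group lumis differently, a single merged child file can end up with more than one merged parent file, which is why the result maps each child to a list.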

Current procedure to fix StepChain parentage

  1. When a StepChain workflow is created, the dataset parentage is calculated and the ParentageResolved flag is set to False. For example, requests whose parentage is not yet resolved can be listed with:

    https://cmsweb.cern.ch/wmstatsserver/data/filtered_requests?RequestType=StepChain&ParentageResolved=false&mask=ChainParentageMap&mask=ParentageResolved

{"result": [
 {
  "ParentageResolved": false, 
  "ChainParentageMap": [
    {
      "ParentDset": "/GluGluToBulkGravitonToHHTo4C_M-3000_narrow_13TeV-madgraph-pythia8/RunIISummer18DRPremix-101X_upgrade2018_realistic_v7-v3/AODSIM", 
      "ChildDsets": [
        "/GluGluToBulkGravitonToHHTo4C_M-3000_narrow_13TeV-madgraph-pythia8/RunIISummer18MiniAOD-101X_upgrade2018_realistic_v7-v3/MINIAODSIM"
      ]
    }, 
    {
      "ParentDset": "/GluGluToBulkGravitonToHHTo4C_M-3000_narrow_13TeV-madgraph-pythia8/RunIISummer18wmLHEGS-101X_upgrade2018_realistic_v7-v3/GEN-SIM", 
      "ChildDsets": [
        "/GluGluToBulkGravitonToHHTo4C_M-3000_narrow_13TeV-madgraph-pythia8/RunIISummer18DRPremix-101X_upgrade2018_realistic_v7-v3/AODSIM"
      ]
    }, 
    {
      "ParentDset": "/GluGluToBulkGravitonToHHTo4C_M-3000_narrow_13TeV-madgraph-pythia8/RunIISummer18wmLHEGS-101X_upgrade2018_realistic_v7-v3/GEN-SIM", 
      "ChildDsets": []
    }, 
    {
      "ParentDset": null, 
      "ChildDsets": [
        "/GluGluToBulkGravitonToHHTo4C_M-3000_narrow_13TeV-madgraph-pythia8/RunIISummer18wmLHEGS-101X_upgrade2018_realistic_v7-v3/GEN-SIM", 
        "/GluGluToBulkGravitonToHHTo4C_M-3000_narrow_13TeV-madgraph-pythia8/RunIISummer18wmLHEGS-101X_upgrade2018_realistic_v7-v3/LHE"
      ]
    }
  ], 
  "RequestName": "vlimant_task_BTV-RunIISummer18wmLHEGS-00009__v1_T_180905_232233_716"
}]}

  2. Using this information, dataset parentage is inserted into DBS in DBS3Upload (only when the first child dataset is inserted into DBS).

  3. When the workflow is announced (which means all the output data has been updated in DBS), reqmgr determines the file parentage between the datasets and inserts it into DBS. Only requests in announced or normal-archived status are handled: if a workflow is aborted, rejected, or failed, file parentage is not set even though dataset parentage may already be set, since those datasets are considered invalid. The ParentageResolved flag is then set to True.
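The bookkeeping in the steps above can be sketched as follows. This is only an illustration under stated assumptions, not the reqmgr or DBS3Upload code: the dataset names are shortened placeholders, the helper names are hypothetical, and the `RequestStatus` key is assumed here because the response shown earlier masks only `ChainParentageMap` and `ParentageResolved`.

```python
import json

# Shortened stand-in for a wmstatsserver response (placeholder dataset names).
# "RequestStatus" is an assumed extra field, not part of the masked query above.
response_text = """
{"result": [
 {"ParentageResolved": false,
  "RequestStatus": "announced",
  "ChainParentageMap": [
    {"ParentDset": "/Primary/Step2-v1/AODSIM",
     "ChildDsets": ["/Primary/Step3-v1/MINIAODSIM"]},
    {"ParentDset": "/Primary/Step1-v1/GEN-SIM",
     "ChildDsets": ["/Primary/Step2-v1/AODSIM"]},
    {"ParentDset": "/Primary/Step1-v1/GEN-SIM", "ChildDsets": []},
    {"ParentDset": null,
     "ChildDsets": ["/Primary/Step1-v1/GEN-SIM", "/Primary/Step1-v1/LHE"]}
  ],
  "RequestName": "example_request"}
]}
"""

# Statuses for which file parentage is resolved, as described in step 3.
RESOLVABLE_STATUSES = {"announced", "normal-archived"}


def dataset_parentage_pairs(request):
    """Flatten ChainParentageMap into (parent_dataset, child_dataset) pairs."""
    pairs = []
    for entry in request["ChainParentageMap"]:
        parent = entry["ParentDset"]
        if parent is None:
            # First-step outputs have no parent dataset, so nothing to record.
            continue
        for child in entry["ChildDsets"]:
            pairs.append((parent, child))
    return pairs


def needs_file_parentage(request):
    """True if the request is unresolved and in a status that gets handled."""
    return (not request["ParentageResolved"]
            and request.get("RequestStatus") in RESOLVABLE_STATUSES)


requests = json.loads(response_text)["result"]
pairs = dataset_parentage_pairs(requests[0])
eligible = needs_file_parentage(requests[0])
```

Entries with a null `ParentDset` or an empty `ChildDsets` list contribute no pairs, so only real parent-child dataset relations remain to be recorded.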