CMSWEB Services pre-production validation - dmwm/WMCore GitHub Wiki

0. Create your working dir:

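vcamp='Apr2023'      # validation campaign label, used in the campaign/request names below
WMCVersion=2.2.1     # WMCore release under validation (see the NOTE below)
mkdir -p validation_WMCv$WMCVersion
cd validation_WMCv$WMCVersion
wget https://raw.githubusercontent.com/dmwm/WMCore/master/test/data/ReqMgr/inject-test-wfs.py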

NOTE: Set WMCVersion to the latest release deployed in the CMSWeb TestBed, as listed here: HG20** Validation Results

NOTE: If you use your private setup for the validation, you must have the latest versions of the central services and the agent deployed there. Some very important notes on what to watch out for: https://github.com/dmwm/WMCore/issues/10659#issuecomment-918547140

1. Inject DMWM + Integration

export X509_USER_PROXY=$(pwd)/x509up_u$UID     # keep the proxy file in the working directory
voms-proxy-init --voms cms -rfc -valid 192:00    # CMS VOMS proxy, valid for 192 hours (8 days)
ls -la $X509_USER_PROXY
  • Create and execute the following script:
agent='vocms0193'
centServ='cmsweb-testbed.cern.ch'
cat > injection_${vcamp}_tivanov.sh << EOF
python3 inject-test-wfs.py -u https://$centServ -c ${vcamp}_Val -r ${vcamp}_Val -t testbed-${agent} -a DMWM_Test -p ${vcamp}_Val_Todor_v1 -m DMWM |tee -a  injection_${vcamp}_tivanov_dmwm.log
python3 inject-test-wfs.py -u https://$centServ -c ${vcamp}_Val -r ${vcamp}_Val -t testbed-${agent} -a DMWM_Test -p ${vcamp}_Val_Todor_v1 -m Integration |tee -a  injection_${vcamp}_tivanov_int.log
EOF

. ./injection_${vcamp}_tivanov.sh
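A sketch of the same two injection commands generated in a loop, which avoids keeping two near-identical lines in sync. `print_injection_cmds` is a hypothetical helper; it only prints the commands, with the same `inject-test-wfs.py` options as the script above, so they can be reviewed before being written to the injection script:

```shell
# Hypothetical helper: print the DMWM and Integration injection commands.
# Args: campaign label, central services host, agent name,
#       name used in the -p processing string, user name used in the log files.
print_injection_cmds() {
    local vcamp="$1" centServ="$2" agent="$3" pname="$4" fuser="$5"
    local mode suffix
    for mode in DMWM Integration; do
        # Log-file suffixes as in the script above: dmwm / int
        case "$mode" in
            DMWM) suffix=dmwm ;;
            Integration) suffix=int ;;
        esac
        printf 'python3 inject-test-wfs.py -u https://%s -c %s_Val -r %s_Val -t testbed-%s -a DMWM_Test -p %s_Val_%s_v1 -m %s | tee -a injection_%s_%s_%s.log\n' \
            "$centServ" "$vcamp" "$vcamp" "$agent" "$vcamp" "$pname" "$mode" "$vcamp" "$fuser" "$suffix"
    done
}

# Example:
#   print_injection_cmds "$vcamp" "$centServ" "$agent" Todor tivanov > injection_${vcamp}_tivanov.sh
#   . ./injection_${vcamp}_tivanov.sh
```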

NOTE: Mind which agent you are submitting to and the tag/patches installed there. If there has been a recent agent deployment campaign, make sure that the agent you are submitting to is up to date!

2. Once the workflows have completed, run the validation script:

  • Create the list of validation workflows:
grep 'Create request' injection_${vcamp}_tivanov_dmwm.log |awk  '{print$5}' | sed "s/'//g" | sort | tee validation_${vcamp}_tivanov_dmwm.list
grep 'Create request' injection_${vcamp}_tivanov_int.log |awk  '{print$5}' | sed "s/'//g" | sort | tee validation_${vcamp}_tivanov_int.list

# merge them:
cat validation_${vcamp}_tivanov_{dmwm,int}.list > validation_${vcamp}_tivanov_all.list
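A minimal sketch of the list creation as reusable functions, assuming the log lines carry the quoted request name in the 5th whitespace-separated field, as the `awk '{print$5}'` above implies (the sample line in the comments is hypothetical). `extract_requests` and `check_merged_list` are hypothetical names; the second one warns when the merged list's unique entries do not add up to the individual lists:

```shell
# Hypothetical helper: read an injection log on stdin and print the sorted,
# de-duplicated request names (same grep/awk/quote-stripping as above).
# Assumed line shape: 2023-04-10 12:00:00 Create request 'wf_name' ...
extract_requests() {
    grep 'Create request' | awk '{print $5}' | tr -d "'" | sort -u
}

# Hypothetical helper: check_merged_list MERGED PART1 [PART2 ...]
# Returns 1 (with a warning) if MERGED has fewer unique lines than the parts combined.
check_merged_list() {
    local merged="$1" f n unique total=0
    shift
    for f in "$@"; do
        n=$(wc -l < "$f")
        total=$((total + n))
    done
    unique=$(sort -u "$merged" | wc -l)
    if [ "$unique" -ne "$total" ]; then
        echo "WARNING: $merged has $unique unique entries, expected $total" >&2
        return 1
    fi
}

# Example:
#   extract_requests < injection_${vcamp}_tivanov_dmwm.log > validation_${vcamp}_tivanov_dmwm.list
#   check_merged_list validation_${vcamp}_tivanov_all.list validation_${vcamp}_tivanov_{dmwm,int}.list
```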
  • Run the Validation script:
curl https://raw.githubusercontent.com/dmwm/WMCore/master/test/data/ReqMgr/validate-test-wfs.py > validate-test-wfs.py
python3 validate-test-wfs.py -r $centServ -i validation_${vcamp}_tivanov_dmwm.list -v |tee -a validation_${vcamp}_tivanov_dmwm.val
python3 validate-test-wfs.py -r $centServ -i validation_${vcamp}_tivanov_int.list -v |tee -a validation_${vcamp}_tivanov_int.val

# or:
python3 validate-test-wfs.py -r $centServ -i validation_${vcamp}_tivanov_all.list -v |tee -a validation_${vcamp}_tivanov_all.val

3. Fill in the twikis:

  • Release

  • Agent Version/Status:

    Update this when an agent has been patched or redeployed. A redeployment is required if a patch has been applied to the production agents (so that we test under the same conditions here), or if a new tag has been cut in the agent branch.

  • Validation:

cat validation_${vcamp}_tivanov_dmwm.val
cat validation_${vcamp}_tivanov_int.val

# hint: Use the CHANGES file from the latest tag to create the new changes lists

# hint: Watch the comments - they clearly tell

# hint: Keep an eye on what to look for, as listed in the validation twiki:

  • important - workflow state transitions.

4. Create an ACDC for at least one workflow of each of the following types:

  • TaskChain (TC)
  • StepChain (SC)
  • Merge job for both SC and TC
  • ReReco
  • NOT for Harvesting

# note: For ACDC workflows, the status must be transitioned by hand in ReqMgr, unlike the parent validation workflow, whose status is managed by the injection script.

# note: The following flags are not inherited from the parent workflow; they default to 'false' (as seen in the JSON document for that workflow in ReqMgr):

"TrustPUSitelists": false
"TrustSitelists": false

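To double-check these flags on an ACDC, a minimal sketch that greps them out of the request JSON document saved locally from ReqMgr (how the file is saved and its name are up to you; `show_trust_flags` is a hypothetical helper). Plain `grep` is used so no JSON tool is needed, assuming each key appears in `"Key": value` form:

```shell
# Hypothetical helper: print the TrustSitelists / TrustPUSitelists values
# found in a locally saved request JSON document, in order of appearance.
show_trust_flags() {
    grep -oE '"(TrustSitelists|TrustPUSitelists)": *(true|false)' "$1"
}

# Example (hypothetical file name):
#   show_trust_flags acdc_request.json
```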
# NOTE: The name of the parent workflow can be found under the following key in the same JSON document in ReqMgr:

"InitialTaskPath": "/tivanov_SC_Straight_HG1912_Val_191204_034219_8992/L1T_PhaseIITDRSpring19GS_00017_0"

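The parent workflow name is the first path component of `InitialTaskPath`. A minimal sketch with a hypothetical helper, using only shell parameter expansion:

```shell
# Hypothetical helper: InitialTaskPath looks like /<parent workflow>/<task>[/...],
# so the parent workflow name is the first path component.
parent_from_task_path() {
    local path="${1#/}"            # drop the leading slash
    printf '%s\n' "${path%%/*}"    # keep everything before the next slash
}

# Example:
#   parent_from_task_path /tivanov_SC_Straight_HG1912_Val_191204_034219_8992/L1T_PhaseIITDRSpring19GS_00017_0
```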
# NOTE: Once the ACDC completes, run the validation script again, on the parent workflow rather than on the ACDC workflows.
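A sketch for collecting the parents of several ACDCs into one list for the validation script, assuming the `InitialTaskPath` values were copied into a text file, one per line. `write_parent_list`, `acdc_task_paths.txt` and the `acdc_parents` file names are hypothetical:

```shell
# Hypothetical helper: read InitialTaskPath values (one per line) on stdin
# and write the unique parent workflow names (2nd '/'-separated field) to FILE.
write_parent_list() {
    cut -d/ -f2 | grep -v '^$' | sort -u > "$1"
}

# Example:
#   write_parent_list validation_${vcamp}_tivanov_acdc_parents.list < acdc_task_paths.txt
#   python3 validate-test-wfs.py -r $centServ -i validation_${vcamp}_tivanov_acdc_parents.list -v |tee -a validation_${vcamp}_tivanov_acdc_parents.val
```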

# NOTE: Here is one useful condor_history command:

condor_history -since "JobStartDate <= $(($(date +%s) - 5*24*60*60))" -const 'WMAgent_RequestName == "tivanov_ReReco_Parents_HG2005_Val_200428_195622_6876"' -af:h clusterid RequestCpus RequestMemory RemoteWallClockTime DESIRED_CMSPileups WMAgent_SubTaskName  |less -S -I
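The `-since` expression above makes `condor_history` stop reading once it reaches jobs that started more than five days ago (`5*24*60*60` seconds before now). A sketch that parametrizes the request name and the number of days; `history_cutoff` and `request_history` are hypothetical helpers, and the `condor_history` options are exactly those of the command above:

```shell
# Hypothetical helper: epoch seconds DAYS days before NOW
# (NOW defaults to the current time); same arithmetic as the -since expression.
history_cutoff() {
    local days="$1" now="${2:-$(date +%s)}"
    echo $(( now - days * 24 * 60 * 60 ))
}

# Hypothetical wrapper: request_history REQUEST_NAME [DAYS, default 5]
request_history() {
    local request="$1" days="${2:-5}"
    condor_history -since "JobStartDate <= $(history_cutoff "$days")" \
        -const "WMAgent_RequestName == \"$request\"" \
        -af:h clusterid RequestCpus RequestMemory RemoteWallClockTime DESIRED_CMSPileups WMAgent_SubTaskName
}

# Example:
#   request_history tivanov_ReReco_Parents_HG2005_Val_200428_195622_6876 5 |less -S -I
```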

5. Repeat (3.) if needed.