StoreResults requests - dmwm/WMCore GitHub Wiki

StoreResults requests elevate (migrate) user datasets from the analysis DBS instance to the global one, inject them into TMDB, and make them available for further subscriptions, possibly managed by PhEDEx and DDM. StoreResults is essentially a merge request type: it runs merge jobs on the input files to make them larger, which is more efficient for the data transfer system and allows the files to be stored on tape.

However, before an input dataset can be elevated via StoreResults, it must first be migrated from the analysis DBS instance (physXX) to the production global one. Otherwise the final output cannot be inserted into global DBS, because its parent would not be in the same instance. The migration can be done as follows:

source /data/admin/wmagent/env.sh
source /data/srv/wmagent/current/apps/wmagent/etc/profile.d/init.sh
python

from dbs.apis.dbsClient import DbsApi

# Dataset to migrate and the analysis DBS instance it currently lives in
DSET = '/HighMultiplicity85EOF/Run2016B-23Sep2016-v2/AOD'
DBSURL = 'https://cmsweb.cern.ch/dbs/prod/phys03/DBSReader'
migrateArgs = {'migration_url': DBSURL, 'migration_input': DSET}

# Submit the migration request to the global DBS migration service
dbsApi = DbsApi(url='https://cmsweb.cern.ch/dbs/prod/global/DBSMigrate/')
dbsApi.submitMigration(migrateArgs)
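
Since a typo in the dataset path is easy to make, it may help to sanity-check the path before submitting. The following is only a sketch: build_migration_args is a hypothetical helper (it is not part of the DBS client), and it assumes dataset paths follow the usual three-part /Primary/Processed/TIER form. It returns the same dictionary shape that is passed to submitMigration.

```python
def build_migration_args(dataset, source_url):
    """Return the dict passed to submitMigration, after a basic path check.

    Hypothetical helper: assumes dataset paths look like /Primary/Processed/TIER.
    """
    parts = dataset.split('/')
    # A well-formed path splits into ['', primary, processed, tier],
    # with none of the three names empty.
    if len(parts) != 4 or parts[0] != '' or not all(parts[1:]):
        raise ValueError("unexpected dataset path: %r" % dataset)
    return {'migration_url': source_url, 'migration_input': dataset}


migrateArgs = build_migration_args(
    '/HighMultiplicity85EOF/Run2016B-23Sep2016-v2/AOD',
    'https://cmsweb.cern.ch/dbs/prod/phys03/DBSReader')
```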

Requirements for StoreResults creation

The end user has to provide a small set of information such that the P&R team can properly migrate the input dataset and create a StoreResults request. The required arguments are:

  • CMSSWVersion: the release used to produce the user dataset. If that release is deprecated, a compatible newer release has to be found and used instead.
  • ScramArch: the production architecture for the CMSSW release.
  • DbsUrl: the URL of the DBS instance from which the user dataset can be read (phys03, except for Run1 datasets, which are in phys01 and phys02).
  • InputDataset: the dataset to be elevated to TMDB and global DBS.
  • PhysicsGroup: the physics group that signs off on this elevation and nominally owns the data, though the data will probably be managed by DDM. This group name is also automatically added to the MergedLFNBase; more on this in the section below.
  • SiteWhitelist: the site(s) where the data is located (and where the StoreResults jobs will likely run).
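
As a rough illustration of the list above, the information received from the user could be checked for completeness before any migration or request creation is attempted. This is a minimal sketch, not part of WMCore: missing_args is a hypothetical helper, and the CMSSWVersion and ScramArch values are placeholders rather than real releases.

```python
REQUIRED_ARGS = ("CMSSWVersion", "ScramArch", "DbsUrl",
                 "InputDataset", "PhysicsGroup", "SiteWhitelist")


def missing_args(user_info):
    """Return the required argument names that are absent or empty."""
    return [name for name in REQUIRED_ARGS if not user_info.get(name)]


# Placeholder values for illustration; SiteWhitelist is left out on purpose.
user_info = {
    "CMSSWVersion": "CMSSW_X_Y_Z",          # placeholder, not a real release
    "ScramArch": "placeholder_arch",        # placeholder, not a real architecture
    "DbsUrl": "https://cmsweb.cern.ch/dbs/prod/phys03/DBSReader",
    "InputDataset": "/HighMultiplicity85EOF/Run2016B-23Sep2016-v2/AOD",
    "PhysicsGroup": "higgs",
}

print(missing_args(user_info))  # ['SiteWhitelist']
```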

Request creation

The StoreResults spec file can be found here; it lists which arguments are optional, their default values, and the data types expected by ReqMgr.

It is worth mentioning, though, that:

  • ConfigCacheID is not mandatory. In fact, there is no CMSSW PSet configuration at all, since the workflow only runs merge jobs.
  • PhysicsGroup is mandatory and is automatically added by ReqMgr to the output LFN, e.g. /store/results/higgs for the Higgs group. The PhysicsGroup name cannot be longer than 30 characters.
  • DbsUrl can be either the global or the analysis instance, since the data will be in both once it has been migrated.
  • GlobalTag can be any string; it is not really used in these merge jobs.
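
The PhysicsGroup rule can be illustrated with a short sketch. It only mimics the documented outcome (/store/results/higgs for the Higgs group) and the 30-character limit. The real logic lives in ReqMgr, so the plain concatenation below is an assumption and merged_lfn_base is a hypothetical helper.

```python
MAX_PHYSICS_GROUP_LEN = 30  # limit stated above


def merged_lfn_base(physics_group):
    """Return the output LFN base for a physics group.

    Assumption: the group name is appended as-is under /store/results.
    """
    if not physics_group:
        raise ValueError("PhysicsGroup is mandatory")
    if len(physics_group) > MAX_PHYSICS_GROUP_LEN:
        raise ValueError("PhysicsGroup longer than %d characters: %r"
                         % (MAX_PHYSICS_GROUP_LEN, physics_group))
    return "/store/results/%s" % physics_group


print(merged_lfn_base("higgs"))  # /store/results/higgs
```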

Request assignment

Ideally these workflows should be assigned to the site that hosts the data: the jobs are I/O bound, so reading data remotely would put a large load on AAA and would likely lead to a higher error rate.

When assigning these requests, make sure to:

  • always use "AcquisitionEra" = "StoreResults".
  • check whether the site is used in central production; if it is not, additional changes might be needed before the request can go through.
  • mind the limit on ProcessingString, which cannot exceed 100 characters; user dataset names are usually very long.
  • consider the subscriptions: the output dataset can (and possibly must) have a tape subscription under the DataOps group, and an additional Disk subscription to the AnalysisOps group is possible. In any case, these subscriptions have to be discussed within CompOps.
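
The first and third points can be enforced when the assignment dictionary is put together. This is a minimal sketch under stated assumptions: build_assignment is a hypothetical helper, the site name and processing string are made-up placeholders, and a real assignment needs more keys than the three shown here.

```python
MAX_PROCESSING_STRING_LEN = 100  # limit stated above


def build_assignment(site_whitelist, processing_string):
    """Return a partial assignment dict with the StoreResults conventions."""
    if len(processing_string) > MAX_PROCESSING_STRING_LEN:
        raise ValueError("ProcessingString has %d characters, limit is %d"
                         % (len(processing_string), MAX_PROCESSING_STRING_LEN))
    return {
        "AcquisitionEra": "StoreResults",   # always fixed for this request type
        "ProcessingString": processing_string,
        "SiteWhitelist": list(site_whitelist),
    }


# Placeholder site and processing string, for illustration only.
assignment = build_assignment(["T2_XX_Example"], "example_user_dataset_v1")
```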

Last but not least, you can find an example StoreResults request (both creation and assignment dictionary) HERE