Location Flow

Overview: How is the location of data resolved?

This quick reference page describes how the system determines the location of data during its lifetime in the WorkQueue and WMAgent, and where this location is used. It assumes the reader has basic knowledge of WorkQueue elements and WMBS.

This can be divided into two parts:

  • Location for input:

    • The input location is used for resource matching when submitting jobs.
    • This depends on the glidein factory configuration (CMS site name).
    • WMAgent converts the location information from PhEDEx (PNN) and DBS (PNN, only for local DBS cases) to a CMS site name (in a hacky way and via the SiteDB API).
    • For resource control:
      • SSB/SiteDB is used to set the thresholds (CMS site name).
      • Site whitelist and blacklist (user input; this should be a CMS site name, but we don't validate it, so sometimes users provide a PhEDEx node name).
  • Location for output:

    • When jobs finish, the output location is recorded (from the FWJR, which returns a PNN).
    • DBS: origin_site_name for the block, taken from the FWJR.
    • PhEDEx injection: PNN.
    • PhEDEx subscription: PNN, from user input (custodial and non-custodial sites from the spec).

Dataset locations (set by user input when the request is created)

SiteWhitelist and SiteBlacklist do not actually specify the location of the dataset; they are used as a filter. Since block locations contain only CMS site names, it makes no difference to set these with PhEDEx node names (which is currently not prevented); the data mapping will simply strip the suffix from the node name (i.e. _MSS, _Buffer, _Disk), as in the sketch below.
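A minimal sketch of that suffix stripping, with an illustrative suffix list and function name (not the actual WMCore helper):

# Suffixes that distinguish PhEDEx node flavours from the CMS site name
# (illustrative list; the real mapping code lives in WMCore).
NODE_SUFFIXES = ("_MSS", "_Buffer", "_Disk")

def node_to_site(node_name):
    """Strip a PhEDEx node suffix so the name can be compared with CMS
    site names, e.g. T1_IT_CNAF_Disk -> T1_IT_CNAF."""
    for suffix in NODE_SUFFIXES:
        if node_name.endswith(suffix):
            return node_name[:-len(suffix)]
    return node_name

assert node_to_site("T1_IT_CNAF_Disk") == "T1_IT_CNAF"
assert node_to_site("T2_CH_CERN") == "T2_CH_CERN"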

Block locations (set by the PhEDEx and DBS APIs from the WorkQueue)

Except for MonteCarlo production, the WorkQueue talks in blocks. In order to acquire WorkQueue elements the local queues must compare the location of the input data with the available resources and the site whitelist/blacklist. The question this section answers is: Where does the location of the input data stored in the WorkQueue element come from?

For illustration, here is a reduced version of a WorkQueue element:

{
   "_id": "0005a7d235dddc662075fe14abc87a62",
   "_rev": "4-01d999e040ded04caa1477dec4244f62",
   "updatetime": 1369308010.26,
   "WMCore.WorkQueue.DataStructs.WorkQueueElement.WorkQueueElement": {
       "ChildQueueUrl": "http://localqueue.cern.ch:5984/workqueue",
       "TaskName": "MonteCarloFromGEN",
       "Status": "Running",
       "Inputs": {
           "/SomeParticle-8TeV/Run2015-Example-v1/RECO#3463ffa4-a4f4-11e2-991b-00155dff5493": 
           [ "T1_IT_CNAF", "T0_CH_CERN", "T2_CH_CERN"]
       },
       "ParentQueueUrl": "https://globalqueue.cern.ch/couchdb/workqueue",
       "SiteWhitelist": ["T1_IT_CNAF"],
       "Dbs": "http://cmsdbsprod.cern.ch/cms_dbs_prod_global/servlet/DBSServlet",
   },
   "timestamp": 1369275183.48,
   "type": "WMCore.WorkQueue.DataStructs.WorkQueueElement.WorkQueueElement"
}

The input block in this element already has a list of locations populated. This list of locations may come from two different places:

  • When an element is created, the locations are obtained as follows (as part of the initial WorkQueue splitting; see the sketch after this list):
    1. Query DBS3Reader for the Storage Elements (SEs) where the block is present. (The call is actually made to PhEDEx, but it returns SE names instead of node names; if the block doesn't exist in PhEDEx, DBS is checked for origin_site_name, which is stored as an SE.) (Previous versions used DLS, which did the same thing.) (This happens in the WorkQueue policy code.)
    2. The list of SEs is translated to a list of CMS sites using the SiteDB v2 API.
    3. This final list of CMS names is stored in the WorkQueue element.
  • While a WorkQueue element has not been acquired, i.e. it is in Available status, the DataLocationMapper thread will check whether the locations of each available block have changed and update the WorkQueue elements if necessary. The DataLocationMapper update happens as follows:
    1. If the DBS URL in the WorkQueue element is a global DBS, then the locations are updated from PhEDEx; otherwise they are updated from DBS using the same process as above. The next steps explain the update from PhEDEx.
    2. The locations are obtained from PhEDEx by checking either the subscriptions or the replica information for the blocks; the first method is used in the global queue and the second in the local queues.
    3. The PhEDEx nodes provided in the previous step are translated to CMS names using the SiteDB service module. _Developer Note: Currently the method doing this conversion doesn't talk to SiteDB but rather does a hacky removal of the trailing _Buffer, _MSS or _Disk suffix. This needs to be changed to properly convert using SiteDB v2._
    4. Finally, these CMS names are stored in the WorkQueue element if they differ from the list of sites already stored in the element.
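The following Python sketch condenses the two paths above. The function names and the callables they receive (phedex_ses, dbs_origin_site, se_to_site, and so on) are placeholders for the real PhEDEx, DBS3Reader and SiteDB calls in the WorkQueue policy code and the DataLocationMapper:

def initial_block_locations(block, phedex_ses, dbs_origin_site, se_to_site):
    """Initial WorkQueue splitting: take the SE names PhEDEx reports for the
    block, fall back to the DBS origin_site_name when PhEDEx has nothing,
    then translate the SEs to CMS site names."""
    ses = phedex_ses(block)                      # SE names from PhEDEx
    if not ses:
        ses = [dbs_origin_site(block)]           # fallback: DBS origin_site_name
    return sorted({se_to_site(se) for se in ses})

def refresh_locations(element, is_global_dbs, sites_from_phedex, sites_from_dbs):
    """DataLocationMapper pass over an element still in Available status."""
    for block, old_sites in element["Inputs"].items():
        new_sites = sites_from_phedex(block) if is_global_dbs else sites_from_dbs(block)
        if set(new_sites) != set(old_sites):     # only update on a real change
            element["Inputs"][block] = new_sites
    return element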

_Developer Note: This is inconsistent; DBS is not a reliable source of location information. Also note that for a site with multiple SEs (disk/tape separation) there is no way to "disable" one of the SEs when acquiring work, although we do want to use only the disk data and avoid the tape data. The DBS location is only used when a local DBS is set (the StoreResults case); however, it needs to remain consistent if we change to PhEDEx node names for the location._

Once these locations are updated, the local queues can compare them with the available resources obtained from WMBS (i.e. CMS site names) and with the site whitelist/blacklist, which also contain CMS site names.
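A minimal sketch of that comparison, assuming the element layout shown earlier (this is illustrative, not the actual WorkQueue matching code):

def can_acquire(element, available_sites):
    """Check whether a local queue can acquire this element: every input
    block must have at least one site that hosts the data, is available in
    WMBS, passes the whitelist (if any) and is not blacklisted."""
    whitelist = set(element.get("SiteWhitelist", []))
    blacklist = set(element.get("SiteBlacklist", []))
    for sites in element["Inputs"].values():
        candidates = (set(sites) & set(available_sites)) - blacklist
        if whitelist:
            candidates &= whitelist
        if not candidates:
            return False        # at least one input block has no usable site
    return True

With the inner WorkQueueElement dictionary from the example above and available_sites = ["T1_IT_CNAF"], this would return True.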

File locations

The WorkQueue deals in blocks, but WMBS deals in files. When a WorkQueue element is injected into WMBS, its files are the entities injected into the file tables. During a WorkQueue element injection the following steps occur:

  1. DBS3Reader is queried for the SEs where the input block is located (the same as above: PhEDEx is contacted first).
    • There are three different cases for the file location (all of them yield SE names); a condensed sketch of these cases follows this list:
    • ACDC input files: the SE name comes from the FWJR, which was stored in the ACDC database when the job failed.
    • Input files without parents: the location is obtained in the standard way (DBS3Reader).
    • Input files with parents: the child file gets its location as above, while the parent files get their location from the DBS origin_site_name.
  2. When injecting into WMBS, the SEs are associated with location entries in WMBS, which were populated from SiteDB at the beginning of the agent lifetime (setting the thresholds combined with SSB).
  3. The files are also injected into DBSBuffer for parentage, but in the DBSBuffer tables the locations are associated directly with SEs instead of CMS sites.
  4. The SE name is then converted to a CMS site name, using the table created at initialization, when the job gets submitted (for resource matching).
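A condensed sketch of the three file-location cases referenced in step 1; the argument names are hypothetical and stand in for the ACDC, DBS3Reader and DBS lookups:

def input_file_ses(acdc_ses=None, is_parent=False, block_ses=None, dbs_origin_site=None):
    """Pick the SE names for a file being injected into WMBS.

    acdc_ses        -- SEs recorded in the FWJR and stored in the ACDC database
    block_ses       -- SEs resolved for the block via DBS3Reader (PhEDEx first)
    dbs_origin_site -- DBS origin_site_name, used for parent files
    """
    if acdc_ses is not None:        # ACDC input files
        return acdc_ses
    if is_parent:                   # parent files
        return [dbs_origin_site]
    return block_ses                # plain input files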

_Developer Note: This is not good; it overrides the locations that come from the WorkQueue and repeats the query work unnecessarily (although it could be useful when the location changes between the block location mapping and the file creation). A better solution would be to map the file location directly from the block location instead of storing it separately, but that would change the schema and many other places. The locations in DBSBuffer should be sites as well, but that is a bigger change. (This eventually feeds into origin_site_name.)_

Once the files are associated with locations in WMBS, the JobCreator can create jobs grouped by SE, and the JobSubmitter can then select the sites associated in WMBS with those SEs and submit to them. However, before submission the SE name is changed to a CMS site name, since that is what the factory requires. If we change the site name to a node name, the factory configuration needs to be changed as well. Krista pointed out that the factory configuration is not just for CMS sites, so site admins need to be consulted.
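A rough sketch of that grouping and submission-time conversion, where se_to_site stands in for the SE-to-CMS-site table stored in WMBS (the function itself is illustrative):

from collections import defaultdict

def files_by_site(file_ses, se_to_site):
    """Group files by SE (as the JobCreator does) and then map each group to
    the CMS site name the glidein factory expects at submission time."""
    by_se = defaultdict(list)
    for lfn, ses in file_ses.items():
        for se in ses:
            by_se[se].append(lfn)
    # se_to_site is the SE -> CMS site table populated from SiteDB in WMBS
    by_site = defaultdict(list)
    for se, lfns in by_se.items():
        by_site[se_to_site[se]].extend(lfns)
    return dict(by_site)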

What happens with files produced in a WMAgent?

The following steps describe how a file produced by a WMAgent job is processed in terms of location information.

  1. On the worker nodes, the jobs have access to a JobConfig.xml which indicates things such as the site name and the stage-out protocols. The stage-out definition indicates the SE where the files will be stored, and this is the information associated with the file when it returns to the JobAccountant in the WMAgent.
  2. The JobAccountant associates the files with a location entry based on the SEs and the location-SE mapping in WMBS (populated by SiteDB).
  3. The JobAccountant also uses PhEDEx to store the PhEDEx location in the job report for monitoring.
_Developer Note: This is wrong because it uses the SE-to-node association based on the information in PhEDEx, where the SEs are not guaranteed to be consistent; it should use SE -> SiteName (based on WMBS) -> PhEDEx Node (using SiteDB)._
  4. For merged files to be registered in DBS, the SEs are stored directly in DBSBuffer, not the site names.
  5. Finally, the merged files are also injected into PhEDEx; the location is resolved using the information in DBSBuffer (i.e. the SE), which is translated to a PhEDEx node (using the PhEDEx API).
  6. In addition, PhEDEx subscriptions can be made as defined by the spec (custodial and non-custodial sites), which use PhEDEx node names.
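A condensed sketch of the output-side bookkeeping described above; the helper names are hypothetical and the real work is split across the JobAccountant and the DBS/PhEDEx injection components:

def trace_output_location(fwjr_se, se_to_site, se_to_node):
    """Follow one merged output file from the FWJR to WMBS, DBSBuffer and PhEDEx.

    fwjr_se    -- SE reported by the stage-out step in the FWJR
    se_to_site -- SE -> CMS site table in WMBS (populated from SiteDB)
    se_to_node -- SE -> PhEDEx node lookup (currently resolved via PhEDEx)
    """
    wmbs_location = se_to_site[fwjr_se]     # location entry used by the JobAccountant
    dbsbuffer_location = fwjr_se            # DBSBuffer keeps the raw SE (origin_site_name)
    phedex_node = se_to_node[fwjr_se]       # node used for injection and subscriptions
    return wmbs_location, dbsbuffer_location, phedex_node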

_Developer Note: This is not what we want; we could store the sites in DBSBuffer and do a site-to-PhEDEx-node mapping with SiteDB instead of using the unreliable SE information in PhEDEx._