Data Locking in WMCore - dmwm/WMCore GitHub Wiki

Data Locking in WMCore

Datasets and files that are in use by active workflows need to be excluded from the automatic data cleanup mechanisms in Dynamic Data Management (DDM/Dynamo). This "data locking" feature was previously provided by Unified. Now the functionality is provided by WMCore.

  • Data locking is done at the dataset level. Locking at the block level is not implemented.
  • Data locking is global. Site-specific locks are not implemented.

What datasets need to be locked

  1. Datasets associated with workflows in any of the following states:
['assignment-approved', 'assigned', 'staging', 'staged', 'failed',
'acquired', 'running-open', 'running-closed', 'force-complete', 'completed', 'closed-out']

  2. Any parent datasets of datasets used by workflows with the property "IncludeParents": True.

  3. Transient output and unmerged LFNs. These are generated by retrieving the workflow property OutputModulesLFNBases.

  4. "Ad hoc" locks. A set of manually configurable things to lock. This was provided by "adhoc_lock.json" in Unified and should now be configured manually by DDM.
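The state-based selection above can be illustrated with a minimal Python sketch. It is not the WMCore implementation: the function name select_locks and the workflow dictionary keys "RequestStatus" and "InputDatasets" are assumptions made for illustration. Only the list of states and the "IncludeParents" property come from this page.

```python
# States in which a workflow's datasets must stay locked (taken from this page).
LOCKED_STATES = frozenset([
    'assignment-approved', 'assigned', 'staging', 'staged', 'failed',
    'acquired', 'running-open', 'running-closed', 'force-complete',
    'completed', 'closed-out',
])


def select_locks(workflows):
    """Return (datasets_to_lock, datasets_needing_parent_lookup).

    `workflows` is an iterable of dicts. The keys "RequestStatus" and
    "InputDatasets" are hypothetical names used only in this sketch.
    """
    to_lock = set()
    need_parents = set()
    for wf in workflows:
        if wf.get("RequestStatus") not in LOCKED_STATES:
            continue  # workflow is not active, nothing to lock
        datasets = wf.get("InputDatasets", [])
        to_lock.update(datasets)
        if wf.get("IncludeParents"):
            # Parents of these datasets must also be locked (requires a DBS query).
            need_parents.update(datasets)
    return to_lock, need_parents
```

The second return value lists the datasets whose parents still have to be resolved, which keeps the parent lookup separate from the state check.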

Current status

A combination of the WMStats API globallocks and the ReqMgr2 API parentlocks is used to determine the set of global datasets that are in use and should not be removed. The WMStats protectedlfns API provides a list of transient output datasets and "unmerged" base directories. API details are at https://github.com/dmwm/WMCore/wiki/wmstatsserver-api and https://github.com/dmwm/WMCore/wiki/ReqMgr2-apis.
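As a rough sketch of how a consumer might combine these results, the Python below merges two already-retrieved dataset lists and checks an LFN against a list of protected base directories. It assumes the API responses have already been fetched and decoded into plain lists of strings. The function names, the prefix-matching rule and the example paths are illustrative assumptions, not part of the WMStats or ReqMgr2 APIs.

```python
def datasets_in_use(global_locks, parent_locks):
    """Union of the globallocks and parentlocks results, sorted and de-duplicated."""
    return sorted(set(global_locks) | set(parent_locks))


def is_protected_lfn(lfn, protected_bases):
    """True if `lfn` equals a protected base or lies underneath one.

    Matching on a trailing "/" avoids treating ".../Run10" as being
    under ".../Run1". This matching rule is an assumption of the sketch.
    """
    for base in protected_bases:
        prefix = base.rstrip("/")
        if lfn == prefix or lfn.startswith(prefix + "/"):
            return True
    return False
```

For example, with a hypothetical base directory "/store/unmerged/Run1", the file "/store/unmerged/Run1/file.root" counts as protected while "/store/unmerged/Run10/file.root" does not.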

Dependencies

DBS - must be queried to discover parent datasets

Long term

These APIs will likely go away after the transition to Rucio. New requests could include a property for any required parent datasets. This would eliminate the DBS dependency and potential issues arising from failed DBS queries. Requests already in the system would not have this property, and some could remain active for ~6 months.
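The transition period described above, where new requests carry their parents and older ones still need DBS, could look roughly like the Python sketch below. The property name "ParentDatasets", the key "InputDatasets" and the dbs_lookup callable are all hypothetical. This page does not define the new property, so treat this as one possible shape rather than a planned design.

```python
def parent_datasets(request, dbs_lookup):
    """Resolve the parent datasets to lock for one request.

    `request` is a dict. "ParentDatasets" is a hypothetical name for the
    proposed new property. `dbs_lookup` is any callable that maps a
    dataset name to a list of its parents (a stand-in for a DBS query).
    """
    declared = request.get("ParentDatasets")
    if declared is not None:
        # New-style request: parents are recorded, no DBS query needed.
        return sorted(set(declared))
    if not request.get("IncludeParents"):
        return []
    # Older request without the property: fall back to DBS.
    parents = set()
    for dataset in request.get("InputDatasets", []):
        parents.update(dbs_lookup(dataset))
    return sorted(parents)
```

Keeping the DBS fallback behind a callable means it can be dropped once no request without the property remains active.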