ReqMgr2 MicroService RuleCleaner - dmwm/WMCore GitHub Wiki

This document describes the architecture, behavior and APIs of the ReqMgr2 MicroService RuleCleaner module. This MicroService removes Rucio rules that are no longer needed in the Workload Management system, such as:

  • Block-level rules created by WMAgent, against the origin RSE where data is getting produced.
  • Container and block-level rules created by MSTransferor, for input data that is no longer in the system.

In addition, this MicroService is now also responsible for workflow archival, which is the final status in which workflows remain.

Architecture proposal based on request status in ReqMgr2

The strategy currently under discussion is to delete the general non-custodial (disk) rules created by WMAgent as soon as the custodial (tape) rule is created. We can rely on the Rucio internal mechanism that protects against data loss, and therefore request the rule deletion as soon as the tape rule is created. We decided to use the information recorded in the MSOutput MicroService database to find the workflows and datasets suitable for cleaning, which minimizes the calls to Rucio. We also need to clean up after the MSTransferor MicroService. For the latter, the strategy is still to be discussed.

Here is how we envision it to work:

  • Start a separate MS thread and let it be managed by MSManager, as for the rest of the MicroServices.
  1. Cleaning after WMAgents:
    • Query ReqMgr2 for all the workflows in the given statuses ['announced', 'aborted-completed', 'rejected'] and handle them along 2 different paths:
      • Workflows about to be archived from status announced have already been processed by MSOutput, so we have records of their container-level rules in the MSOutput database. We are going to use the MSOutput REST info interface and parse its output.
      • Workflows about to be archived from status aborted-completed or rejected are not covered that way: we must query Rucio with the proper regex to fetch the list of existing block-level rules related to a given DID.
    • Once we have a properly built list of block-level rules, delete those from Rucio.
    • Move the workflow status to the proper archived-* state
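The routing in step 1 can be sketched as a small planning function. This is only an illustration under stated assumptions, not the actual RuleCleaner code: the function name `plan_cleanup`, the two lookup callables, the workflow keys `RequestName` and `RequestStatus`, and the concrete archived status names are all assumed here. The text above only states the three source statuses and an "archived-*" target.

```python
# Statuses named in the text above, split by where the rule list comes from.
FROM_MSOUTPUT = {"announced"}
FROM_RUCIO = {"aborted-completed", "rejected"}

# ASSUMPTION: target state names; the text only says "archived-*".
ARCHIVED_STATE = {
    "announced": "normal-archived",
    "aborted-completed": "aborted-archived",
    "rejected": "rejected-archived",
}


def plan_cleanup(workflows, rules_from_msoutput, rules_from_rucio):
    """Build one cleanup plan per workflow that is in a cleanable status.

    workflows: iterable of dicts with 'RequestName' and 'RequestStatus'
        (assumed key names).
    rules_from_msoutput, rules_from_rucio: hypothetical callables that take
        a workflow dict and return the rule IDs to delete. They stand in
        for the MSOutput lookup and the Rucio query described above.
    """
    plans = []
    for wflow in workflows:
        status = wflow["RequestStatus"]
        if status in FROM_MSOUTPUT:
            rules = rules_from_msoutput(wflow)
        elif status in FROM_RUCIO:
            rules = rules_from_rucio(wflow)
        else:
            continue  # not a status this sketch acts on
        plans.append({
            "workflow": wflow["RequestName"],
            "rules_to_delete": list(rules),
            "next_status": ARCHIVED_STATE[status],
        })
    return plans
```

Passing the two lookups in as callables keeps the status routing separate from any service calls, so the decision logic can be exercised with stubs before it is wired to MSOutput or Rucio.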
  2. Cleaning after MSTransferor: