ReqMgr2 MicroService Output - dmwm/WMCore GitHub Wiki

Introduction

MSOutput is a microservice responsible for output data placement in CMS central production. Currently, MSOutput runs solely with the Rucio data management software. It processes requests whose states are closed-out and announced in ReqMgr2.

MSOutput performs disk and tape placements at the dataset level in CMS terminology, which is the container level in Rucio terminology. In other words, there is no block-level or file-level distribution of the data across the grid.

Here is an important note about the usage of Unified configuration and campaign configuration in WMCore, which will be useful in reading this document:

  1. P&R is in charge of setting Unified configuration and campaign configurations.
  2. Currently WMCore fetches a copy of the Unified configuration every hour, and that is how the Unified configuration is used within the MicroServices. In the future, it is planned to have all these configurations available within WMCore.
  3. WMCore does not use Unified campaigns. However, whenever there is a change in the Unified campaigns, Unified also pushes the same changes to the WMCore maintained campaign configuration. So, any decisions based on campaign configuration are taken from the WMCore-based campaigns.

Note that the Tape and Disk RSE expressions are configured in the MSOutput service configuration. Their current values in production are:

data.rucioTapeExpression = "rse_type=TAPE\cms_type=test"
data.rucioDiskExpression = "(tier=2|tier=1)&cms_type=real&rse_type=DISK"
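To make the two expressions easier to read: in Rucio RSE expressions, `|` is a union, `&` is an intersection and `\` is a set complement. The sketch below evaluates both expressions by hand with Python sets. The RSE names and their attribute values are hypothetical and chosen only for illustration; a real deployment would let Rucio resolve the expression.

```python
# Hypothetical RSEs and attributes, for illustration only.
RSES = {
    "T1_XX_Example_Tape":      {"rse_type": "TAPE", "cms_type": "real", "tier": "1"},
    "T1_XX_Example_Tape_Test": {"rse_type": "TAPE", "cms_type": "test", "tier": "1"},
    "T1_XX_Example_Disk":      {"rse_type": "DISK", "cms_type": "real", "tier": "1"},
    "T2_YY_Example":           {"rse_type": "DISK", "cms_type": "real", "tier": "2"},
    "T2_YY_Example_Test":      {"rse_type": "DISK", "cms_type": "test", "tier": "2"},
    "T3_ZZ_Example":           {"rse_type": "DISK", "cms_type": "real", "tier": "3"},
}


def match(key, value):
    """Return the set of RSE names whose attribute `key` equals `value`."""
    return {name for name, attrs in RSES.items() if attrs.get(key) == value}


# rse_type=TAPE\cms_type=test : all tape RSEs minus the test ones
tape = match("rse_type", "TAPE") - match("cms_type", "test")

# (tier=2|tier=1)&cms_type=real&rse_type=DISK : real T1/T2 disk RSEs
disk = ((match("tier", "2") | match("tier", "1"))
        & match("cms_type", "real")
        & match("rse_type", "DISK"))
```

With this toy catalogue, `tape` keeps only the non-test tape endpoint and `disk` keeps the real T1 and T2 disk endpoints, leaving out the T3 and the test RSEs.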

Disk Placements:

For each produced output dataset, MSOutput determines whether it is going to be placed on disk, as well as the parameters required for the placement, such as the destination, lifetime and number of copies.

Determining whether to send a dataset to disk or not:

MSOutput decides whether a dataset is going to be sent to disk according to three configurations:

  1. MSOutput Configuration:
    1. If the data tier of the dataset is blacklisted in the MSOutput configuration file, then that dataset is not placed on disk. The excludeDataTier parameter specifies the blacklisted data tiers. Currently, MSOutput does not restrict any data tier.
  2. Campaign Configuration:
    1. If the data tier is whitelisted in the campaign configuration, then it is placed into disk. toDDM parameter in Unified campaign configuration (TiersToDM in WMCore) specifies the whitelisted data tiers. Currently only GEN tiers of some campaigns are whitelisted.
  3. Unified Configuration:
    1. If the data tier is whitelisted in the Unified configuration, then it is placed on disk. The tiers_to_DDM parameter in the Unified configuration specifies these data tiers.
    2. If the data tier is blacklisted in the Unified configuration, then it is not placed on disk. The tiers_no_DDM parameter in the Unified configuration specifies these data tiers.
    3. Note that tiers_to_DDM takes precedence over tiers_no_DDM.

It is important to note that MSOutput performs these checks in the order listed above, and once a data tier matches a category, the later checks are skipped. For instance, if the GEN tier is whitelisted in the campaign configuration, then it goes to disk even if it is blacklisted in the Unified configuration.
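The check order can be sketched as a small function. This is an illustration of the order described in this section, not the actual MSOutput code: the function and argument names are hypothetical, and because this page does not say what happens to a data tier that appears in none of the lists, that case is left to an explicit `fallback` argument.

```python
def goes_to_disk(datatier, exclude_data_tier, campaign_tiers_to_dm,
                 tiers_to_ddm, tiers_no_ddm, *, fallback):
    """Return True if a dataset of this data tier should be placed on disk.

    The first check that matches decides; later checks are not evaluated.
    """
    # 1. MSOutput configuration (excludeDataTier): blacklisted tiers never go to disk
    if datatier in exclude_data_tier:
        return False
    # 2. Campaign configuration (TiersToDM in WMCore): whitelisted tiers go to disk
    if datatier in campaign_tiers_to_dm:
        return True
    # 3. Unified configuration: whitelist (tiers_to_DDM) wins over blacklist (tiers_no_DDM)
    if datatier in tiers_to_ddm:
        return True
    if datatier in tiers_no_ddm:
        return False
    # Assumption: behaviour for unlisted tiers is not documented here,
    # so the caller has to choose it.
    return fallback


# GEN is whitelisted by the campaign, so the Unified blacklist is never consulted.
gen_decision = goes_to_disk(
    "GEN",
    exclude_data_tier=set(),
    campaign_tiers_to_dm={"GEN"},
    tiers_to_ddm=set(),
    tiers_no_ddm={"GEN"},
    fallback=False,
)
```

Here `gen_decision` is `True`, matching the GEN example given in the text.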

Determining Rule Attributes:

  1. Destination:

    1. RelVal:
      1. Destination is decided based on the output dataset datatier. The dictionary policy is defined in the MSOutput service configuration. For each final destination, a new copy of the output dataset will be made.
    2. Non-RelVal:
      1. MSOutput gives an RSE expression to Rucio which specifies that this dataset can be placed to any T1 and T2 disk and Rucio handles the rest. This is the RSE expression: (tier=2|tier=1)&cms_type=real&rse_type=DISK
  2. Lifetime:

    1. RelVal:
      1. Determined by rulesLifetimeRelVal parameter of MSOutput configuration.
      2. It is configured as 12 months currently.
    2. Non-RelVal:
      1. Determined by rulesLifetime parameter of MSOutput configuration.
      2. It is configured as 1 month currently.
  3. Number of Copies:

    1. Resubmission workflows:
      1. The number of copies is set to 0. In other words, MSOutput does not make an output data placement for Resubmission (ACDC) workflows, to avoid duplicate placements. The original workflow handles it.
    2. Original workflows:
      1. The number of copies is set to the value of the maxcopies parameter of the campaign configuration (MaxCopies in WMCore). If no such parameter is defined, it is set to 1. Note that currently, all campaigns except one are configured with maxcopies set to 1. The exception: https://github.com/CMSCompOps/WmAgentScripts/blob/master/campaigns.json#L928
  4. Weight:

    1. This attribute specifies the disk quota for CMS. It is set to ddm_quota.
    2. With this attribute, Rucio makes sure that we do not overload a given RSE and that the space is used properly.
  5. Grouping:

    1. It is set to ALL
  6. Activity:

    1. It is set to Production Output
  7. Account:

    1. This is the Rucio account name which will be used while creating rules in Rucio.
    2. It is set to wmcore_output currently.
    3. It is configurable by the rucioAccount parameter in MSOutput configuration file.
  8. Comment:

    1. It is set to WMCore MSOutput output data placement
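The disk rule attributes listed above can be collected in one place, as in the following sketch. It is not the MSOutput implementation: the function name, the dictionary keys and the argument names are hypothetical. Three further assumptions are made: a month is taken as 30 days and the lifetime is expressed in seconds, several RelVal destinations are joined with `|` into a single expression, and a Resubmission workflow yields no rule at all (returned as `None`) because its number of copies is 0.

```python
DISK_EXPRESSION = "(tier=2|tier=1)&cms_type=real&rse_type=DISK"
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: one month = 30 days


def build_disk_rule(is_relval, is_resubmission, max_copies=None,
                    relval_destinations=None):
    """Return a dict of disk rule attributes, or None if no rule is made."""
    if is_resubmission:
        # Zero copies: the original workflow takes care of the placement.
        return None
    if is_relval:
        destinations = list(relval_destinations or [])
        if not destinations:
            raise ValueError("RelVal placement needs at least one destination")
        rse_expression = "|".join(destinations)   # assumption, see lead-in
        copies = len(destinations)                # one copy per destination
        lifetime = 12 * SECONDS_PER_MONTH         # rulesLifetimeRelVal
    else:
        rse_expression = DISK_EXPRESSION
        copies = max_copies if max_copies is not None else 1
        lifetime = 1 * SECONDS_PER_MONTH          # rulesLifetime
    return {
        "rse_expression": rse_expression,
        "copies": copies,
        "lifetime": lifetime,
        "weight": "ddm_quota",
        "grouping": "ALL",
        "activity": "Production Output",
        "account": "wmcore_output",               # rucioAccount
        "comment": "WMCore MSOutput output data placement",
    }


non_relval_rule = build_disk_rule(is_relval=False, is_resubmission=False)
relval_rule = build_disk_rule(is_relval=True, is_resubmission=False,
                              relval_destinations=["T2_CH_CERN"])
```

Keeping the attributes in a plain dictionary makes it easy to compare a RelVal and a non-RelVal rule side by side before anything is sent to Rucio.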

Tape Placements:

Determining whether to send a dataset to tape or not:

  1. RelVal Check:
    1. enableRelValCustodial parameter of the MSOutput configuration determines whether to make tape placements for RelVal outputs or not. Note that, currently, this parameter is set to False, i.e. RelVal outputs do not go to tape.
  2. Resubmission Check:
    1. MSOutput does not make tape placements for “Resubmission” workflows. Original workflows handle it.
  3. MSOutput Configuration:
    1. If the data tier of the dataset is blacklisted in the MSOutput configuration file, then that dataset is not placed on tape. The excludeDataTier parameter specifies the blacklisted data tiers. Currently, MSOutput does not restrict any data tier.
  4. Unified Configuration:
    1. If the data tier of the dataset is blacklisted for tape in the Unified configuration, then no tape placement is done for this dataset. The tiers_with_no_custodial parameter specifies these data tiers.
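The four tape checks can be sketched as a chain of exclusions. The function and argument names are hypothetical, and two assumptions are made: excludeDataTier also blocks tape placement (as its position in this list suggests), and a dataset that passes every check goes to tape.

```python
def goes_to_tape(datatier, is_relval, is_resubmission,
                 enable_relval_custodial, exclude_data_tier,
                 tiers_with_no_custodial):
    """Return True if a dataset of this data tier should be placed on tape."""
    # 1. RelVal check: RelVal outputs go to tape only if explicitly enabled
    if is_relval and not enable_relval_custodial:
        return False
    # 2. Resubmission check: the original workflow handles the placement
    if is_resubmission:
        return False
    # 3. MSOutput configuration (excludeDataTier)
    if datatier in exclude_data_tier:
        return False
    # 4. Unified configuration (tiers_with_no_custodial)
    if datatier in tiers_with_no_custodial:
        return False
    # Assumption: nothing excluded the dataset, so it goes to tape.
    return True


# With enableRelValCustodial set to False, RelVal outputs never go to tape.
relval_decision = goes_to_tape(
    "GEN-SIM", is_relval=True, is_resubmission=False,
    enable_relval_custodial=False,
    exclude_data_tier=set(), tiers_with_no_custodial=set(),
)
```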

Determining Rule Attributes:

  1. Number of Copies:

    1. Number of copies is always 1.
  2. Destination:

    1. Firstly, note that all allowed outputs of a workflow go to the same destination. MSOutput sums up all output dataset sizes of a workflow and the total amount is used while choosing the tape destination.
    2. MSOutput gives the following RSE expression to Rucio and fetches a list of RSEs: rse_type=TAPE\cms_type=test\rse=T0_CH_CERN_Tape
    3. Then, MSOutput fetches ddm_quota for each RSE and eliminates the RSEs whose quota is less than the size of the output data.
    4. Then, MSOutput makes a weighted random selection from the list of RSEs whose quota is sufficient, where the weight is defined as the quota of the RSE. In other words, a tape is more likely to be chosen as a destination if its quota is larger than that of the others.
  3. Lifetime:

    1. Note that MSOutput does not specify a lifetime parameter for tape placements, unlike disk placements. So, tape placements are done with the intention that the data will stay there forever unless someone deletes it on purpose.
  4. Ask for approval:

    1. Each RSE has an attribute which specifies whether it is required to get an approval for the placement or not.
    2. Note that, if the tape placements are not approved, then CMS might lose data. So, tape placements are done with the assumption that every tape placement will be approved.
    3. This is a necessity, since some sites need to create the tape libraries before they can receive data.
  5. Grouping:

  6. Activity:

    1. It is set to Production Output
  7. Account:

    1. This is the Rucio account name which will be used while creating rules in Rucio.
    2. It is set to wmcore_output currently.
    3. It is configurable by the rucioAccount parameter in MSOutput configuration file.
  8. Comment:

    1. It is set to WMCore MSOutput output data placement
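The tape destination selection described under "Destination" (filter by quota, then draw at random with the quota as weight) can be sketched with the standard library. The function name, the RSE names and the quota numbers are hypothetical, and returning `None` when no RSE has enough quota is an assumption of this sketch.

```python
import random


def pick_tape_rse(rse_quotas, total_size, rng=random):
    """Pick one tape RSE for all outputs of a workflow.

    rse_quotas: dict mapping RSE name to its ddm_quota value
    total_size: summed size of all output datasets, in the same unit
    rng:        the random module or a random.Random instance
    """
    # Drop RSEs whose quota is smaller than the total output size
    # (and zero-quota RSEs, which cannot carry any weight).
    candidates = {rse: quota for rse, quota in rse_quotas.items()
                  if quota >= total_size and quota > 0}
    if not candidates:
        return None  # assumption: no suitable destination is reported as None
    names = list(candidates)
    weights = [candidates[name] for name in names]
    # Weighted random draw: a larger quota means a higher chance of selection.
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical quotas in arbitrary units.
quotas = {"T1_XX_Example_Tape": 10, "T1_YY_Example_Tape": 100}
chosen = pick_tape_rse(quotas, total_size=50)
```

In this example only `T1_YY_Example_Tape` has a quota of at least 50, so it is always chosen. With a smaller `total_size` both RSEs would be candidates and the second would be drawn roughly ten times as often as the first.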

RelVal Disk Policy:

Starting in March 2022, with this PR: https://github.com/dmwm/WMCore/pull/11024, RelVal Disk output data placement has been redesigned such that the output data policy can be configured - by datatier - in the MSOutput service configuration, as a python object (list/dict). The current policy defined in production is:

data.relvalPolicy = [{"datatier": "GEN-SIM", "destinations": ["T2_CH_CERN"]},
                     {"datatier": "ALCARECO", "destinations": ["T2_CH_CERN"]},
                     {"datatier": "default", "destinations": ["T2_CH_CERN"]}]

where all RelVal output datasets are placed under T2_CH_CERN, with a single copy. If the dataset has a datatier that is not defined in the policy, then destination would be set according to the default value, thus T2_CH_CERN.

If a given datatier is defined to have more than one destination, then the Rucio rule is modified to have more than one copy as well: one copy for each destination.

Note that this policy is validated during the startup of MSOutput, including the validation of the datatier and destination names. In case the policy is updated, we need to push the new configuration to the CMSWEB MSOutput production system and restart the service.
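A minimal sketch of how such a policy could be checked and looked up is shown below. It is not the MSOutput validation code: the function names are hypothetical, only the structure of the policy is checked (the real service also validates datatier and destination names), and requiring a "default" entry is an assumption based on the fallback behaviour described above.

```python
RELVAL_POLICY = [
    {"datatier": "GEN-SIM", "destinations": ["T2_CH_CERN"]},
    {"datatier": "ALCARECO", "destinations": ["T2_CH_CERN"]},
    {"datatier": "default", "destinations": ["T2_CH_CERN"]},
]


def validate_relval_policy(policy):
    """Structural check only; raise ValueError on a malformed policy."""
    seen = set()
    for item in policy:
        if set(item) != {"datatier", "destinations"}:
            raise ValueError(f"unexpected keys in policy item: {item}")
        if not isinstance(item["destinations"], list) or not item["destinations"]:
            raise ValueError(f"destinations must be a non-empty list: {item}")
        if item["datatier"] in seen:
            raise ValueError(f"duplicate datatier: {item['datatier']}")
        seen.add(item["datatier"])
    if "default" not in seen:
        raise ValueError("policy needs a 'default' entry")  # assumption


def relval_destinations(policy, datatier):
    """Return the destinations for a datatier, falling back to 'default'."""
    by_tier = {item["datatier"]: item["destinations"] for item in policy}
    return by_tier.get(datatier, by_tier["default"])


validate_relval_policy(RELVAL_POLICY)
reco_destinations = relval_destinations(RELVAL_POLICY, "RECO")
copies = len(reco_destinations)  # one copy per destination
```

RECO is not listed in the policy, so it falls back to the default entry and gets a single copy at T2_CH_CERN.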

Weak points and possible improvements:

  1. Data-tier selection for disk placement:
    1. As discussed above, whether a data-tier is going to be placed on disk or not is determined by the Unified configuration (tiers_to_DDM and tiers_no_DDM). This configuration does not seem to have a strong justification behind it and should be revisited.
  2. Lifetimes of RelVal and non-RelVal outputs:
    1. Lifetimes are set to 12 months and 1 month respectively, and this does not have a strong justification. This information should be discussed with PPD.
  3. Data-tier selection for tape placement:
    1. As discussed above, whether a data-tier is banned for tape placements or not is determined by the Unified configuration (tiers_with_no_custodial). This configuration does not seem to have a strong justification behind it and should be revisited.
  4. Asking for approval for tape selections:
    1. If a tape placement is not approved, then CMS might lose that data. I guess, all tape requests are approved, but it would be good to think about the case where they are not for some reason.
  5. Considering remaining space:
    1. Disk and tape selections are performed according to the weight=ddm_quota parameter, which specifies the CMS quota for each RSE. If I am not wrong, this parameter is static and does not take the remaining space into account. This might be an obstacle to distributing the outputs in a balanced manner and should be revisited.