ReqMgr2 MicroService Pileup - dmwm/WMCore GitHub Wiki

This wiki describes the MSPileup service, the final pileup data structure adopted, and a plain-English description of the logic to be implemented.

Architecture

The architecture of the MSPileup service follows the MVC (Model, View, Controller) pattern, with MongoDB as the back-end database for storage. The service is based on the WMCore REST framework and is located in the WMCore/MicroService/MSPileup area. It consists of two distinct services:

  • MSPileup HTTP service which provides access to MSPileup via HTTP methods
  • MSPileup Tasks daemon which internally handles the data placement logic via polling cycles.

The codebase has the following modules:

  • MSPileupObj.py provides details of the MSPileup data object

  • MSPileupData.py provides data layer business logic, i.e. it creates, deletes, modifies MS pileup objects in MongoDB database
  • MSPileup.py API layer which defines all MSPileup APIs
  • MSPileupTaskManager.py API layer which defines all MSPileupTaskManager APIs
  • MSPileupTasks.py defines all MSPileup tasks like:
    • monitoring task
    • pileup size task
    • inactive workflow task
    • active workflow task
    • clean-up task
    • cms monit task

Within each task, all workflows (or other units of work) are processed concurrently via the asyncio module.

  • Service/Data.py module defines all HTTP APIs (MSPileup and others used by MS micro-services)
  • RestApiHub.py defines MSPileup HTTP end-point within WMCore REST server.
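
The concurrent processing within each task can be sketched with asyncio.gather; the processWorkflow coroutine and task names below are illustrative stand-ins, not the actual WMCore code:

```python
import asyncio

async def processWorkflow(name):
    """Illustrative stand-in for per-workflow logic (e.g. a Rucio rule lookup)."""
    await asyncio.sleep(0)  # yield control, as a real I/O call would
    return f"processed {name}"

async def runTask(workflows):
    # schedule all workflows concurrently and collect their results in order
    return await asyncio.gather(*(processWorkflow(w) for w in workflows))

results = asyncio.run(runTask(["wf1", "wf2", "wf3"]))
```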

Below you can find a detailed description of the individual layers.

REST APIs

MSPileup provides a rich set of RESTful APIs which allow clients to fetch, upload, modify and delete pileup documents.

HTTP GET API calls

A few GET examples on how to fetch pileup object(s) from MSPileup (if testing on localhost, replace cmsweb.cern.ch by localhost:8249):

curl -v "https://cmsweb.cern.ch/ms-pileup/data/pileup?pileupName=pileup"
curl -v "https://cmsweb.cern.ch/ms-pileup/data/pileup?campaign=Campaign"
# and we can use filters to extract certain fields from the doc, e.g.
curl -v 'https://cmsweb.cern.ch/ms-pileup/data/pileup?pileupName=pileup&filters=["campaigns", "pileupType"]'
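
The same GET calls can be issued from a script; the helper below only illustrates how the URL is composed (it is not part of WMCore):

```python
from urllib.parse import urlencode

def pileupGetUrl(base, **params):
    """Build an MSPileup GET URL; parameter names follow the curl examples above.
    This helper is illustrative, not part of the WMCore codebase."""
    return f"{base}/ms-pileup/data/pileup?{urlencode(params)}"

url = pileupGetUrl("https://cmsweb.cern.ch", pileupName="pileup")
```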

HTTP POST API calls

The POST API call can be used either to create a new pileup object in MSPileup, or to fetch documents with a more customized filter. Examples:

# first create document
curl -v -X POST -H "Content-type: application/json" -d '{"bla":1}' \
       https://cmsweb.cern.ch/ms-pileup/data/pileup
# second, query the data via provided JSON query (spec)
curl -v -X POST -H "Content-type: application/json" \
       -d '{"query":{"pileupName":"bla"}}' \
       https://cmsweb.cern.ch/ms-pileup/data/pileup
# third, use query and filters
curl -v -X POST -H "Content-type: application/json" \
       -d '{"query":{"pileupName": "bla"}, "filters":["campaigns"]}' \
        https://cmsweb.cern.ch/ms-pileup/data/pileup

HTTP PUT API calls

The PUT method is used to update a pileup object in MSPileup. A few examples are:

# update the MSPileup document for pileupName /abc/xyz/PREMIX and set its active status to false
curl -v -X PUT -H "Content-type: application/json" -d '{"pileupName":"/abc/xyz/PREMIX", "active":false}' \
        https://cmsweb.cern.ch/ms-pileup/data/pileup

HTTP DELETE API calls

Finally, the DELETE method can be used to permanently delete a pileup object in MSPileup. Note that this is only allowed at administrator level; otherwise, Rucio rules might be left uncleaned.

curl -v -X DELETE -H "Content-type: application/json" -d '{"pileupName":"pileup"}' \
        https://cmsweb.cern.ch/ms-pileup/data/pileup

Data Structure

The MSPileup data layer defines the following MSPileup data-structure:

{"pileupName": string,
  "pileupType": string,
  "insertTime": integer,
  "lastUpdateTime": integer,
  "expectedRSEs": list of strings,
  "currentRSEs": list of strings,
  "fullReplicas": integer,
  "campaigns": list of strings,
  "containerFraction": float,
  "replicationGrouping": string,
  "activatedOn": integer,
  "deactivatedOn": integer,
  "pileupSize": integer,
  "ruleIds": list of strings,
  "customName": string,
  "transition": list of dicts,
  "active": boolean
}

where the mandatory parameters are:

  • pileupName: string with the pileup dataset name.
  • pileupType: either "premix" or "classic" string value, according to the pileup type.
  • expectedRSEs: a non-empty list of strings with the RSE names, e.g. ["T2_US_Nebraska", "T2_CH_CERN"]. This is validated against the known Disk RSEs in Rucio.
  • active: boolean flag enabling or disabling the pileup data placement. If False, then the service ensures that there are no wmcore_transferor rules for this pileup.

the optional parameters are:

  • campaigns: a list of strings with the campaign names using this pileup. Default: [].
  • containerFraction: a real number that corresponds to the fraction - over the number of blocks - to be replicated to a given RSE. To be used whenever the micro-service is mature enough and automatic data placement decisions can be performed. Default: 1.0.
  • replicationGrouping: a string matching the grouping granularity field provided by Rucio. Allowed values would be DATASET (data placement by block) or ALL (whole container is placed under the same RSE). Default: ALL.
  • fullReplicas: an integer saying how many replicas of the pileup dataset are supposed to be created. It will eventually supersede "expectedRSEs". To be used whenever the micro-service is mature enough and automatic data placement decisions can be performed. Default: 1.
  • customName: name of the custom DID
  • transition: list of transition records like [{'updateTime': 123, 'containerFraction': 0.5, 'customDID': 'blah', 'DN': 'blah2'}, ...]

and the service-based parameters (set by the service itself, regardless of whether they are provided by the user) are:

  • insertTime: integer with the timestamp (seconds since epoch in GMT) of when this pileup object was first defined in the microservice.
  • lastUpdateTime: integer with the timestamp (seconds since epoch in GMT) of the last modification made to this pileup object.
  • currentRSEs: a list of strings with the RSE names, e.g. ["T2_US_Nebraska", "T2_CH_CERN"]; an empty list must be supported! This is validated against the known Disk RSEs in Rucio. Rules that have been satisfied will have their relevant RSE name listed here.
  • activatedOn: integer with the timestamp (seconds since epoch in GMT) of when this pileup object last became active. If active=True, then Default: integer timestamp, otherwise Default: None.
  • deactivatedOn: integer with the timestamp (seconds since epoch in GMT) of when this pileup object last became inactive. If active=False, then Default: integer timestamp, otherwise Default: None.
  • pileupSize: integer with the size of the full pileup dataset (in bytes), to be filled asynchronously. Default: 0
  • ruleIds: list of strings with the rule IDs that are locking this pileup.

Each pileup object is persisted in its own document in the backend database (MongoDB).
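
Putting the defaults above together, a pileup document could be assembled as in the sketch below; the makePileupDoc helper is hypothetical and only mirrors the documented defaults (it is not the actual MSPileupObj implementation):

```python
import time

def makePileupDoc(pileupName, pileupType, expectedRSEs, active):
    """Assemble a pileup document with the documented defaults.
    Illustrative helper only; validation (e.g. against Rucio RSEs) is omitted."""
    now = int(time.time())
    return {
        "pileupName": pileupName,
        "pileupType": pileupType,          # "premix" or "classic"
        "insertTime": now,
        "lastUpdateTime": now,
        "expectedRSEs": expectedRSEs,
        "currentRSEs": [],                 # filled by the monitoring task
        "fullReplicas": 1,                 # default
        "campaigns": [],                   # default
        "containerFraction": 1.0,          # default
        "replicationGrouping": "ALL",      # default
        "activatedOn": now if active else None,
        "deactivatedOn": None if active else now,
        "pileupSize": 0,                   # default, filled asynchronously
        "ruleIds": [],
        "customName": "",
        "transition": [],
        "active": active,
    }

doc = makePileupDoc("/abc/xyz/PREMIX", "premix", ["T2_CH_CERN"], True)
```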

Data Placement logic

The data placement logic is defined in MSPileupTasks.py module.

High level tasks and assumptions

This section describes some architecture design choices, assumptions for the service and important information that needs to be recorded in the logs.

Some assumptions to be considered for this microservice are:

  • rules will be created against a single RSE (for a single DID)
  • service is supposed to be a singleton (depending on the cost-benefit, supporting multiple instances would be a bonus)
  • the polling cycle will very likely not be more frequent than once per hour
  • no need to adopt database transactions (likely worst case would be to perform the same action from multiple instances, overwriting documents in MongoDB)

There are 3 main tasks to be executed by this service, all part of the same MSPileup microservice and running in the same service instance. These tasks should be executed sequentially, but their internal logic can apply concurrent processing:

  1. Monitoring task (listing status of rule ids and persisting it in the database)
  2. Inactive pileup task (clean up of rule ids that belong to inactive pileup objects)
  3. Active pileup task (rule creation or deletion for active pileup objects)

Log records should be created whenever appropriate, including a short summary by the end of each polling cycle. Nonetheless, here is a non-exhaustive list of critical logs to have:

  • start and end of each of the major activities (monitoring, active, inactive); and time spent.
  • rule creation (containing the DID, RSE and rule id)
  • rule deletion (containing the DID and rule id)
  • rule completion (either fully satisfied or partial)
    • if a rule is not completed (state != OK), then it would be useful to print its state as well

Monitoring task

This task is supposed to iterate over all the MongoDB documents, fetch the current state of each rule ID and persist this information back in MongoDB. A short algorithm for it can be described as follows:

  1. Read pileup document from MongoDB with filter active=true
  2. For each rule id in ruleIds:
  • query Rucio for that rule id and fetch its state (e.g.: afd122143kjmdskj)
  • if state=OK, log that the rule has been satisfied and add that RSE to the currentRSEs (unique)
  • otherwise, calculate the rule completion based on the 3 locks_* fields and remove that RSE from the currentRSEs (if existent)
  3. Now that all the known rules have been inspected, persist the up-to-date pileup doc in MongoDB
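
The steps above can be sketched as follows; getRuleInfo stands in for a Rucio client call, and its return format here is an assumption:

```python
def monitorPileup(doc, getRuleInfo):
    """Sketch of the monitoring step for one pileup document. getRuleInfo stands
    in for a Rucio lookup returning e.g. {"state": "OK", "rse_expression": "T2_CH_CERN"}."""
    current = set(doc["currentRSEs"])
    for ruleId in doc["ruleIds"]:
        rule = getRuleInfo(ruleId)
        if rule["state"] == "OK":
            current.add(rule["rse_expression"])   # rule satisfied: record its RSE (unique)
        else:
            # incomplete rule: its RSE must not be listed as current
            current.discard(rule["rse_expression"])
    doc["currentRSEs"] = sorted(current)
    return doc  # the caller persists the updated doc in MongoDB

# illustrative usage with a fake Rucio lookup
fakeRules = {"r1": {"state": "OK", "rse_expression": "T2_CH_CERN"},
             "r2": {"state": "REPLICATING", "rse_expression": "T2_US_Nebraska"}}
doc = {"ruleIds": ["r1", "r2"], "currentRSEs": ["T2_US_Nebraska"]}
doc = monitorPileup(doc, fakeRules.get)
```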

Inactive pileup task

This task is supposed to look at pileup documents that have been set to inactive. The main goal here is to ensure that there are no Rucio rules left in the system (of course, for the relevant DID and the Rucio account adopted by our microservice). Pileup documents that are updated as a result of this logic should have their data persisted back in MongoDB. A short algorithm for it can be described as follows:

  1. Read pileup document from MongoDB with filter active=false
  2. for each DID and Rucio account, get a list of all the existent rules
  • make a Rucio call to delete that rule id, then:
    • remove the rule id from ruleIds (if any) and remove the RSE name from currentRSEs (if any)
  3. make a log record if the DID + Rucio account tuple does not have any existent rules
  • in this case, it would be expected to have an empty currentRSEs list, given that RSEs have been removed in the step above.
  4. once all the relevant rules have been removed, persist an up-to-date version of the pileup data structure in MongoDB
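
A minimal sketch of this algorithm, where listRules, deleteRule and logRecord stand in for Rucio client calls and the logger (names and return formats are assumptions):

```python
def cleanInactivePileup(doc, listRules, deleteRule, logRecord):
    """Sketch of the inactive-pileup step for one document (active=False).
    The callables stand in for Rucio client calls scoped to the pileup DID
    and the Rucio account adopted by the microservice."""
    rules = listRules(doc["pileupName"])
    if not rules:
        logRecord("no rules left for %s" % doc["pileupName"])
    for rule in rules:
        deleteRule(rule["id"])
        # drop bookkeeping for the deleted rule
        if rule["id"] in doc["ruleIds"]:
            doc["ruleIds"].remove(rule["id"])
        if rule["rse_expression"] in doc["currentRSEs"]:
            doc["currentRSEs"].remove(rule["rse_expression"])
    return doc  # the caller persists the updated doc in MongoDB

# illustrative usage with fake Rucio calls
deleted = []
doc = {"pileupName": "/a/b/PREMIX", "ruleIds": ["r1"], "currentRSEs": ["T2_CH_CERN"]}
doc = cleanInactivePileup(doc,
                          lambda did: [{"id": "r1", "rse_expression": "T2_CH_CERN"}],
                          deleted.append,
                          print)
```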

Active pileup task

This task is supposed to look at pileup documents active in the system. Its main goal is to ensure that the pileup DID has all the requested rules (and nothing beyond them), according to the pileup object configuration. Pileup documents that are updated as a result of this logic should have their data persisted back in MongoDB. A short algorithm for this active task is described below:

  1. Read pileup document from MongoDB with filter active=true
  2. if expectedRSEs is different than currentRSEs, then further data placement is required (it's possible that data removal is required!)
  3. make a local copy of the currentRSEs value to track rules incomplete but ongoing
  4. for each rule matching the DID + Rucio account, perform:
  • if rule RSE is not in expectedRSEs, then this rule needs to be deleted.
    • upon successful rule deletion, also remove the RSE name from currentRSEs and ruleIds (if any)
    • make a log record
  • else, save this rule RSE in the local copy of currentRSEs. This rule is likely still being processed by Rucio.
  5. now that we have evaluated expected versus current, for each expectedRSE not in our local copy of currentRSEs:
  • first, check whether the RSE has enough space available for that (we can assume that any storage with less than 1 TB available cannot be considered for pileup data placement)
    • in case of no space available, make a log record
  • in case there is enough space, make a Rucio rule for that DID + Rucio account + RSE
    • now append the rule id to the ruleIds list
  6. once all the relevant rules have been created - or if there were any changes to the pileup object - persist an up-to-date version of the pileup data structure in MongoDB
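
The active-task steps can be sketched as below; the four callables stand in for Rucio/storage client calls, and their names and return formats are assumptions:

```python
MIN_FREE = 10**12  # assume 1 TB as the minimum free space for pileup placement

def placeActivePileup(doc, listRules, deleteRule, createRule, freeSpace):
    """Sketch of the active-pileup step for one document (active=True)."""
    ongoing = set(doc["currentRSEs"])  # local copy, tracks incomplete-but-ongoing rules
    for rule in listRules(doc["pileupName"]):
        rse = rule["rse_expression"]
        if rse not in doc["expectedRSEs"]:
            deleteRule(rule["id"])  # this rule is no longer wanted
            if rule["id"] in doc["ruleIds"]:
                doc["ruleIds"].remove(rule["id"])
            if rse in doc["currentRSEs"]:
                doc["currentRSEs"].remove(rse)
            ongoing.discard(rse)
        else:
            ongoing.add(rse)  # likely still being processed by Rucio
    for rse in doc["expectedRSEs"]:
        if rse in ongoing:
            continue  # rule already exists or data already placed
        if freeSpace(rse) < MIN_FREE:
            continue  # the real task would make a log record here
        doc["ruleIds"].append(createRule(doc["pileupName"], rse))
    return doc  # the caller persists the updated doc in MongoDB

# illustrative usage with fake Rucio/storage calls
doc = {"pileupName": "/a/b/PREMIX", "expectedRSEs": ["T2_CH_CERN"],
       "currentRSEs": ["T1_US_FNAL_Disk"], "ruleIds": ["r0"]}
doc = placeActivePileup(doc,
                        lambda did: [{"id": "r0", "rse_expression": "T1_US_FNAL_Disk"}],
                        lambda rid: None,
                        lambda did, rse: "r1",
                        lambda rse: 2 * MIN_FREE)
```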

Clean-up pileup task

This task looks up documents based on the following condition set:

  • If document is active=False; and
  • document has an empty ruleIds list; and
  • document has an empty currentRSEs; and
  • document has been deactivated for a while (deactivatedOn=XXX), to be configured in the service

Then, it deletes the documents that meet these requirements.
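
The condition set above can be sketched as a single predicate; the shouldDelete name and the gracePeriod parameter (the configurable deactivation threshold, in seconds) are illustrative:

```python
def shouldDelete(doc, gracePeriod, now):
    """Sketch of the clean-up condition for one pileup document; gracePeriod is
    the configurable threshold (seconds) since deactivation."""
    return (doc["active"] is False
            and not doc["ruleIds"]
            and not doc["currentRSEs"]
            and doc.get("deactivatedOn") is not None
            and now - doc["deactivatedOn"] > gracePeriod)

doc = {"active": False, "ruleIds": [], "currentRSEs": [], "deactivatedOn": 1000}
```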

CMS MONIT task

The MSPileupTaskManager executes the CMS MONIT task, which fetches new documents from the MSPileup backend database, flattens them out (see below) and sends them to the CMS MONIT ElasticSearch instance used for the MSPileup dashboard and aggregation.

Each document is flattened according to a simple rule: every attribute with a list data type, except ruleIds and transition, is converted into a singular attribute holding a single value; e.g. the campaigns attribute, which holds a list of campaigns, becomes campaign with a single value. Here is the data structure we use to send docs to ES:

{"pileupName": string,
  "pileupType": string,
  "insertTime": integer,
  "lastUpdateTime": integer,
  "expectedRSE": string,
  "currentRSE": string,
  "fullReplicas": integer,
  "campaign": string,
  "containerFraction": float,
  "replicationGrouping": string,
  "activatedOn": integer,
  "deactivatedOn": integer,
  "pileupSize": integer,
  "customName": string,
  "transition": list of dicts,
  "ruleIds": list of strings,
  "active": boolean
}
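
A sketch of the flattening rule above; whether the service emits one flattened document per combination of list values is an assumption here, and flattenDoc is not the actual WMCore code:

```python
from itertools import product

# list-typed attributes and their singular counterparts, per the structures above
SINGULAR = {"campaigns": "campaign", "expectedRSEs": "expectedRSE",
            "currentRSEs": "currentRSE"}

def flattenDoc(doc):
    """Flatten one pileup document into ES-style documents; ruleIds and
    transition keep their list type, per the ES structure above."""
    base = {k: v for k, v in doc.items() if k not in SINGULAR}
    fields = [(SINGULAR[k], doc.get(k) or [None]) for k in SINGULAR]
    flatDocs = []
    for combo in product(*(values for _, values in fields)):
        flat = dict(base)
        flat.update(zip((name for name, _ in fields), combo))
        flatDocs.append(flat)
    return flatDocs

flat = flattenDoc({"pileupName": "p", "campaigns": ["c1", "c2"],
                   "expectedRSEs": ["T2_CH_CERN"], "currentRSEs": [],
                   "ruleIds": ["r1"]})
```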

Creating pileup documents in a dev cluster

Whenever a new development kubernetes cluster is created, we need to populate the MSPileup database with the pileup configurations that we use in the DMWM and Integration workflow test templates.

To do that, you need to be in a node with the WMCore environment (e.g., an agent node). Next you need to download this file: https://github.com/dmwm/WMCore/blob/master/bin/adhoc-scripts/createPileupObjects.py, which will be used to parse a JSON file and inject pileup configuration documents into MSPileup. A pileup JSON file can be found under: https://github.com/dmwm/WMCore/blob/master/test/data/WMCore/MicroService/DataStructs/pileups_dev.json

With all these pieces in place, you can execute it like:

python3 createPileupObjects.py --url=https://cmsweb-testXXX.cern.ch --fin=pileups_dev.json --inject

Open questions

NOTE: this sub-section can be removed once we know these answers.

Q.1: what is the definition of currentRSEs?

a) RSEs that have a Rucio rule (unsatisfied rucio rule)

b) RSEs that have a complete copy of the data (a satisfied rucio rule) <----- Alan's preference!

Q.2.: what happens if a given rule id is deleted now and we try to delete it in an hour from now?

a) does the rule deletion fail the second time?

b) or does it set a new expiration time for 2h from now?

Modules already available

NOTE: this sub-section can be removed once we know these answers.

Rucio wrapper client: https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Services/Rucio/Rucio.py#L645 Rucio pycurl based wrapper: https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/MicroService/Tools/PycurlRucio.py

Miscellaneous

When the active task was initially discussed, we also had the following candidate logics to be considered.

CANDIDATE 1:

  1. Read pileup document from MongoDB with filter active=true
  2. if expectedRSEs is different than currentRSEs, then further data placement is required (it's possible that data removal is required!)
  3. for each expectedRSEs not in currentRSEs:
  • check if there is an ongoing Rucio rule for that DID + Rucio account + RSE (we might want to remove RSE from this call?)
  • if there is a Rucio rule, then ensure that it's listed under ruleIds, otherwise add it
  • if there is none, then a new rule needs to be created. First, check whether the RSE has enough space available for that
    • if it does, create the rule and append the rule id to ruleIds
    • otherwise, make a log record saying that there is not enough space
  4. once all the relevant rules have been created, persist an up-to-date version of the pileup data structure in MongoDB

CANDIDATE 2:

  1. Read pileup document from MongoDB with filter active=true
  2. for each DID and Rucio account, get a list of all the existent rules
  • if the rule RSE name is in expectedRSEs, then add the rule id in ruleIds (keeping uniqueness)
  • elif the rule RSE name is in currentRSEs, then (this rule should no longer exist):
    • make a Rucio call to delete this rule, remove the RSE from currentRSEs and remove the rule id from ruleIds (if any)
  • else - thus RSE not in expectedRSEs nor in currentRSEs - it means that the rule has been created by someone else, delete it!
    • make a Rucio call to delete this rule and delete the rule id from ruleIds (if any)
  3. for each RSE in expectedRSEs that does not have an ongoing rule and/or that is not listed under currentRSEs
  • create a new rule for the DID + Rucio account + RSE and add the rule id to the ruleIds list
  4. once rules have been removed or created, persist an up-to-date version of the pileup data structure in MongoDB

However, the algorithm described in the "Active pileup task" sub-section is the one that has been chosen and implemented.