WMBS core - dmwm/WMCore GitHub Wiki

The queries for the following tables are mostly defined in the WMBS package. Each table and the objects it stores are explained below; the creation statements for all tables, for both MySQL and Oracle, are available in the repository. The tables are listed in dependency order, so a table appears only after all the tables it depends on have been defined. At the end of the section there is a grouping by objects, which may be more natural for the reader.

  • wmbs_fileset: This table contains the information of a fileset. A fileset is a named set of files which can be the input or output of a particular task. A fileset can be open or closed; a closed fileset is guaranteed to be complete and will not have new files associated with it in the future. The concrete conditions for closing a fileset are defined in this query; they are:

    • The workflow that produced the fileset is marked as injected
    • All the jobs whose output is in the fileset are in cleanout state
    • All the files in the input fileset for the task that produced this fileset are in the list of complete or failed files, and the input fileset is closed.
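
The three fileset-closing conditions above can be read as a single predicate. The following is a minimal sketch, not the actual WMBS query: the `InputFileset` class, its fields and the function name are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InputFileset:
    """Hypothetical view of the input fileset of the task that produced the output."""
    is_open: bool
    file_ids: List[int] = field(default_factory=list)
    complete_or_failed_ids: List[int] = field(default_factory=list)


def can_close_output_fileset(workflow_injected, job_states, input_fileset):
    """Return True when all three closing conditions hold (illustrative only)."""
    # 1. The workflow that produced the fileset is marked as injected.
    if not workflow_injected:
        return False
    # 2. Every job whose output goes to the fileset is in cleanout state.
    if any(state != "cleanout" for state in job_states):
        return False
    # 3. The input fileset is closed and all its files are complete or failed.
    done = set(input_fileset.complete_or_failed_ids)
    return (not input_fileset.is_open) and all(
        file_id in done for file_id in input_fileset.file_ids
    )
```
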
  • wmbs_file_details: This table contains the basic information about a file, this includes:

    • LFN: Logical File Name
    • Size: Size in bytes
    • Events: Number of events
    • First event: Number of the first event in the file
    • Merged: An indicator if this is a merged (i.e. to be stored permanently) file or not.
  • wmbs_fileset_files: This table holds the associations between files and filesets. The association is many to many, i.e. a fileset can have many files and a file can be in many filesets. It also holds a timestamp of when the association was made.

  • wmbs_file_parent: This table contains another piece of information about files, it stores the parentage relationships between them. A file is said to be the parent of another file in WMBS if the parent file was one of the inputs for the job that produced the child file. The parentage relation is transitive.
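
Because parentage is transitive, all ancestors of a file can be found by following direct parent links repeatedly. The sketch below assumes the table can be read as a list of direct `(child, parent)` pairs; that layout and the `ancestors` function are assumptions for illustration, not the WMBS API.

```python
from collections import defaultdict


def ancestors(parent_pairs, file_id):
    """Return every direct and indirect parent of file_id.

    parent_pairs is an iterable of (child, parent) tuples, an assumed
    in-memory form of the rows in wmbs_file_parent.
    """
    parents_of = defaultdict(set)
    for child, parent in parent_pairs:
        parents_of[child].add(parent)

    seen = set()
    stack = [file_id]
    while stack:
        current = stack.pop()
        for parent in parents_of[current]:
            if parent not in seen:  # also guards against accidental cycles
                seen.add(parent)
                stack.append(parent)
    return seen


# File 3 was produced from file 2, which was produced from file 1.
pairs = [(3, 2), (2, 1), (4, 2)]
grandparents_included = ancestors(pairs, 3)
```

Here `grandparents_included` contains both file 2 (the direct parent) and file 1 (reached through file 2).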

  • wmbs_file_runlumi_map: This table contains the list of run and lumis present in the files of wmbs_file_details. Each row contains a file id, run number and lumisection.

  • wmbs_checksum_type: This table contains auxiliary information about the possible checksum types used in other WMBS tables, currently it stores 3 types: checksum, adler32 and md5.

  • wmbs_file_checksums: This table contains the checksums for the files in wmbs_file_details. It can store checksums of different types (defined in the previous table) for each file.

  • wmbs_location_state: This table contains the possible states for a site in WMBS, these are:

    • Normal: A normal functioning site, it can be considered as a valid location for job submission.
    • Draining: A draining site, it can be considered for job submission only for jobs that can't run anywhere else. WorkQueue elements will not be acquired for this site.
    • Down: A site with errors or in downtime, no new jobs will be submitted for this site in the current state. The jobs waiting to be submitted that can only run at this site will wait until the state changes. WorkQueue elements will not be acquired for this site.
    • Aborted: A broken site, no new jobs will be submitted for this site in Aborted state, additionally any job pending in the batch system or jobs waiting for submission that can only run at the site will be killed and failed without retries. WorkQueue elements will not be acquired for this site.
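
The effect of each site state on submission can be summarised in a few small functions. This is only a sketch of the rules as described above; the enum and function names are hypothetical and do not come from WMCore.

```python
from enum import Enum


class SiteState(Enum):
    NORMAL = "Normal"
    DRAINING = "Draining"
    DOWN = "Down"
    ABORTED = "Aborted"


def can_submit(state, job_can_run_elsewhere):
    """Whether a new job may be submitted to a site in the given state."""
    if state is SiteState.NORMAL:
        return True
    if state is SiteState.DRAINING:
        # Draining sites only take jobs that have no other place to run.
        return not job_can_run_elsewhere
    # Down and Aborted sites receive no new jobs.
    return False


def acquires_workqueue_elements(state):
    """WorkQueue elements are not acquired for Draining, Down or Aborted sites."""
    return state is SiteState.NORMAL


def kills_waiting_jobs(state):
    """Only Aborted kills jobs that are pending or can only run at the site."""
    return state is SiteState.ABORTED
```
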
  • wmbs_location: This table holds the basic information about sites in WMBS, it contains the following fields:

    • site_name: Identifier for the site in WMBS
    • cms_name: CMS name for the site
    • ce_name: Name of the computing element for the site
    • running_slots: Number of jobs that can be concurrently running at the site.
    • pending_slots: Desired number of jobs to keep in pending state for the site.
    • plugin: Batch system for submission to this site.
    • state: State of the site as defined in the previous table.

Developers Note: This table has problems: the site_name, cms_name and ce_name are always the same. The ce_name is not used at all, and several places in the WMAgent would have a problem if the cms_name differed from the site_name. This should be re-evaluated and organized.

  • wmbs_location_senames: This table holds extra information about each site in the database, each site can have many SEs which are registered in this table.

  • wmbs_file_location: This table contains the information about the location of the files. Each row contains an association between a file and a site. A site can have many files and a file can be at many sites.

  • wmbs_users: This table holds the basic information about users in WMBS, it contains the following fields:

    • cert_dn: Registered DN for the user.
    • name_hn: Username in Hypernews for the user.
    • owner: Same as name_hn.
    • grp: Group which the user is a member of.
    • group_name: VOMS group for the user.
    • role_name: VOMS role for the user.

Developers Note: This table is a mess; it has caused problems in the past and is outdated at the moment. First, the VOMS fields have never been used, and there is no practical use for this information in production workflows, since all real user-group handling should be only in the ReqMgr. Also, name_hn and owner are the same but only owner is used. Removing the table should be a good idea, but it requires changes all across the WMAgent.

  • wmbs_workflow: This table contains the basic information about workflows. Workflows are defined as a collection of tasks and describe all the steps in a request; this table holds a row for each task in a request. The fields stored in this table are:
    • name: Name of the workflow, which is the same as the corresponding request.
    • spec: Path to the spec file which contains the workload for the request.
    • task: Task represented in this entry.
    • type: Type reported to dashboard for this workflow.
    • owner: User that created this workflow, it points to a record in the users table.
    • injected: Indicates if the workflow is fully injected. A workflow is fully injected when all the WorkQueue elements for the request have been injected into WMBS (in any of the agents).
    • alt_fs_close: Internal indicator that replaces injected for fileset closing purposes, only used in the WMAgent Tier-0, see their specific documentation for an explanation of this.
    • priority: Priority of the workflow, it is equal to the priority of the request in ReqMgr.

See the ReqMgr section for the definition of request, workflow, task, workload, etc...

  • wmbs_workflow_output: This table associates tasks (i.e. rows from wmbs_workflow) with their output filesets, each task can have a merged and unmerged output fileset.

  • wmbs_sub_types: This auxiliary table contains the different subscription types in WMBS, these are:

    • Processing
    • Production
    • Merge
    • Cleanup
    • LogCollect
    • Harvesting
    • Skim

    Each subscription type has a numerical priority value which serves as a modifier of the job priority, such that jobs of a certain subscription type will be at a different level compared to other types regardless of request priority. E.g. if Merge has priority 3 and Processing has priority 2, then all Merge jobs will have higher job priority than Processing jobs, even when the Merge jobs belong to low-priority requests.
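
One way to picture the Merge versus Processing example is a sort key where the subscription type priority dominates the request priority. This is an illustration of the ordering only, under the assumption that a tuple comparison captures it; it is not how the WMAgent computes the actual job priority number, and the job names and values are invented.

```python
def job_sort_key(sub_type_priority, request_priority):
    """Subscription type priority is compared first, request priority second."""
    return (sub_type_priority, request_priority)


# Priorities taken from the example in the text; other types are omitted.
type_priority = {"Merge": 3, "Processing": 2}

jobs = [
    {"name": "proc-high", "type": "Processing", "request_priority": 90000},
    {"name": "merge-low", "type": "Merge", "request_priority": 10},
]

# Highest priority first.
ordered = sorted(
    jobs,
    key=lambda job: job_sort_key(type_priority[job["type"]], job["request_priority"]),
    reverse=True,
)
```

With these inputs the Merge job from the low-priority request is ordered ahead of the Processing job from the high-priority request.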

  • wmbs_subscription: This table holds information about subscriptions, a subscription is basically a pairing of a workflow with an input fileset, it defines which work should be performed on which input. Additionally, a subscription entry indicates the type of subscription, the splitting algorithm for the jobs and whether it is finished or not. A subscription is considered finished when the following conditions are true:

    • The input fileset is closed and the workflow is injected.
    • There are no files in the input fileset in acquired or available state for the subscription. This is the same as saying that all files are completed or failed for the subscription.
    • All jobs related to the subscription are in cleanout state.

    Usually there is more than one subscription for each workflow and fileset, e.g. a DataProcessing task has a subscription for every input block (which is also a fileset).
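
The conditions for a finished subscription can be sketched as one boolean check. The function and its parameters are hypothetical; in WMBS this information comes from the tables described in this section.

```python
def subscription_finished(fileset_open, workflow_injected,
                          n_available, n_acquired, job_states):
    """Return True when the subscription meets all the finishing conditions."""
    # The input fileset is closed and the workflow is injected.
    inputs_frozen = (not fileset_open) and workflow_injected
    # No files are left in available or acquired state for the subscription.
    no_files_left = n_available == 0 and n_acquired == 0
    # All jobs related to the subscription are in cleanout state.
    jobs_archived = all(state == "cleanout" for state in job_states)
    return inputs_frozen and no_files_left and jobs_archived
```
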

  • wmbs_subscription_valid: This auxiliary table is used to keep track of the potential locations for a subscription, just for the top level task subscriptions. It holds an association of subscriptions with sites and whether that association is valid (i.e. the subscription can run at the site) or not.

  • wmbs_sub_files_available: A file is considered available in a subscription when there is no job that belongs to the given subscription which has it as input, available files are examined by the job splitters to create new jobs. This table contains the association between available files and their subscriptions.

  • wmbs_sub_files_acquired: A file is considered acquired in a subscription when there is at least one job in the subscription that is using it as input. No more jobs will be created in the subscription for an acquired file. This table contains the association between acquired files and their subscriptions.

  • wmbs_sub_files_failed: A file is considered failed when the job(s) that used it as input in a given subscription completed without successful outcome. This table contains the association between failed files and their subscriptions.

  • wmbs_sub_files_complete: A file is considered complete when the job(s) that used it as input in a given subscription completed with successful outcome. This table contains the association between complete files and their subscriptions.
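
The four wmbs_sub_files_* tables can be thought of as four disjoint sets per subscription, with a file moving from one set to the next as jobs are created and finish. The class below is a simplified in-memory sketch with hypothetical names, assuming one job per file for brevity.

```python
class SubscriptionFiles:
    """Tracks which state each input file is in for a single subscription."""

    def __init__(self, file_ids):
        self.available = set(file_ids)  # not yet used as input by any job
        self.acquired = set()           # used as input by at least one job
        self.complete = set()           # job(s) finished successfully
        self.failed = set()             # job(s) finished without success

    def acquire(self, file_id):
        """A job splitter created a job that takes this file as input."""
        self.available.remove(file_id)
        self.acquired.add(file_id)

    def finish(self, file_id, success):
        """The job(s) using the file ended; record the outcome."""
        self.acquired.remove(file_id)
        (self.complete if success else self.failed).add(file_id)


files = SubscriptionFiles([1, 2, 3])
files.acquire(1)
files.acquire(2)
files.finish(1, success=True)
files.finish(2, success=False)
```

After these calls file 3 is still available, file 1 is complete and file 2 is failed.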

  • wmbs_jobgroup: A jobgroup is a set of jobs which share some common characteristics, all jobs in a jobgroup belong to the same subscription and their output goes to the same fileset. This table contains the information about jobgroups.

  • wmbs_job_state: This auxiliary table contains all the possible job states in the WMAgent, these are:

    • new: This state is not reachable; it exists only as a start state when recording the first transitions to real states.
    • created: A job is created after the JobCreator finishes creating it, created jobs are ready to be submitted.
    • executing: A job is executing when it is submitted to the batch system, note that this doesn't mean that the job is actually running, just that it was submitted by the WMAgent.
    • complete: A complete job is one that is no longer present in the batch system after being submitted.
    • success: A complete job transitions to success state when the WMAgent determines that it ran successfully after inspecting the pickled job report.
    • createfailed: A createfailed job is a job which was deemed unrunnable by the WMAgent, nevertheless it was created for monitoring and accounting purposes. Currently the only cause of a createfailed job is a lumi with too many events as defined in the EventAwareLumiBased job splitter.
    • submitfailed: A job that fails to be submitted to the batch system is marked as submitfailed. There are multiple reasons for this failure, but it usually points to problems in the site whitelist or odd glitches in the batch system.
    • jobfailed: A job is marked as failed when the failure occurs after the job was in executing state, this failure could have happened while the job was pending in the batch system or during actual execution in the GRID. The failure can also be determined after a complete job's report is analyzed by the WMAgent.
    • createcooloff: This state is defined in the code, but it is currently unreachable.
    • submitcooloff: A job in submitcooloff is a job that was in submitfailed state and is being processed for retries by the WMAgent.
    • jobcooloff: A job in jobcooloff is a job that was in jobfailed state and is being processed for retries by the WMAgent.
    • retrydone: A job is marked as retrydone when the WMAgent determines that it has been retried enough times without success and it should be failed permanently.
    • exhausted: A job is in exhausted state immediately after retrydone state, it is just an internal transition state before cleanout.
    • killed: A job is marked as killed when a workflow is aborted and the WorkQueue massively fails all the jobs in the system for it. Killed is equivalent to the retrydone state.
    • createpaused: This state is defined in the code, but it is currently unreachable.
    • submitpaused: A job is marked as submitpaused after being retried a certain number of times with submit failures. This happens only when the PauseAlgo retry algorithm is used.
    • jobpaused: A job is marked as jobpaused after being retried a certain number of times with job failures. This happens only when the PauseAlgo retry algorithm is used.
    • cleanout: This is the final state of all jobs, it means that the job has been fully processed by the WMAgent and its information is archived.
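
Some of the state descriptions above name the state a job comes from. Those transitions can be collected in a small map, as in the sketch below. The map is deliberately partial: it contains only the transitions stated or directly implied in the list above, it is not the full state machine used by the WMAgent, and the names `DESCRIBED_TRANSITIONS` and `is_described_transition` are hypothetical.

```python
# Partial map: source state -> states it can move to, per the descriptions above.
DESCRIBED_TRANSITIONS = {
    "new": {"created"},
    "created": {"executing", "submitfailed"},
    "executing": {"complete", "jobfailed"},
    "complete": {"success", "jobfailed"},
    "submitfailed": {"submitcooloff"},
    "jobfailed": {"jobcooloff"},
    "retrydone": {"exhausted"},
    "exhausted": {"cleanout"},
}


def is_described_transition(current, target):
    """True if the text above describes a move from current to target."""
    return target in DESCRIBED_TRANSITIONS.get(current, set())
```
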
  • wmbs_job_assoc: A table that holds the association between jobs and files, for each job it records the files it processes as input.

  • wmbs_job_mask: A table that holds the processing mask for the jobs that have one, the processing mask indicates the event, lumi and run ranges that a job should process from the input files described in the corresponding wmbs_job_assoc entries.
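
As a rough picture of what a processing mask does, the sketch below restricts a job to a run range and a lumi range within its input files. The `JobMask` class, its field names and the choice of inclusive bounds are assumptions for illustration and do not reflect the actual columns of wmbs_job_mask; event ranges are left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class JobMask:
    """Hypothetical mask; a range of None means 'no restriction'."""
    run_range: Optional[Tuple[int, int]] = None
    lumi_range: Optional[Tuple[int, int]] = None

    def accepts(self, run, lumi):
        """True if the (run, lumi) pair falls inside the mask (inclusive bounds)."""
        def inside(value, bounds):
            return bounds is None or bounds[0] <= value <= bounds[1]

        return inside(run, self.run_range) and inside(lumi, self.lumi_range)


# Run and lumi pairs present in the job's input files (invented values).
file_runlumis = [(1, 5), (1, 12), (2, 7)]
mask = JobMask(run_range=(1, 1), lumi_range=(10, 20))
to_process = [pair for pair in file_runlumis if mask.accepts(*pair)]
```

Only the pair in run 1 with a lumisection between 10 and 20 is kept in `to_process`.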