BossAir ResourceControl - dmwm/WMCore GitHub Wiki

BossAir

The BossAir package in WMCore implements an interface for job handling across different batch systems. It supports submitting, monitoring, and removing jobs, and it uses two tables to keep track of submitted jobs. The queries for these tables, including the creation statements for MySQL and Oracle, are defined in the BossAir package. The tables are:

  • bl_status: This table holds the possible states of jobs in the configured batch systems. For example, if HTCondor is used (i.e. CondorPlugin), the available states are:

    • Complete
    • Unknown
    • Running
    • Held
    • Idle
    • Timeout
    • Error
    • New
    • Removed
  • bl_runjob: Run jobs are BossAir's representation of WMBS jobs. This table holds the information on submitted jobs, both past and present. The information stored for a run job is:

    • id: Auto-incrementing row id in the table; it carries no meaning beyond identifying the row.
    • wmbs_id: Id of the corresponding job in wmbs_job.
    • grid_id: Meant to store the id of the job in the batch system; currently used only for LSF.
    • bulk_id: Id of the job within a set of jobs; currently unused.
    • status: Whether the job is currently submitted: '1' means active and '0' means complete.
    • sched_status: Status of the job in the batch system; it references the bl_status table.
    • retry_count: Retry number for this instance of the job. The same WMBS id can appear more than once in bl_runjob, each row with a different retry_count representing a different instance of the job.
    • status_time: Time of the last status change in the batch system.
    • location: Site to which the job was submitted.
    • user_id: Id of the owner of the workflow that the job belongs to, according to the wmbs_users table.
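
The relationship between the two tables can be illustrated with a small, self-contained sketch. It is not the WMCore schema: the real creation statements target MySQL and Oracle and live in the BossAir package. Here the column names of bl_runjob come from the list above, while the column types, the `name` column of bl_status, the use of an in-memory SQLite database, the job id 42, and the site name `T2_XX_Example` are all assumptions made for illustration.

```python
import sqlite3

# Column names follow the list above. Every column type, the bl_status.name
# column, and the choice of SQLite are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bl_status (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE              -- hypothetical column name
);
CREATE TABLE bl_runjob (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    wmbs_id      INTEGER NOT NULL,
    grid_id      TEXT,
    bulk_id      TEXT,
    status       TEXT NOT NULL DEFAULT '1', -- '1' active, '0' complete
    sched_status INTEGER NOT NULL REFERENCES bl_status(id),
    retry_count  INTEGER NOT NULL,
    status_time  INTEGER,
    location     TEXT,
    user_id      INTEGER
);
""")

# The states listed above for the HTCondor case.
STATES = ["Complete", "Unknown", "Running", "Held", "Idle",
          "Timeout", "Error", "New", "Removed"]
conn.executemany("INSERT INTO bl_status (name) VALUES (?)",
                 [(s,) for s in STATES])


def state_id(name):
    """Look up the bl_status id for a state name."""
    row = conn.execute("SELECT id FROM bl_status WHERE name = ?",
                       (name,)).fetchone()
    return row[0]


# Two instances of the same WMBS job (hypothetical id 42): the first
# attempt is finished, the retry is still active in the batch system.
insert = ("INSERT INTO bl_runjob "
          "(wmbs_id, status, sched_status, retry_count, location) "
          "VALUES (?, ?, ?, ?, ?)")
conn.execute(insert, (42, "0", state_id("Removed"), 0, "T2_XX_Example"))
conn.execute(insert, (42, "1", state_id("Running"), 1, "T2_XX_Example"))

# Active run jobs, with sched_status resolved to its name in bl_status.
active = conn.execute("""
    SELECT r.wmbs_id, r.retry_count, s.name
    FROM bl_runjob r
    JOIN bl_status s ON s.id = r.sched_status
    WHERE r.status = '1'
    ORDER BY r.wmbs_id, r.retry_count
""").fetchall()

# All instances recorded for the same WMBS job.
attempts = conn.execute(
    "SELECT retry_count FROM bl_runjob WHERE wmbs_id = ? ORDER BY retry_count",
    (42,)).fetchall()

print(active)    # [(42, 1, 'Running')]
print(attempts)  # [(0,), (1,)]
```

The second query shows why wmbs_id alone does not identify a row: the pair (wmbs_id, retry_count) distinguishes the instances of a job.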

ResourceControl

The resource control package implements an interface for manipulating resource thresholds used by the WMAgent for job submission and resource distribution among sites. It implements a table in WMBS with the following information:

  • rc_threshold: The resource control table holds one row for every pair of a site in wmbs_location and a subscription type in wmbs_sub_types. For each pair, the pending_slots column gives the target number of pending jobs to keep at the site for that subscription/job type, and the max_slots column gives the absolute maximum number of running jobs to keep at the site for that type.
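
As a rough illustration of the "one row per site and subscription type" layout, the sketch below builds stand-in tables in an in-memory SQLite database and reads the thresholds for one pair. Only the table names and the pending_slots and max_slots columns come from the description above. The key columns (`site_id`, `sub_type_id`), the columns of the stand-in wmbs_location and wmbs_sub_types tables, the site name, the subscription type names, and all numbers are assumptions. The helper `jobs_to_submit` is a hypothetical, simplified reading of the two thresholds and not the WMAgent's actual submission logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Minimal stand-ins for the WMBS tables; their columns are assumptions.
CREATE TABLE wmbs_location  (id INTEGER PRIMARY KEY, site_name TEXT UNIQUE);
CREATE TABLE wmbs_sub_types (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE rc_threshold (
    site_id       INTEGER NOT NULL REFERENCES wmbs_location(id),
    sub_type_id   INTEGER NOT NULL REFERENCES wmbs_sub_types(id),
    pending_slots INTEGER NOT NULL,
    max_slots     INTEGER NOT NULL,
    PRIMARY KEY (site_id, sub_type_id)  -- one row per (site, type) pair
);
""")

# Hypothetical site, subscription types, and threshold values.
conn.execute("INSERT INTO wmbs_location VALUES (1, 'T2_XX_Example')")
conn.executemany("INSERT INTO wmbs_sub_types VALUES (?, ?)",
                 [(1, "Processing"), (2, "Merge")])
conn.executemany("INSERT INTO rc_threshold VALUES (?, ?, ?, ?)",
                 [(1, 1, 100, 500), (1, 2, 20, 50)])


def thresholds(site, sub_type):
    """Return (pending_slots, max_slots) for a site and subscription type."""
    return conn.execute("""
        SELECT t.pending_slots, t.max_slots
        FROM rc_threshold t
        JOIN wmbs_location  l ON l.id = t.site_id
        JOIN wmbs_sub_types s ON s.id = t.sub_type_id
        WHERE l.site_name = ? AND s.name = ?
    """, (site, sub_type)).fetchone()


def jobs_to_submit(pending_slots, max_slots, pending, running):
    """Simplified assumption: submit nothing once running jobs reach
    max_slots, otherwise top the pending queue up to pending_slots."""
    if running >= max_slots:
        return 0
    return max(0, pending_slots - pending)


row = thresholds("T2_XX_Example", "Processing")
room = jobs_to_submit(*row, pending=60, running=200)
print(row, room)  # (100, 500) 40
```

The composite primary key is what enforces a single threshold row per site and subscription type in this sketch; inserting a second row for the same pair would raise an integrity error.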