JobCreator

The JobCreator's function is to create jobs (shocking!). A summary of the process (sketched in code after this list):

  • Iterate over all the subscriptions with available files in the database, i.e., subscriptions with files in the wmbs_sub_files_available table.
  • For each subscription found, execute the appropriate job splitter as defined in the split_algo column of wmbs_subscription; the job splitter creates job objects in memory.
  • These job objects belong to a jobgroup, where each jobgroup has a different set of locations.
  • They are finally stored in WMBS and on disk (as pickled objects) in the "new" state, even if job creation fails; a failed job is marked with the job parameter failedOnCreation = True.
  • Input files are associated with jobs in WMBS, which is later used for file parentage.
  • JobGroups are created such that all jobs in a group share the same workflow, task, subscription and locations.
  • The jobs in the job group then move to the created or createfailed state; when the createfailed transition happens, a failed job report is created.
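A minimal, self-contained sketch of one JobCreator pass, assuming plain dicts stand in for WMBS rows and job objects (none of these names belong to the real WMCore API):

```python
# Illustrative sketch of the JobCreator flow described above.
def job_creator_pass(subscriptions, splitters):
    for sub in subscriptions:                    # subs with available files
        splitter = splitters[sub["split_algo"]]  # from wmbs_subscription
        for group in splitter(sub):              # one job group per location set
            for job in group["jobs"]:
                job["state"] = "new"             # stored even if creation failed
                if job.get("failedOnCreation"):
                    job["state"] = "createfailed"  # a failed job report is made
                else:
                    job["state"] = "created"
    return subscriptions
```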

JobSplitters

The heart and brain of the JobCreator are the splitting modules; these define how jobs are created, based on the splitting parameters from a spec and the files available for a subscription. Some of the modules defined in the code are not used at all; all of the ones documented here are actually used. Note that all the splitters separate jobs by location; these jobs are aggregated in jobgroups.

JobSplitters used to create jobs as soon as files were made available in WMBS (for further job creation). This was changed at the end of 2017 such that the production/processing splitting modules check whether (see the sketch after this list):

  • there is enough work (either in number of events or lumis) to meet the task splitting parameters,
  • or the fileset has already been closed, which means no further files will become available.
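A hypothetical sketch of that gate, assuming the splitting parameters arrive as a plain dict (the real modules read them from the task spec):

```python
def ready_to_split(total_events, total_lumis, params, fileset_closed):
    """Split only when the available work meets the task splitting
    parameters, or when the fileset is closed and no more files will come."""
    if fileset_closed:
        return True
    if "events_per_job" in params:
        return total_events >= params["events_per_job"]
    if "lumis_per_job" in params:
        return total_lumis >= params["lumis_per_job"]
    return True
```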

EventBased

The EventBased splitter takes the following arguments:

  • events_per_job: Number of events per job
  • events_per_lumi: Number of events per lumi section.
  • include_parents: Whether to configure two-file processing, with the parent files included in the jobs.

The EventBased splitter will examine all the input files available and create jobs with the exact number of events provided in events_per_job, except in the last job of a file, which may contain fewer events. The event splitter always stops at file boundaries, which means that no job will contain more than one input file. Additionally, production jobs (i.e., jobs with no input files) are configured with the given events per lumi.
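A minimal sketch of this carving logic, assuming files are plain dicts with an event count (event_based_split is an illustrative name, not the WMCore implementation):

```python
def event_based_split(files, events_per_job):
    """Sketch of EventBased carving: jobs never cross file boundaries;
    the last chunk of a file may be short."""
    jobs = []
    for f in files:                          # f: {"name": str, "events": int}
        first = 0
        while first < f["events"]:
            last = min(first + events_per_job, f["events"])
            jobs.append({"file": f["name"], "first_event": first,
                         "n_events": last - first})
            first = last
    return jobs

# Example: a 250-event file with events_per_job=100 yields jobs of 100, 100, 50.
print(event_based_split([{"name": "fileA", "events": 250}], 100))
```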

LumiBased

The LumiBased splitter takes the following arguments:

  • lumis_per_job: Number of lumis per job
  • halt_job_on_file_boundaries: Indicates whether jobs must stop at file boundaries, i.e., whether a job may process more than one file.
  • splitOnRun: Indicates whether jobs must be split on run boundaries, i.e., whether a job may process more than one run.

The LumiBased splitter creates jobs with the exact number of lumis configured in lumis_per_job; however, jobs may contain fewer lumis if the splitter is configured to stop at file or run boundaries.

The LumiBased splitter fully supports ACDC; in these cases it uses a lumi whitelist/blacklist provided by the ACDC documents.
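A minimal sketch of the grouping logic described above (ACDC masks omitted), assuming files are plain dicts carrying a run number and a list of lumis:

```python
def lumi_based_split(files, lumis_per_job,
                     halt_job_on_file_boundaries=True, split_on_run=True):
    """Sketch of LumiBased grouping (illustrative, not the WMCore code).
    Each file is {"run": int, "lumis": [int, ...]}."""
    jobs, current, prev = [], [], None      # prev = (file index, run)
    for i, f in enumerate(files):
        for lumi in f["lumis"]:
            new_file = prev is not None and prev[0] != i
            new_run = prev is not None and prev[1] != f["run"]
            if current and (len(current) == lumis_per_job
                            or (halt_job_on_file_boundaries and new_file)
                            or (split_on_run and new_run)):
                jobs.append(current)        # close the job early at a boundary
                current = []
            current.append((f["run"], lumi))
            prev = (i, f["run"])
    if current:
        jobs.append(current)
    return jobs
```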

EventAwareLumiBased

The EventAwareLumiBased is a lumi-based splitter configured in terms of number of events; it takes the following arguments:

  • events_per_job: Number of desired events per job.
  • max_events_per_lumi (deprecated as of early 2018): Maximum number of events that can be processed in a single-lumi file.
  • job_time_limit: Maximum amount of processing time we allow a job to have (defaults to 48h). If the number of events in a job times TimePerEvent exceeds this limit, the job is marked as createfailed and the file/run/lumi information is uploaded to the ACDCServer.
  • halt_job_on_file_boundaries: Indicates whether jobs must stop at file boundaries, i.e., whether a job may process more than one file.
  • splitOnRun: Indicates whether jobs must be split on run boundaries, i.e., whether a job may process more than one run.

The EventAwareLumiBased splitter creates jobs with at least one lumi, and it tries to adjust the number of lumis in a job dynamically for each file, depending on the configured target of events per job. Since the exact number of events in a lumi is not known, the splitter uses an average events-per-lumi number that changes for each file. Once the average number of events per lumi in a file is calculated, the splitter behaves like a LumiBased splitter with a number of lumis per job equal to the target number of events divided by the average number of events per lumi. When the splitter is configured to put more than one file in a job, the process gets a little more complex, since the number of lumis per job changes when crossing file boundaries; however, it is designed to keep the expected number of events in the job below the given target.

The max_events_per_lumi parameter allows the splitter to discard files containing a single lumi with too many events; for these files a single job is created and marked as createfailed by the JobCreator.
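A minimal sketch of the per-file calculation and the time-limit check, with illustrative names and plain numbers standing in for the real spec parameters:

```python
import math

def lumis_per_job_for_file(file_events, file_lumis, events_per_job):
    """Estimate the per-file target: average events per lumi, then behave
    like LumiBased with this many lumis per job (hypothetical helper)."""
    avg_events_per_lumi = file_events / max(file_lumis, 1)
    return max(1, math.floor(events_per_job / avg_events_per_lumi))

def exceeds_time_limit(job_events, time_per_event, job_time_limit=48 * 3600):
    """Jobs whose estimated runtime exceeds job_time_limit go createfailed."""
    return job_events * time_per_event > job_time_limit

# Example: 10000 events over 50 lumis (200 events/lumi) with a target of
# 1000 events per job gives 5 lumis per job.
print(lumis_per_job_for_file(10000, 50, 1000))
```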

FileBased

The FileBased splitter is a simple splitter which creates jobs that process the number of files configured in files_per_job.
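A minimal sketch, assuming files are simple identifiers:

```python
def file_based_split(files, files_per_job):
    """Fixed-size chunks of files (illustrative, not the WMCore code)."""
    return [files[i:i + files_per_job]
            for i in range(0, len(files), files_per_job)]

# Example: 5 files with files_per_job=2 -> jobs of 2, 2 and 1 files.
print(file_based_split(["f1", "f2", "f3", "f4", "f5"], 2))
```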

ParentlessMergeBySize

The ParentlessMergeBySize splitter is one of the so-called merge splitters, which are designed to operate on unmerged files in order to aggregate them into bigger files intended for tape storage. The ParentlessMergeBySize splitter takes the following arguments:

  • max_merge_size: Maximum size for a merged file.
  • min_merge_size: Minimum size that will trigger a merge job.
  • max_merge_events: Maximum number of events in a merged file.
  • merge_across_runs: Indicates if a merged file should contain more than one run.
  • max_wait_time: Maximum time to wait before triggering a merge job.

The main difference between this splitter and the previous ones is that jobs are not created as soon as files are available (in the event splitter, for example, a single file triggers jobs as soon as it is made available). The merge splitter waits for certain conditions to be met before triggering merge jobs, in order to keep the output files of adequate size. The conditions to trigger a merge job are:

  • The available files are enough to meet the min_merge_size threshold, or they exceed the max_merge_size or max_merge_events thresholds.
  • One of the available files has been available for longer than max_wait_time; in this case a job is created with all the currently available files.
  • The input fileset for the subscription is closed; this triggers creation of all the remaining jobs, since it guarantees that no more files will become available for the subscription.

The splitter creates jobs while maintaining the size below the maximum thresholds, and it aims to keep them above min_merge_size except when the second or third condition is met.
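A minimal sketch of the three trigger conditions, assuming each available file is a plain dict with its size, event count and the time it became available:

```python
def should_merge(available_files, min_merge_size, max_merge_size,
                 max_merge_events, max_wait_time, fileset_closed, now):
    """Sketch of the merge trigger (illustrative, not the WMCore code).
    Each file is {"size": int, "events": int, "available_since": float}."""
    if fileset_closed:                       # condition 3: no more files coming
        return True
    if any(now - f["available_since"] > max_wait_time
           for f in available_files):        # condition 2: a file waited too long
        return True
    total_size = sum(f["size"] for f in available_files)
    total_events = sum(f["events"] for f in available_files)
    return (total_size >= min_merge_size     # condition 1: thresholds reached
            or total_size >= max_merge_size
            or total_events >= max_merge_events)
```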

WMBSMergeBySize

The WMBSMergeBySize splitter is similar to the ParentlessMergeBySize splitter; however, it is meant to reconstruct the parent files that were used to produce the unmerged files when preparing the merge jobs. It is used to merge the results of the EventBased splitter, where the lumi boundaries are not respected in the jobs and therefore must be reconstructed afterwards.

The merge jobs in this splitter are triggered only when all the jobs that processed a parent file in the EventBased splitter are complete and successful; if any of the jobs fail, then all the other unmerged outputs are discarded. This splitter has no control over the maximum size of merge jobs, since it must reconstruct the parent files; however, it aims to keep the merged files above the minimum size threshold.
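A hypothetical sketch of the grouping constraint, assuming the statuses of the processing jobs for each parent are known (the real splitter derives this from WMBS):

```python
def mergeable_groups(unmerged_files, jobs_per_parent):
    """Group unmerged files by parent; a parent is mergeable only when all
    of its processing jobs succeeded (illustrative, not the WMCore code)."""
    by_parent = {}
    for f in unmerged_files:                 # f: {"parent": str, "name": str}
        by_parent.setdefault(f["parent"], []).append(f)
    return [(parent, files) for parent, files in by_parent.items()
            if all(job["status"] == "success"
                   for job in jobs_per_parent[parent])]
```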

SiblingProcessingBased

This splitter creates jobs based on the status of sibling subscriptions, where a sibling subscription is:

"A subscription that has the same fileset as input"

This is used for cleanup subscriptions, which must wait for other subscriptions working on the same input fileset (e.g., merge subscriptions) before creating cleanup jobs. The size of the jobs is configured by the number of files (files_per_job), and a job is created only when enough files have already been processed by the sibling subscriptions, unless the input fileset is closed, which automatically triggers the jobs as soon as the sibling subscriptions are done with it.
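A minimal sketch of the trigger, assuming file_status maps each file to the number of sibling subscriptions still working on it:

```python
def cleanup_ready(file_status, files_per_job, fileset_closed):
    """Sketch of the SiblingProcessingBased trigger (hypothetical helper)."""
    done = [name for name, pending in file_status.items() if pending == 0]
    if fileset_closed:
        return done                      # siblings done: clean what is left
    if len(done) >= files_per_job:
        return done[:files_per_job]      # enough processed files for a job
    return []                            # keep waiting
```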

MinFileBased

The MinFileBased is the simplest splitter of all; it creates jobs only when the number of available files is greater than or equal to files_per_job. If the threshold number of files is not available, no job is created until the input fileset is closed.
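A minimal sketch under the same assumptions as the FileBased example above:

```python
def min_file_based_split(files, files_per_job, fileset_closed):
    """Carve full chunks of files_per_job files; hold any remainder back
    until the input fileset is closed (illustrative, not the WMCore code)."""
    jobs = [files[i:i + files_per_job]
            for i in range(0, len(files) - files_per_job + 1, files_per_job)]
    leftover = files[len(jobs) * files_per_job:]
    if fileset_closed and leftover:
        jobs.append(leftover)            # fileset closed: flush the rest
    return jobs
```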

Job creation

After the splitters run, the jobs are held in memory and the JobCreator registers them in WMBS. Finally, pickled job objects are created on disk under the JobCreator directory; these contain basic information about the job that will be needed later by the JobSubmitter during submission.
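A minimal sketch of this persistence step; the directory layout and field names are illustrative, not WMCore's actual ones:

```python
import os
import pickle

def pickle_job(job, job_creator_dir):
    """Write a pickled summary of the job for the JobSubmitter to read later
    (hypothetical layout; the real JobCreator has its own cache structure)."""
    path = os.path.join(job_creator_dir, "job_%d" % job["id"], "job.pkl")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        pickle.dump(job, fh)
    return path
```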