Scheduling Jobs

Our HPC environment supports two types of computing jobs: batch and interactive. Which type to use depends on your needs. For example, if you are performing exploratory data analysis, you will want an interactive job. On the other hand, if you have a long-running task that, once started, will run without user input and eventually output a result, you should use a batch job.

Ideally, you should aim to convert as many of your tasks as possible to batch jobs. Batch jobs help maximize cluster resource utilization by allowing the job scheduler to start your task as soon as resources are available. In other words, you don't have to be around to start or stop the job; the scheduler does this automatically, with the goal of getting every job done as soon as it can.

Terms

Slurm: This is the job scheduler or workload manager our cluster uses.

Account: An account represents a group or user, and the cluster capacity they have purchased.

Partition: A logical group of cluster nodes, typically a class of node or a specific hardware configuration. This is sometimes called a queue.

JOBID: Every job is assigned a unique ID number. This ID can be used to check job status, review resource usage, or manage the job.

Node: A single computer or server within the cluster.

Scheduling a Job

> [!IMPORTANT]
> When scheduling a compute job, you must specify the amount of CPU, memory (RAM), and runtime your job will require. If your job exceeds those limits, the scheduler will terminate it without warning.

In addition to knowing the resource requirements of your job, you'll also need to decide on the partition and account to run it under. Choose the partition that best reflects the needs of the job. For example, if the job doesn't require a GPU, select a partition that doesn't include GPU nodes.
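
If you're not sure which partitions exist or which accounts you belong to, Slurm can tell you. The commands below are standard Slurm tools; the partition and account names they print are specific to our cluster:

```bash
# List the partitions available on the cluster, their time limits, and node states
sinfo

# List the account associations for your user (requires Slurm accounting)
sacctmgr show associations user=$USER format=account,partition
```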

Common Arguments

Slurm's interactive and batch tools share common arguments for specifying job parameters.

| Argument | Definition |
| --- | --- |
| -A, --account | Account. This is the lab, person, or research group that is paying for the computing resources. |
| -p, --partition | Partition (sometimes called a queue). What type of computing hardware do you need? Lots of memory, GPU, etc.? |
| -N, --nodes | Nodes. The number of nodes you want the job spread across. For most interactive work, this is 1. |
| -c, --cpus-per-task | Cores. The number of CPU cores your job needs. For a serial task, this is typically a single core. For parallel tasks, this could be as many as your job can efficiently use. |
| --mem | Memory (RAM). Specify the amount you need in (M)egabytes or (G)igabytes. |
| -t, --time | Time. The maximum runtime for the job. Time can be specified as hours:minutes:seconds, days-hours, or minutes. |
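
These arguments are accepted by both the interactive (salloc) and batch (sbatch) tools. For batch jobs they can also be embedded in the job script as #SBATCH directives, as shown in the Batch section below. The two forms here request the same resources; the script name myjob.slurm is just a placeholder:

```bash
# On the command line:
sbatch -A sph -p 12c128g -N 1 -c 1 --mem 1G -t 1:00:00 myjob.slurm

# Or as directives at the top of myjob.slurm:
#SBATCH --account=sph
#SBATCH --partition=12c128g
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --time=1:00:00
```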

Interactive

As the name implies, interactive jobs are jobs where you will be sitting at the computer working with the application. Typical cases are interactive data analysis or data exploration, running software that has a graphical interface, and software or code development. When selecting resources for an interactive job, pay careful attention to the requested runtime. Avoid setting the value too high: if you forget to fully exit your interactive session, your job may tie up resources that other users could use.

Interactive jobs are started using the "salloc" command. For example, let's run a job under the "sph" account, on the "12c128g" partition, with 1 node, 1 CPU core, 1 GB of RAM, and a maximum runtime of 1 hour.

salloc -A sph -p 12c128g -N 1 -c 1 --mem 1G -t 1:00:00

After running the command, you should see something like:

salloc: Granted job allocation 212
salloc: Waiting for resource configuration
salloc: Nodes sph-n01 are ready for job
jtyocum@sph-n01:~$
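
When you're done working, end the session instead of letting it run until the time limit. Exiting the shell releases the allocation; alternatively, the job ID printed by salloc (212 in the example above) can be cancelled from another terminal:

```bash
# From inside the interactive session, exit the shell to release the allocation
exit

# Or, from a login node, cancel the allocation by job ID
scancel 212
```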

Batch

> [!NOTE]
> For long-running (several days or weeks) batch jobs, your job should be designed to save checkpoints that it can resume from. That way, in the event the job is terminated due to exceeding resource limits, a node crash, etc., you don't lose all of your work.
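
How checkpoints are saved depends entirely on your software; many applications have built-in restart or checkpoint files. As a minimal sketch of the idea, assuming a hypothetical task made of numbered steps, a job can record its progress after each step and resume from the last completed step when resubmitted:

```bash
#!/bin/bash
# Checkpoint/resume sketch for a hypothetical task of 1000 numbered steps.
# checkpoint.txt records the last completed step.

start=1
if [[ -f checkpoint.txt ]]; then
    start=$(( $(cat checkpoint.txt) + 1 ))
fi

for (( step = start; step <= 1000; step++ )); do
    ./process_step "$step"         # hypothetical command that does one unit of work
    echo "$step" > checkpoint.txt  # save progress so a resubmitted job can resume here
done
```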

Batch jobs are run without user interaction. Once the job is submitted to the scheduler, the system will run it as soon as the necessary resources become available. When the job completes, you'll receive a notification that it has finished.
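
Notification behavior is controlled by Slurm's mail options, and whether email is actually delivered depends on how the cluster is configured, so treat this as a sketch. Adding directives like these to your job script asks Slurm to email you when the job ends or fails (the address is a placeholder):

```bash
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=your-netid@uw.edu
```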

With Slurm, batch jobs are run using the "sbatch" command. The batch job is defined using a special shell script that specifies the resources required, performs any environment setup, and then runs your task (which is likely in the form of another script).

When running a batch job, the job definition script should be in the same location as your job's other files. This will make paths easier. For example, let's run a job that gets the hostname of a single compute node (the node that runs the job). We'd like to run this job under the "sph" account, on the "12c128g" partition; the task only needs a single node and minimal memory.

  1. Within our home directory, create a folder for the job, and change into it.
mkdir nodehostname
cd nodehostname
  2. Using nano or some other text editor, create the job definition, singlenodehostname.slurm. The script will contain the following:
#!/bin/bash
#SBATCH --job-name=singlenodehostname
#SBATCH --account=sph
#SBATCH --partition=12c128g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --time=00:00:30

# Write the name of the node that ran the job to hostname.out
hostname >> hostname.out
  3. Submit the job to the scheduler.
sbatch singlenodehostname.slurm

After a few moments, the terminal should output something like:

Submitted batch job 213
  4. Once the job has completed, you should see the output within the job's directory.
jtyocum@sph-login0:~/nodehostname$ ls -1
hostname.out
singlenodehostname.slurm
slurm-213.out
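
hostname.out was written by the job script itself, while slurm-213.out is where Slurm captures the job's standard output and error by default (the file is named slurm-<jobid>.out). You can inspect both with cat:

```bash
cat hostname.out   # the node name recorded by the job script
cat slurm-213.out  # anything the job printed to stdout/stderr
```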

Managing Jobs

Once a job has been submitted, you can monitor its status, cancel it before it starts, or terminate it after execution begins. For example, if you have a job that fails to exit when completed, it will continue to tie up resources until the maximum runtime is reached. However, you can terminate it manually to free up those resources.

Via CLI (SSH)

For these examples, we'll assume you are logged into the cluster via SSH or OnDemand Shell Access.

Job Status (squeue)

To view the status of jobs, use the squeue command. To filter the view down, you can use the --me argument to list only your jobs. Here's an example:

squeue --me

This will produce an output that looks something like:

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               214   12c128g sys/dash  jtyocum  R       0:13      1 sph-n01

The "ST" column is the job status. There are many status codes; here are the most common:

| Status | Definition |
| --- | --- |
| R | Running. The job is currently running on a node (or nodes). |
| CD | Completed. The job has finished. |
| CA | Canceled. Either the user or an administrator terminated the job. |
| F | Failed. The job returned a failure code. |
| OOM | Out of Memory. The job hit its memory limit. |
| TO | Timed Out. The job hit its maximum runtime. |
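
Jobs that have finished no longer appear in squeue. Assuming job accounting is enabled on the cluster, you can review a completed job's final state and resource usage with sacct and the job ID, for example:

```bash
# Final state, elapsed time, and peak memory use for job 213
sacct -j 213 --format=JobID,JobName,State,Elapsed,MaxRSS
```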

Canceling a Job (scancel)

Jobs can be canceled using the scancel command. The command can be used to terminate a running job or to cancel one that hasn't begun execution. For example, let's terminate the job with ID 208:

scancel --me 208
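
scancel also accepts filters, which is useful for clearing several jobs at once. For example:

```bash
# Cancel all of your jobs
scancel --me

# Cancel all of your jobs on a specific partition
scancel --me --partition=12c128g

# Cancel your jobs by job name
scancel --me --name=singlenodehostname
```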

Via OnDemand (Web Interface)

Through OnDemand, you can quickly view the status of any jobs that are pending execution or currently in progress. Here's how:

  1. Log in to Open OnDemand.
  2. Go to the Jobs menu and select "Active Jobs".
  3. If you have many jobs, you may need to use the "Filter" field to shorten the list.
  4. The status of each job is listed under the "Status" column. If you wish to terminate a job, click the red trash can icon.