Slurm: Programs with many threads

Quick summary: please try to limit the number of threads your program uses to roughly the number of cores you request through Slurm. If you are using code you didn't write yourself, you can usually do this by setting environment variables (see below).

Slurm, CPUs and threads

Slurm jobs on the cluster only have access to the number of CPUs that they ask for. For example, if you request --cpus-per-task=4, your job will only be able to run 4 threads at once.

Usually it is optimal to create only as many threads as you have CPUs to work on them. In some cases it might make sense to slightly oversubscribe the CPUs. However, creating many more threads than the number of CPUs Slurm has allocated to you (e.g. a factor of 10 more) will make your job run substantially slower: all the threads end up fighting for time on the same set of CPUs, and switching between them wastes a lot of time.

If you write your own multithreaded code, it should be straightforward to limit the number of threads it uses. Remember that Slurm jobs have access to environment variables like SLURM_CPUS_ON_NODE, so you don't need to hard-code resource limits into your program.
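For example (a minimal sketch; my_program and its --nthreads option are hypothetical placeholders for your own code), a job script can pass the Slurm allocation straight through to the program:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

# Use the Slurm allocation rather than a hard-coded thread count.
# "my_program" and "--nthreads" are placeholders for your own code.
srun ./my_program --nthreads "$SLURM_CPUS_ON_NODE"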

If you are using codes or libraries written by others, use whatever options they provide to limit their thread count. Often this is done by setting special environment variables (see the section below).

Sometimes multithreaded codes try to guess how many CPUs they have access to. These guesses can be wrong, especially on many-core machines using resource managers like Slurm. It is always better to tell your program exactly how many threads it can use, rather than letting it guess. Again, you can use the information Slurm provides about the resource allocation within jobs to do this.

Until recently, on our machine, it was possible for jobs to create new threads to run on CPUs that they did not reserve through Slurm. This is no longer the case.

Check how many threads are being used by a running job

You can ssh to the node where your job is running and use top, htop or ps to check how many threads it is using. It makes sense to do this, especially if you are using code written by other people, and even more so if your code involves Python libraries.
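For example (a sketch; 12345 stands in for your job ID, and the node name will come from the squeue output):

# Find the node(s) job 12345 is running on
squeue -j 12345 -o "%N"
# Then, after ssh-ing to that node:
ps -u $USER -o pid,nlwp,comm   # NLWP is the number of threads in each process
top -H -u $USER                # per-thread view; press q to quit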

Control thread creation with environment variables

The following list should cover most common multithreading APIs. In most cases OMP_NUM_THREADS should be enough, but some python libraries (particularly those for machine learning) may require almost all of these to be set. This example assumes you have a single-instance job that you want to use all the CPUs you have allocated.

export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE
export MKL_NUM_THREADS=$SLURM_CPUS_ON_NODE
export NUMEXPR_MAX_THREADS=$SLURM_CPUS_ON_NODE
export OPENBLAS_NUM_THREADS=$SLURM_CPUS_ON_NODE
export VECLIB_MAXIMUM_THREADS=$SLURM_CPUS_ON_NODE

As a shortcut, you can module load slurm_limit_threads. This module just sets the environment variables as above.

To be clear: if you have asked for --cpus-per-task=4 and started one task with srun, then $SLURM_CPUS_ON_NODE will be equal to 4, not the actual number of CPUs on the node (72 or 80).
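Putting this together, a single-task job script might look something like the sketch below (my_analysis.py is a placeholder for your own program):

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

# Limit the common threading libraries to the Slurm allocation;
# equivalently, set the export lines above by hand.
module load slurm_limit_threads

srun python my_analysis.py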

Hyperthreads and --cpus-per-task

Currently, our Slurm system is set up such that it will not allocate two hyperthreads on one core to different jobs. This means that, in practice, it only makes sense to ask for even numbers of --cpus-per-task. If you ask for an odd number, Slurm will round up your request: for example, --cpus-per-task=1 will give you 2 CPUs, --cpus-per-task=3 will give you 4 CPUs, and so on.
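If you want to confirm how many CPUs Slurm actually allocated to your job, one way (a sketch; 12345 is an example job ID) is to ask scontrol:

# NumCPUs in the scontrol output is the number of CPUs allocated to the job
scontrol show job 12345 | grep -i NumCPUs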

Slurm enforces CPU affinity

Currently our Slurm is configured to bind jobs to CPUs, and to do so in a deterministic way. For some special cases this might not be optimal. There are (probably) ways to work around this if necessary, so please let the administrators know if you think this might affect the performance of your jobs.
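If you want to see which CPUs your job has actually been bound to, one way (just a sketch) is to run taskset from inside the job:

# Print the list of CPUs the current process is allowed to run on
taskset -cp $$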