CPU Binding - aws/aws-ofi-nccl GitHub Wiki
When running workloads using NCCL and this plugin, we recommend disabling CPU binding, so that every process on the node has a CPU mask that includes all CPUs.
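One way to check whether a process actually sees every CPU is to compare the CPUs available to it with the CPUs installed on the node. The sketch below is not part of the plugin. It assumes a Linux node with GNU coreutils `nproc`, and the function name `cpu_mask_status` is made up for illustration. Note that `nproc` also honors `OMP_NUM_THREADS`, so the result can be misleading if that variable is set.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the current process may run on all CPUs.
cpu_mask_status() {
    local allowed installed
    allowed=$(nproc)          # CPUs available to this process (respects the affinity mask)
    installed=$(nproc --all)  # CPUs installed on the node
    if [ "$allowed" -eq "$installed" ]; then
        echo "unbound: ${allowed}/${installed} CPUs available"
    else
        echo "bound: only ${allowed}/${installed} CPUs available"
    fi
}

cpu_mask_status
```

Running this under the launcher (for example, as the command given to `srun` or `mpirun`) should show what mask each rank inherits.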
When running with Open MPI, this is achieved by passing the `--bind-to none` argument to `mpirun`. In Slurm, this is achieved using a combination of two things:
- Request enough processors per task for your job, using either `-c $((TOTAL_PROCS/PROCS_PER_NODE))` or the `--exclusive` flag, which requests all processors on all nodes of the job.
- Disable CPU binding, using the `--cpu-bind=none` option to `srun`, or the `--bind-to none` option to `mpirun`.
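The two Slurm steps can be combined into a single launch line. The following is a minimal dry-run sketch that prints the commands instead of executing them. It assumes `TOTAL_PROCS` means the number of CPUs on each node and `PROCS_PER_NODE` the number of ranks launched per node. The values 96 and 8 and the binary name `./my_nccl_app` are placeholders, not recommendations.

```shell
#!/usr/bin/env bash
# Assumed, illustrative values; substitute the ones for your cluster.
TOTAL_PROCS=96      # assumption: CPUs available on each node
PROCS_PER_NODE=8    # assumption: ranks launched per node
APP=./my_nccl_app   # hypothetical application binary

# Integer division gives each task an equal share of the node's CPUs.
CPUS_PER_TASK=$((TOTAL_PROCS / PROCS_PER_NODE))

# Build the launch commands and print them (dry run) rather than running them.
SRUN_CMD="srun --ntasks-per-node=${PROCS_PER_NODE} -c ${CPUS_PER_TASK} --cpu-bind=none ${APP}"
MPIRUN_CMD="mpirun --bind-to none ${APP}"

echo "$SRUN_CMD"
echo "$MPIRUN_CMD"
```

With these placeholder values each task requests 12 CPUs. If `TOTAL_PROCS` is not an exact multiple of `PROCS_PER_NODE`, the integer division rounds down, leaving a few CPUs unrequested.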