Nsight Systems - CliMA/slurm-buildkite GitHub Wiki
NVIDIA Nsight Systems is a profiler and code analysis tool. The profiler is available on the cluster, and the viewer can be downloaded from the NVIDIA website (a free developer account is required).
It is installed on the cluster as the nsight-systems/nsight-systems/2023.2.1 module.
To profile a program:

```
nsys profile <program>
```

See the User Guide for options, and NVTX.jl for instrumenting Julia code.
`nsys` can profile MPI programs by specifying `--trace=mpi`. You also need to specify the MPI implementation: `--mpi-impl=openmpi` for Open MPI, or `--mpi-impl=mpich` for MPICH.
It can be combined with an MPI launcher (such as `srun` or `mpiexec`) in two ways.
If all the tasks are on a single node, you can run the profiler outside the launcher:

```
nsys profile <launcher> <program>
```

This generates a single file (with a `.nsys-rep` extension). For example:
```yaml
- label: "Nsight - single node profile"
  command:
    - module load nsight-systems/2022.2.1 # or specify in agents: module block
    - nsys profile --output=report-single --trace=mpi --mpi-impl=openmpi srun <program>
  artifact_paths:
    - "report-single.nsys-rep"
  agents:
    slurm_ntasks_per_node: 3
    slurm_nodes: 1
```

If the job uses multiple nodes, you need to run the profiler inside the launcher:
```
<launcher> nsys profile <program>
```

This generates one file per task, so each file needs a unique name. You can do this by passing `--output=` with `%q{VAR}`, which interpolates the environment variable `VAR`. To include the MPI rank:

- if using `srun`, use `%q{PMI_RANK}`
- if using Open MPI's `mpiexec`, use `%q{OMPI_COMM_WORLD_RANK}`
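If the same script has to run under either launcher, one option is to look up the rank in a wrapper instead of hard-coding a launcher-specific `%q{...}` variable. This is a minimal sketch, not an `nsys` feature: `mpi_rank` and `report_name` are hypothetical names, and the `SLURM_PROCID` fallback is an added assumption (`srun` sets that variable for every task).

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of nsys): derive a per-task report name
# from whichever rank variable the launcher has set.

# Print the MPI rank of the current task, or "unknown" if none is set.
mpi_rank() {
  if [ -n "${PMI_RANK:-}" ]; then
    echo "$PMI_RANK"               # the variable this page uses for srun
  elif [ -n "${OMPI_COMM_WORLD_RANK:-}" ]; then
    echo "$OMPI_COMM_WORLD_RANK"   # set by Open MPI's mpiexec
  elif [ -n "${SLURM_PROCID:-}" ]; then
    echo "$SLURM_PROCID"           # assumption: fallback for srun tasks
  else
    echo "unknown"
  fi
}

# Print "<prefix>-<rank>" (prefix defaults to "report"), for use with --output=.
report_name() {
  echo "${1:-report}-$(mpi_rank)"
}
```

A wrapper script started once per task by `srun` or `mpiexec` could then run `nsys profile --output="$(report_name report-multi)" --trace=mpi --mpi-impl=openmpi "$@"`. The `$(...)` must be evaluated inside each task; if the shell that calls the launcher expanded it, every task would get the same name.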
Note that the profiler itself incurs some overhead, which can cause intermittent pauses in the trace. This can be alleviated by allocating an extra CPU core per task for the profiler (i.e. `--cpus-per-task=2`) and binding each task to its cores:
- if using `srun`, use `--cpu-bind=cores`
- if using Open MPI's `mpiexec`, use `--map-by node:PE=2 --bind-to core`
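The launcher-specific choices listed above (rank variable and binding flags) can be kept in one place. This is a hypothetical sketch: `profile_cmd` is a made-up name, it only prints the command rather than running it, and it assumes Open MPI (`--mpi-impl=openmpi`) and two CPUs per task, as in this page's examples.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the multi-node profiling command for a launcher.
# Usage: profile_cmd <srun|mpiexec> <program> [args...]
profile_cmd() {
  local launcher="$1"
  shift
  local -a prefix
  local rank_var
  case "$launcher" in
    srun)
      prefix=(srun --cpu-bind=cores)
      rank_var=PMI_RANK
      ;;
    mpiexec)
      # PE=2 matches the two CPUs per task (program + profiler)
      prefix=(mpiexec --map-by node:PE=2 --bind-to core)
      rank_var=OMPI_COMM_WORLD_RANK
      ;;
    *)
      echo "unsupported launcher: $launcher" >&2
      return 1
      ;;
  esac
  # %q{...} stays literal here; nsys expands it separately in each task.
  local -a cmd=("${prefix[@]}" nsys profile "--output=report-multi-%q{${rank_var}}"
                --trace=mpi --mpi-impl=openmpi "$@")
  echo "${cmd[*]}"
}
```

Printing keeps the sketch easy to check; in a real step one would more likely build the same array and execute `"${cmd[@]}"` directly.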
For example, a two-node step using `srun`, where `slurm_cpus_per_task: 2` provides the extra core:
```yaml
- label: "Nsight - multi node profile"
  command:
    - srun --cpu-bind=cores nsys profile --output=report-multi-%q{PMI_RANK} --trace=mpi --mpi-impl=openmpi julia --project=.ci .ci/mpi.jl
  artifact_paths:
    - "report-multi-*.nsys-rep"
  agents:
    slurm_cpus_per_task: 2
    slurm_ntasks_per_node: 2
    slurm_nodes: 2
```

Known issues:

- Certain versions can hang when writing to a file on Slurm. See https://forums.developer.nvidia.com/t/nsys-hanging-on-slurm-cluster/239975
- OS runtime tracing (`osrt`, enabled by default when no `--trace` is specified) causes problems with Julia. Specify `--trace` manually without the `osrt` option. See https://forums.developer.nvidia.com/t/fail-to-launch-nsight-system-in-julia/255649/2?u=simonbyrne1
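One way to apply the `osrt` workaround is to start from the default trace list and drop `osrt`. This sketch assumes the default list is `cuda,opengl,nvtx,osrt` (check `nsys profile --help` for your version); `trace_without_osrt` is a hypothetical helper, not an `nsys` option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove "osrt" from a comma-separated nsys --trace list.
trace_without_osrt() {
  local IFS=','
  local -a kept=()
  local item
  for item in $1; do          # unquoted on purpose: split on commas via IFS
    [ "$item" = "osrt" ] || kept+=("$item")
  done
  echo "${kept[*]}"           # "${arr[*]}" joins with the first IFS char (',')
}
```

For example, `nsys profile --trace="$(trace_without_osrt cuda,opengl,nvtx,osrt),mpi" --mpi-impl=openmpi <program>` expands to `--trace=cuda,opengl,nvtx,mpi`; writing that list out by hand is equally valid.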