Nsight Systems - CliMA/slurm-buildkite GitHub Wiki

NVIDIA Nsight Systems is a profiler and code analysis tool. The profiler is available on the cluster, and the viewer can be downloaded from the NVIDIA website (a free developer account is required).

Basic usage

Nsight Systems is installed on the cluster as the nsight-systems/nsight-systems/2023.2.1 module. Once the module is loaded, profile a program with:

nsys profile <program>

See the User guide for the available options, and NVTX.jl for instrumenting Julia code.

Profiling MPI code

nsys can trace MPI calls if you pass --trace=mpi. You also need to specify the MPI implementation: --mpi-impl=openmpi for Open MPI, or --mpi-impl=mpich for MPICH.

The profiler can be combined with the MPI launcher in two ways:

Profiler outside launcher

If all the tasks are on a single node, then you can run the profiler outside the launcher:

nsys profile <launcher> <program>

This will generate a single file (with a .nsys-rep extension). For example:

  - label: "Nsight - single node profile"
    command:
      - module load nsight-systems/2022.2.1 # or specify in agents: module block
      - nsys profile --output=report-single --trace=mpi --mpi-impl=openmpi srun <program>
    artifact_paths:
      - "report-single.nsys-rep"
    agents:
      slurm_ntasks_per_node: 3
      slurm_nodes: 1

Profiler inside launcher

If the job uses multiple nodes, then you need to use the profiler inside the launcher:

<launcher> nsys profile <program>

This will generate one file per task, so each file needs a unique name. You can do this by using %q{VAR} in the --output= argument, which interpolates the environment variable VAR. To get the MPI rank:

  • if using srun, use %q{PMI_RANK}
  • if using Open MPI's mpiexec, use %q{OMPI_COMM_WORLD_RANK}
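If the same script has to work under either launcher, the rank can also be resolved in the shell instead of through %q. The following is a minimal sketch of that idea, not something nsys provides: report_name is a hypothetical helper, and the fallback to rank 0 when neither variable is set is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of nsys): build a per-rank report name in the
# shell, using whichever rank variable the launcher exported.
report_name() {
    prefix="$1"
    if [ -n "${PMI_RANK:-}" ]; then
        rank="$PMI_RANK"                # set when launched with srun
    elif [ -n "${OMPI_COMM_WORLD_RANK:-}" ]; then
        rank="$OMPI_COMM_WORLD_RANK"    # set by Open MPI's mpiexec
    else
        rank=0                          # assumption: no launcher, single process
    fi
    printf '%s-%s\n' "$prefix" "$rank"
}

# Print the name this process would use.
report_name report-multi
```

The helper has to run inside each task (for example in a wrapper script started by the launcher, passing the result to --output=), not in the shell that submits the job, because the rank variables only exist once the launcher has started the task.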

Note that the profiler itself incurs some overhead, which can result in intermittent pauses in the trace. This can be alleviated by allocating an extra CPU core for each task on which to run the profiler (i.e. --cpus-per-task=2), and then binding tasks to those cores:

  • if using srun, use --cpu-bind=cores
  • if using Open MPI's mpiexec, use --map-by node:PE=2 --bind-to core
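To see how the naming and binding options fit together for each launcher, the sketch below assembles the full command line and prints it without running anything. build_profile_cmd is a hypothetical helper written for this page, and the report-multi prefix is only an example; the flags themselves are the ones listed above.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the profiling command for a launcher.
# build_profile_cmd is a hypothetical helper, not part of nsys or Slurm.
build_profile_cmd() {
    launcher="$1"   # "srun" or "mpiexec"
    shift           # remaining arguments: the program and its arguments
    case "$launcher" in
        srun)
            bind="--cpu-bind=cores"
            rank='%q{PMI_RANK}'
            ;;
        mpiexec)
            bind="--map-by node:PE=2 --bind-to core"
            rank='%q{OMPI_COMM_WORLD_RANK}'
            ;;
        *)
            echo "unsupported launcher: $launcher" >&2
            return 1
            ;;
    esac
    # The %q{...} text is passed through literally; nsys expands it per task.
    printf '%s\n' "$launcher $bind nsys profile --output=report-multi-$rank --trace=mpi --mpi-impl=openmpi $*"
}

build_profile_cmd srun julia --project=.ci .ci/mpi.jl
build_profile_cmd mpiexec julia --project=.ci .ci/mpi.jl
```

Both variants assume that two CPU cores per task have already been requested from the scheduler, as described above.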

For example:

  - label: "Nsight - multi node profile"
    command:
      - srun --cpu-bind=cores nsys profile --output=report-multi-%q{PMI_RANK} --trace=mpi --mpi-impl=openmpi julia --project=.ci .ci/mpi.jl
    artifact_paths:
      - "report-multi-*.nsys-rep"
    agents:
      slurm_cpus_per_task: 2
      slurm_ntasks_per_node: 2
      slurm_nodes: 2

Known issues
