Memory - CliMA/slurm-buildkite GitHub Wiki

Specifying memory requirements

Slurm provides three ways of specifying memory requirements for a job:

  • --mem: memory required per node
  • --mem-per-cpu: memory required per allocated CPU
  • --mem-per-gpu: memory required per allocated GPU

where the value is specified as <number><suffix>, and the suffix is one of K (kilobytes), M (megabytes, the default), or G (gigabytes).

These options can be specified in the agents: block of a job (or globally for a pipeline):

      - label: "held suarez (ρθ)"
        command: "julia --project=examples examples/driver.jl"
        agents:
          slurm_mem_per_cpu: 8G

One problem with using --mem is that if multiple tasks are requested with --ntasks, the tasks may be scheduled across any combination of nodes, and all tasks placed on a given node must share that node's memory allocation. In this case, it may be more reliable to specify --mem-per-cpu (by default, Slurm allocates 1 CPU per task: see --cpus-per-task to change this).
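For example, a multi-task job can request memory per CPU rather than per node. This is a minimal sketch, assuming that slurm_ntasks maps to --ntasks in the same way that slurm_mem_per_cpu maps to --mem-per-cpu:

  - label: "MPI job with per-CPU memory"
    command: "srun julia --project=examples examples/driver.jl"
    agents:
      slurm_ntasks: 8        # assumed to map to --ntasks=8
      slurm_mem_per_cpu: 4G  # assumed to map to --mem-per-cpu=4G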

Note that --exclusive by itself does not give you access to all the memory on the node: you also need to specify --mem=0.
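For example, to claim a whole node and all of its memory in a plain sbatch script (a sketch using standard Slurm directives, not specific to this repository):

  #!/bin/bash
  #SBATCH --exclusive   # reserve the whole node
  #SBATCH --mem=0       # request all of the node's available memory
  srun julia --project=examples examples/driver.jl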

Investigating memory usage

Plotting memory usage over time

The build_history script plots memory usage and elapsed time by job step over time for a given branch, plus the current build. It generates a build_history.html file containing the plots. Note that each step's key needs to be specified in the pipeline so that steps can be linked across the selected builds.

Basic usage is to add the following to the end of your pipeline:

  - wait

  - label: ":chart_with_downwards_trend: build history"
    command:
      - build_history staging # name of branch to plot
    artifact_paths:
      - "build_history.html"

Basic Slurm commands

sstat gives a snapshot of the memory usage while the job is running: this is the memory currently in use, not the total/maximum memory over the whole job/step.
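For example, to take a snapshot of all running steps of a job (a sketch; MaxRSS and AveRSS are standard format fields):

  sstat --allsteps -j <jobid> --format=JobID,MaxRSS,AveRSS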

sacct gives a summary by querying the Slurm database; however, it only reports values for steps which have completed: e.g. if you call it from within an sbatch script, you can get the usage information for any previous srun steps, but not for the batch step itself.
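For example, from inside an sbatch script you can query the steps of the current job that have already finished (a sketch using standard sacct format fields):

  sacct -j $SLURM_JOB_ID --format=JobID,JobName,MaxRSS,Elapsed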

seff computes some basic memory efficiency statistics based on sacct; however, it appears to use some questionable heuristics to get the results for MPI jobs.
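Typical usage is to pass the job id once the job has completed:

  seff <jobid>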

Understanding job steps

sstat and sacct report numbers for each job step:

  • the batch step accounts for programs run directly in the sbatch script
  • the extern step accounts for processes run in other ways (e.g. if you ssh to a node directly)
  • if an MPI job is launched using srun, then all operations are accounted for under a step named after the program being launched (e.g. julia)
  • if an MPI job is launched using Open MPI's launcher mpiexec, the tasks on the first node run under the batch step, and the remainder run under the orted step.

See https://stackoverflow.com/a/63470885 for more details.
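To see how a particular job was divided into steps and how much memory each step used, you can list the per-step accounting once the job has completed (a sketch; JobID%20 just widens the column):

  sacct -j <jobid> --format=JobID%20,JobName,NTasks,MaxRSS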

Profiling using HDF5

See https://slurm.schedmd.com/hdf5_profile_user_guide.html

  1. launch srun or sbatch with --profile=Task, and optionally --acctg-freq=n where n is the sampling interval in seconds (default = 30).
  2. After the job or step has completed, call sh5util -j <jobid> to create the HDF5 file.
  • if you just profile a single step (i.e. attach --profile=Task to srun instead of sbatch), then you can call sh5util in the same job, e.g.
  - label: "memory profile"
    command:
      - srun --profile=task --acctg-freq=10 julia -e 'X = ones(UInt8, 1024^3); sleep(60); println(sum(X))' # allocate 1GB
      - sh5util -j $$SLURM_JOB_ID # create HDF5 file
      - sh5util -j $$SLURM_JOB_ID -E --series=Tasks -l Node:TimeSeries # extract data to CSV
    artifact_paths:
      - "job_*.h5"
      - "extract_*.csv"
    agents:
      slurm_ntasks_per_node: 3
      slurm_mem_per_cpu: 4G