# Memory
Slurm provides three ways of specifying memory requirements for a job:

- `--mem`: memory per node (use `--mem=0` to get all memory on the node)
- `--mem-per-cpu`: memory per allocated CPU
- `--mem-per-gpu`: memory per allocated GPU

where the value is specified as `<number><suffix>`, and the suffix is one of `K` (kilobytes), `M` (megabytes, the default), or `G` (gigabytes).
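For example, as raw Slurm flags (the values here are purely illustrative):

```sh
sbatch --mem=16G job.sh         # 16 gigabytes per node
sbatch --mem-per-cpu=4G job.sh  # 4 gigabytes per allocated CPU
sbatch --mem-per-gpu=8G job.sh  # 8 gigabytes per allocated GPU
```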
These options can be specified in the `agents:` block of a job (or globally for a pipeline):
- label: "held suarez (ρθ)"
command: "julia --project=examples examples/driver.jl"
agents:
slurm_mem_per_cpu: 8GOne problem that may occur with using --mem is that if multiple tasks are specified by --ntasks, then the tasks may be scheduled on any combination of nodes; with all tasks on a given node must share the same amount of memory. In this case, it may be more reliable to specify --mem-per-cpu (by default, Slurm will allocate 1 cpu per task: see --cpus-per-task to change this).
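For instance, a multi-task step could request memory per CPU rather than per node. This is only a sketch: it assumes `slurm_ntasks` follows the same `slurm_<option>` key convention as the other agent keys on this page, and the label and values are illustrative.

```yaml
- label: "mpi example"
  command: "srun julia --project=examples examples/driver.jl"
  agents:
    slurm_ntasks: 4        # 4 tasks, which may be placed on any combination of nodes
    slurm_mem_per_cpu: 8G  # memory follows each allocated CPU, not the node
```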
Note that `--exclusive` by itself does not give you access to all the memory: you also need to specify `--mem=0`.
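In terms of raw Slurm flags, that combination is (the script name is a placeholder):

```sh
sbatch --exclusive --mem=0 job.sh  # whole node, plus all of its memory
```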
## Build history

The `build_history` script plots memory usage and elapsed time by job step, over time, for a given branch plus the current build. It generates a `build_history.html` file which includes the plots. Note that each step needs a `key` specified so that the script can link steps across the selected builds.
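For example, a step can be given an explicit Buildkite `key` (the key name here is illustrative):

```yaml
- label: "held suarez (ρθ)"
  key: "held_suarez_rhotheta"
  command: "julia --project=examples examples/driver.jl"
  agents:
    slurm_mem_per_cpu: 8G
```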
Basic usage is to add the following to the end of your pipeline:
```yaml
- wait

- label: ":chart_with_downwards_trend: build history"
  command:
    - build_history staging # name of branch to plot
  artifact_paths:
    - "build_history.html"
```

## Measuring memory usage

- `sstat` gives a snapshot of the memory usage while the job is running: this is the memory currently in use, not the total/maximum memory over the whole job/step.
- `sacct` gives a summary by querying the Slurm database; however, it only reports values for steps which have completed. For example, if you call it from within an `sbatch` script, you can get the usage information for any previous `srun` steps, but not for the batch step itself.
- `seff` computes some basic memory efficiency statistics based on `sacct`; however, it appears to use some questionable heuristics to get the results for MPI jobs.
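For example (job ids are placeholders; `MaxRSS`/`AveRSS` are standard Slurm accounting fields):

```sh
sstat -j <jobid> --format=JobID,MaxRSS,AveRSS           # currently-used memory of a running step
sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed  # per-step summary from the Slurm database
seff <jobid>                                            # memory/CPU efficiency summary
```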
`sstat` and `sacct` report numbers for each job step:

- `batch` is for programs run directly in the `sbatch` script
- `extern` is for programs run in other ways (e.g. if you `ssh` to a node directly)
- if an MPI job is launched using `srun`, then all operations are accounted for under a step named after the program which is launched (e.g. `julia`)
- if an MPI job is launched using Open MPI's launcher `mpiexec`, the tasks on the first node are run under the `batch` step, and the remainder are run under the `orted` step.
See https://stackoverflow.com/a/63470885 for more details.
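As a sketch of what to look for in practice (the job id is a placeholder):

```sh
# For an srun-launched MPI job, expect a step named after the program (e.g. "julia");
# for mpiexec, expect the first node's tasks under "batch" and the rest under "orted".
sacct -j <jobid> --format=JobID,JobName,NTasks,MaxRSS
```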
## Profiling

Slurm can record a time series of memory usage (and other statistics) per task via its HDF5 profiling plugin: see https://slurm.schedmd.com/hdf5_profile_user_guide.html. The basic workflow is:
- Launch `srun` or `sbatch` with `--profile=Task`, and optionally `--acctg-freq=n`, where `n` is the sampling interval in seconds (default = 30).
- After the job or step has completed, call `sh5util -j <jobid>` to create the HDF5 file.
- If you profile just a single step (i.e. attach `--profile=Task` to `srun` instead of `sbatch`), then you can call `sh5util` in the same job, e.g.:
- label: "memory profile"
command:
- srun --profile=task --acctg-freq=10 julia -e 'X = ones(UInt8, 1024^3); sleep(60); println(sum(X))' # allocate 1GB
- sh5util -j $$SLURM_JOB_ID # create HDF5 file
- sh5util -j $$SLURM_JOB_ID -E --series=Tasks -l Node:TimeSeries # extract data to CSV
artifact_paths:
- "job_*.h5"
- "extract_*.csv"
agents:
slurm_ntasks_per_node: 3
slurm_mem_per_cpu: 4G
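If the HDF5 command-line tools are available on the node, the contents of the generated profile can be inspected directly (the file name is illustrative):

```sh
h5ls -r job_12345.h5  # recursively list the per-node/per-task time series in the profile
```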