
Slurm

Using slurm

See the sbatch guide for more information. Example batch script file:

#!/bin/bash
#SBATCH --output %J.out
#SBATCH --error %J.err
#SBATCH --time=00:05:00
#SBATCH --partition=copy

docker run --rm ubuntu:latest echo "hello-world"
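
Assuming the script above is saved as hello.sh (an illustrative file name), it can be submitted with sbatch. Stdout and stderr go to the %J.out and %J.err files named in the directives, where %J expands to the job ID (3 in this illustrative run):

$ sbatch hello.sh
Submitted batch job 3
$ cat 3.out
hello-world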

sinteractive

We have installed the sinteractive script, which is also used on Spartan, and it should work in the same way.

The Slurm-native alternative can be used as well; however, it should be avoided due to an AWS Parallel Cluster bug.

The following command will give you a day on a compute spot-instance node.

$ sinteractive --time=1-0 --partition=compute --cpus-per-task=1

sacct

This returns your recent jobs (running, queued or recently completed) in table format:

$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
2                  wrap    compute                     1    RUNNING      0:0 

Use the --parsable2 format for piping into another command

$ sacct --parsable2
JobID|JobName|Partition|Account|AllocCPUS|State|ExitCode
2|wrap|compute||1|RUNNING|0:0
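
For example (an illustrative pipeline), the parsable output can be fed to awk to pull out the IDs of running jobs:

$ sacct --parsable2 --format=JobID,State | awk -F'|' '$2 == "RUNNING" {print $1}'
2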

Use with --format to select specific fields (columns)

$ sacct --parsable2 --format=JobID,State,ExitCode
JobID|State|ExitCode
2|RUNNING|0:0

squeue

Shows only the jobs that are currently in the queue (pending, configuring or running)

$ squeue
 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) 
     2   compute     wrap ec2-user CF       2:21      1 compute-dy-c54xlarge-1
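
If the queue gets busier, squeue accepts the standard filtering options, for example restricting the listing to a particular user or to particular job states:

$ squeue --user=ec2-user
$ squeue --states=PENDING,RUNNING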

scontrol

Get more information about a job or partition

$ scontrol show job 2
JobId=2 JobName=wrap
UserId=ec2-user(1000) GroupId=ec2-user(1000) MCS_label=N/A
Priority=4294901759 Nice=0 Account=(null) QOS=normal
JobState=CONFIGURING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:03:04 TimeLimit=365-00:00:00 TimeMin=N/A
SubmitTime=2021-01-29T08:34:19 EligibleTime=2021-01-29T08:34:19
AccrueTime=2021-01-29T08:34:19
StartTime=2021-01-29T08:34:19 EndTime=2022-01-29T08:34:19 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-01-29T08:34:19
Partition=compute AllocNode:Sid=ip-10-2-0-15:24159
ReqNodeList=(null) ExcNodeList=(null)
NodeList=compute-dy-c54xlarge-1
BatchHost=compute-dy-c54xlarge-1
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=1,mem=2000M,node=1,billing=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=2000M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=(null)
WorkDir=/home/ec2-user
StdErr=/home/ec2-user/slurm-2.out
StdIn=/dev/null
StdOut=/home/ec2-user/slurm-2.out
Power=
MailUser=(null) MailType=NONE
$ scontrol show partition compute
PartitionName=compute
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=YES QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=compute-dy-c54xlarge-[1-10],compute-dy-m54xlarge-[1-10]
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=320 TotalNodes=20 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

sinfo

View the available nodes with sinfo

$ sinfo
PARTITION    AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute*        up   infinite     19  idle~ compute-dy-c54xlarge-[2-10],compute-dy-m54xlarge-[1-10]
compute*        up   infinite      1   idle compute-dy-c54xlarge-1
compute-long    up   infinite     20  idle~ compute-long-dy-c54xlarge-[1-10],compute-long-dy-m54xlarge-[1-10]
copy            up   infinite     10  idle~ copy-dy-m5large-[1-10]
copy-long       up   infinite      1 alloc# copy-long-dy-m5large-1
copy-long       up   infinite      9  idle~ copy-long-dy-m5large-[2-10]

Requeue vs no-requeue

By default jobs are allowed to requeue. There may be a reason you would prefer a job to fail rather than to requeue.
In this case you may use --no-requeue in your sbatch launch script to prevent a job from restarting.
This can also be done whilst a job is running with scontrol update jobid=$jobid requeue=0.
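
A minimal sketch of a launch script with requeueing disabled (the workload line just mirrors the example at the top of this page):

#!/bin/bash
#SBATCH --no-requeue
#SBATCH --output %J.out
#SBATCH --error %J.err
#SBATCH --time=00:05:00
#SBATCH --partition=compute

# With --no-requeue the job fails instead of being restarted if its node is lost
docker run --rm ubuntu:latest echo "hello-world"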

Propagate

This, IMO, is a particularly bad default. Without setting --propagate=NONE in the sbatch script, ulimits for a given job are set to those of the submission node rather than those of the execution node. Please use --propagate=NONE.
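
A minimal sketch of a launch script carrying this option:

#!/bin/bash
#SBATCH --propagate=NONE
#SBATCH --time=00:05:00
#SBATCH --partition=compute

# With --propagate=NONE the job keeps the execution node's ulimits
ulimit -a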

Limitations

The current cluster and scheduler (SLURM) run with minimal configuration, so there will be some limitations. Known points include:

  • --mem option is not natively supported:
    • Whilst it can be used, there is no Slurm controller configuration enforcing memory limits.
    • Since you are probably the only one using the cluster, please do not exploit this or forever suffer the consequences.
    • If using Docker you may use --memory to resolve this; CWLtool also allows for this with "--strict-memory-limit" (see the sketch after this list).
    • Either option will prevent jobs exceeding their allocated memory usage.
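
A minimal sketch of the Docker-side workaround; the 2000M/2000m figures are only illustrative values and should match whatever memory you intend the job to use:

#!/bin/bash
#SBATCH --partition=compute
#SBATCH --mem=2000M

# Slurm records --mem but does not enforce it on this cluster;
# Docker's --memory flag makes the container itself respect the cap
docker run --rm --memory=2000m ubuntu:latest echo "hello-world"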

Debugging your slurm script

Run scontrol write batch_script <job_id> to write out the batch script that was submitted for your job.
This may help you determine the issue.
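
For example, for job 2 from the sections above (the file name job-2-debug.sh is an arbitrary choice; omit it and Slurm writes slurm-<job_id>.sh by default):

$ scontrol write batch_script 2 job-2-debug.sh
$ cat job-2-debug.sh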
