Partitions - umccr/aws_parallel_cluster GitHub Wiki
Partitions
Spot vs On Demand instances
Different partitions have different instance configurations.
They are described below along with their expected use cases.
Spot instances are cheaper; their price varies with the demand for AWS resources in the region while the instance is running.
Spot instances may be shut down at any time if the demand for resources becomes too high.
For this reason, placing long-running jobs, or jobs that spawn other jobs, on spot instances is not recommended.
Jobs that were running on a spot instance that has since been shut down will be requeued.
Workflow engines should be resilient to jobs being requeued or restarted.
On-demand instances are not shut down in this way, but their price is fixed and substantially higher than that of spot instances.
As a general rule of thumb, use on-demand instances for the workflow engine but spot instances for the workflow steps.
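Because a requeued job starts again from the beginning, a workflow step is easier to run on spot instances if repeating it is harmless. The sketch below is one possible way to do that, not something taken from this cluster's configuration: `run_once` and `attempt_number` are hypothetical helper names, and the only Slurm-specific piece is the `SLURM_RESTART_COUNT` environment variable, which Slurm sets on jobs that have been requeued.

```shell
# Hypothetical helper: run a command only if its output file is missing.
# Output is written to a temporary file first, so an interrupted run
# never leaves a half-written final output behind.
run_once() {
  out="$1"; shift
  if [ -e "$out" ]; then
    echo "skip: $out already exists"
    return 0
  fi
  tmp="$out.partial"
  if "$@" > "$tmp"; then
    mv "$tmp" "$out"
    echo "done: $out"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Hypothetical helper: 1 on the first run, 2 after the first requeue, etc.
# SLURM_RESTART_COUNT is unset outside a requeued job, so default to 0.
attempt_number() {
  echo $(( ${SLURM_RESTART_COUNT:-0} + 1 ))
}
```

Inside a job script, a step could then be wrapped as `run_once result.txt my_command args`, where `my_command` stands in for the real work and writes its result to standard output.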
Named Partitions
The partitions available on the scheduler are listed below.
compute
- Default partition
- c5.4xlarge and m5.4xlarge instances available on this partition.
- Uses spot instances, so the workflow scheduler must be resilient to restarting jobs.
copy
- Use --partition=copy to run commands through this partition.
- Runs on an m5.large instance.
- Use for staging input and reference data and uploading output data.
- Uses spot instances, so use --requeue on your jobs to enable job restarts.
*-long
- Use --partition=compute-long or --partition=copy-long.
- These partitions have the same specs as their spot counterparts but use on-demand instances instead.
- These should be reserved for jobs that CANNOT be restarted, such as workflow schedulers.
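The partition choices above can be summarised as a small lookup. The following is a minimal sketch that only prints the `sbatch` command rather than running it. The function name `submit_cmd` and the role names `engine`, `step` and `stage` are hypothetical labels introduced for illustration; the partition names and the `--partition` and `--requeue` flags are the ones described on this page.

```shell
# Hypothetical helper: print (without running) the sbatch command for a job,
# choosing the partition from the job's role.
submit_cmd() {
  role="$1"; script="$2"
  case "$role" in
    # Workflow engine: on-demand, must not be interrupted.
    engine) echo "sbatch --partition=compute-long $script" ;;
    # Workflow step: spot, so allow it to be requeued.
    step)   echo "sbatch --partition=compute --requeue $script" ;;
    # Data staging or upload: spot copy partition, also requeueable.
    stage)  echo "sbatch --partition=copy --requeue $script" ;;
    *) echo "unknown role: $role" >&2; return 1 ;;
  esac
}
```

For example, `submit_cmd step align.sh` prints `sbatch --partition=compute --requeue align.sh`, which can be checked and then run on the head node. Since compute is the default partition, the `--partition=compute` flag is redundant there and is kept only to make the choice explicit.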
Instance types
m5.large
Cpus: 2 Mem: 8 GB Bandwidth: 10 Gbps Partitions: copy, copy-long
c5.4xlarge
Cpus: 16 Mem: 32 GB Bandwidth: 10 Gbps Partitions: compute, compute-long
m5.4xlarge
Cpus: 16 Mem: 64 GB Bandwidth: 10 Gbps Partitions: compute, compute-long
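The figures above can be used to sanity-check a resource request before submitting. This is a rough sketch under stated assumptions: `partition_specs` and `fits_partition` are hypothetical helpers, the numbers are copied from the list above, and the memory Slurm actually makes available to jobs on a node is likely to be somewhat lower than the nominal instance memory.

```shell
# Hypothetical lookup: "cpus:mem_gb" pairs for each partition,
# copied from the instance types listed above.
partition_specs() {
  case "$1" in
    compute|compute-long) echo "16:32 16:64" ;;  # c5.4xlarge, m5.4xlarge
    copy|copy-long)       echo "2:8" ;;          # m5.large
    *) return 1 ;;
  esac
}

# Returns 0 if at least one instance type in the partition covers the
# request, 1 if none does, and 2 if the partition name is not recognised.
fits_partition() {
  part="$1"; want_cpus="$2"; want_mem="$3"
  specs=$(partition_specs "$part") || return 2
  for spec in $specs; do
    cpus=${spec%%:*}   # text before the colon
    mem=${spec##*:}    # text after the colon
    if [ "$want_cpus" -le "$cpus" ] && [ "$want_mem" -le "$mem" ]; then
      return 0
    fi
  done
  return 1
}
```

With these numbers, `fits_partition compute 16 48` succeeds because an m5.4xlarge covers it, while `fits_partition copy 4 8` fails because an m5.large has only 2 CPUs.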