Partitions - umccr/aws_parallel_cluster GitHub Wiki

Partitions

Spot vs On Demand instances

Different partitions have different instance configurations.
They are described below along with their expected use cases. Spot instances are cheaper than on-demand instances, and their price varies with the demand for AWS resources in the region over the lifetime of the instance.
AWS may shut down spot instances at any time if the demand for resources becomes too high.
For this reason, it is not recommended to place long-running jobs, or jobs that spawn other jobs, on spot instances.
Jobs that were running on a spot instance that has since been shut down will be requeued.
Workflow engines should be resilient to jobs requeuing or restarting.
On-demand instances are not shut down in this way, but their price is fixed and substantially higher than that of spot instances.
As a general rule of thumb, use on-demand instances for the workflow engine but spot instances for the workflow steps.
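
The rule of thumb above can be sketched as a small shell helper. This is a minimal sketch that assumes the scheduler is Slurm (the --partition and --requeue options used on this page are sbatch options). The role labels "engine" and "step", the function name sbatch_opts_for, and the script names are hypothetical. Passing --requeue for jobs on compute is also an assumption, since this page only recommends it explicitly for copy. The commands are printed rather than submitted.

```shell
# Sketch only: assumes a Slurm scheduler. The role names and script
# names below are hypothetical and exist only for this example.

# Print the sbatch options suited to a given job role.
sbatch_opts_for() {
  case "$1" in
    # On-demand partition: the workflow engine must not be interrupted.
    engine) echo "--partition=compute-long" ;;
    # Spot partition: steps may be interrupted, so allow requeuing (assumption).
    step)   echo "--partition=compute --requeue" ;;
    *)      echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# Dry run: show the commands instead of submitting them.
echo "sbatch $(sbatch_opts_for engine) run_workflow.sh"
echo "sbatch $(sbatch_opts_for step) align_sample.sh"
```

To submit for real, drop the echo and leave the substitution unquoted so the options are split into separate arguments, for example: sbatch $(sbatch_opts_for step) align_sample.sh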

Named Partitions

The partitions are named as follows on the scheduler.

compute

  • Default partition
  • c5.4xlarge and m5.4xlarge instances available on this partition.
  • Uses spot instances, so the workflow scheduler must be resilient to restarted jobs.

copy

  • Use --partition=copy to run commands through this partition.
  • Runs on an m5.large instance.
  • Use for staging input and reference data and uploading output data.
  • Uses spot instances, so use --requeue on your jobs to enable job restarts.
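
A staging job on this partition might look like the sketch below. It assumes a Slurm scheduler and the AWS CLI on the node. The function name stage_cmd, the job name, the bucket, and the paths are hypothetical. The function only prints the sbatch command so it can be inspected before anything is submitted.

```shell
# Sketch only: builds (but does not submit) an sbatch command that stages
# data through the copy partition. Paths containing spaces would need
# extra quoting, which this example does not handle.
stage_cmd() {
  # $1 = source location, $2 = destination location
  printf 'sbatch --partition=copy --requeue --job-name=stage --wrap="aws s3 sync %s %s"\n' "$1" "$2"
}

# Hypothetical bucket and scratch path, for illustration only.
stage_cmd s3://example-bucket/reference/ /scratch/reference/
```

aws s3 sync is used here rather than aws s3 cp because it only transfers files that are missing or changed at the destination, so rerunning the job after a requeue is safe.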

*-long

  • Use --partition=compute-long or --partition=copy-long.
  • These partitions have the same specs as their spot counterparts but use on-demand instances instead.
  • These should be reserved for jobs that CANNOT be restarted such as workflow schedulers.

Instance types

m5.large

  • CPUs: 2
  • Memory: 8 GB
  • Bandwidth: 10 Gbps
  • Partitions: copy, copy-long

c5.4xlarge

  • CPUs: 16
  • Memory: 32 GB
  • Bandwidth: 10 Gbps
  • Partitions: compute, compute-long

m5.4xlarge

  • CPUs: 16
  • Memory: 64 GB
  • Bandwidth: 10 Gbps
  • Partitions: compute, compute-long
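
The figures above can be used to check that a resource request fits on a partition before submitting it. The following is a minimal sketch. The function names max_resources and fits are hypothetical, and the limits are the nominal values from this page for the largest instance type on each partition. The memory actually available to jobs on a node is typically somewhat lower than the nominal figure, so treat a pass at the exact limit with caution.

```shell
# Sketch only: nominal per-node limits taken from the instance types
# listed on this page (largest instance type per partition).

# Print "<cpus> <mem_gb>" for a partition.
max_resources() {
  case "$1" in
    copy|copy-long)       echo "2 8" ;;    # m5.large
    compute|compute-long) echo "16 64" ;;  # m5.4xlarge is the larger of the two types
    *)                    return 1 ;;
  esac
}

# usage: fits <partition> <cpus> <mem_gb>
# Succeeds if the request is within the nominal limits of one node.
fits() {
  limits=$(max_resources "$1") || return 1
  max_cpus=${limits% *}   # text before the space
  max_mem=${limits#* }    # text after the space
  [ "$2" -le "$max_cpus" ] && [ "$3" -le "$max_mem" ]
}

if fits compute 16 48; then
  echo "16 CPUs and 48 GB fit on the compute partition"
fi
```

A request above 32 GB on compute or compute-long can only be placed on an m5.4xlarge node, since c5.4xlarge has 32 GB.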