Home - umccr/aws_parallel_cluster GitHub Wiki

UMCCR's AWS Parallel Cluster

Welcome to the AWS Parallel Cluster wiki!

Getting started

Download the latest version from the releases page.

Head to the installation page for more information on prerequisites and on installing parallel cluster.

Running parallel cluster

Activate your conda env

conda activate pcluster

Ensure you're logged into AWS

aws sts get-caller-identity

Check the account value is as expected.
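The account check can also be scripted. The sketch below is a minimal example, not part of this repo: `check_aws_account` is a hypothetical helper name, and the account ID you pass in is whatever you expect to be logged into. It relies only on the standard AWS CLI `--query` and `--output` options.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this repo): compare the account you
# are currently logged into against the account ID you expect.
check_aws_account() {
  local expected="$1"
  local actual
  # Ask STS for just the Account field, printed as plain text
  actual="$(aws sts get-caller-identity --query Account --output text)" || return 2
  if [ "$actual" = "$expected" ]; then
    echo "OK: logged into account ${actual}"
  else
    echo "Mismatch: expected ${expected}, got ${actual}" >&2
    return 1
  fi
}
```

For example, `check_aws_account 123456789012` (a placeholder ID) prints a confirmation on a match, returns 1 on a mismatch, and returns 2 if the `aws` call itself fails, for instance because your credentials have expired.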

Start your cluster

Use the --no-rollback flag so that, if creation of your first cluster fails, its resources are kept and you can debug the issue.

start_cluster.py \
  --cluster-name=my-first-cluster \
  --file-system-type=efs \
  --no-rollback

This may take around 20 minutes to complete.

Head to our parameter options page for more information.

Staging Data

Unlike in an HPC environment, your input data is likely not available on disk.
In most cases you will need to 'stage' your data so that it is accessible by all nodes.

Head to the shared file system page for more information on data staging.
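As a rough illustration of staging, the sketch below assumes your inputs live in an S3 bucket and that you know the mount point of the cluster's shared file system. Both are assumptions: the bucket, the paths and the helper name `stage_inputs` are hypothetical, and the shared file system page remains the reference for the recommended approach.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy inputs from S3 into a directory on the shared
# file system so that every node can read them.
stage_inputs() {
  local s3_uri="$1"      # source, e.g. s3://example-bucket/inputs/ (placeholder)
  local shared_dir="$2"  # destination directory on the shared mount (placeholder)
  case "$s3_uri" in
    s3://*) ;;
    *) echo "Not an S3 URI: ${s3_uri}" >&2; return 1 ;;
  esac
  mkdir -p "$shared_dir" || return 1
  # 'aws s3 sync' copies only new or changed objects, so reruns are cheap
  aws s3 sync "$s3_uri" "$shared_dir"
}
```

A call would look like `stage_inputs s3://example-bucket/inputs/ /path/to/shared/inputs`, with both arguments replaced by your own bucket and mount point.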

Partitions Overview

There are four partitions to select from: compute, copy, compute-long and copy-long. Partitions with the -long suffix run on 'on-demand' instances, whilst the others run on 'spot' instances. We recommend using the -long partitions only for long-running jobs that cannot be restarted.

Head to the partitions page for more information.
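The recommendation above can be written as a small decision rule. This is only a sketch: `choose_partition` is a hypothetical helper, and it encodes just the "cannot be restarted" half of the advice, leaving the judgement of what counts as long-running to you.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose between the spot and on-demand ("-long")
# variants of a partition family.
choose_partition() {
  local kind="$1"         # partition family: "compute" or "copy"
  local restartable="$2"  # "yes" if the job can safely be rerun, otherwise "no"
  case "$kind" in
    compute|copy) ;;
    *) echo "Unknown partition family: ${kind}" >&2; return 1 ;;
  esac
  if [ "$restartable" = "no" ]; then
    # Jobs that cannot be restarted go to the on-demand partition
    echo "${kind}-long"
  else
    # Everything else can tolerate a spot interruption
    echo "$kind"
  fi
}
```

The result can be passed straight to slurm's `--partition` option, for example `sbatch --partition="$(choose_partition compute no)" my_job.sh`, where `my_job.sh` is a placeholder for your own batch script.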

Using slurm

Slurm is an HPCphile's bread and butter. You can use slurm on AWS Parallel Cluster too!

Head to the slurm page for more information on using slurm on AWS.

Using toil

Toil is a maintained, HPC- and AWS-compatible execution engine for CWL. It integrates with slurm, submitting a batch job for each workflow step.

Head to the toil page for more information on running your CWL workflow through toil.

Using cromwell

Cromwell also runs on AWS through an integrated slurm backend. It executes WDL workflows, submitting slurm jobs to complete workflow steps.

Head to the cromwell page for more information on running your WDL workflow through cromwell.

Installing new software on the cluster

You have sudo permissions, so you can install anything.
Docker and conda have been set up for you, and we encourage you to use them.

Walkthroughs

Head to our walkthroughs page for detailed guides on running workflow languages through AWS Parallel Cluster.

We have walkthroughs in:

Troubleshooting

This is still very much a process in development. Please see our troubleshooting page for more information.

Development

To understand this repo in greater detail, head to the development page for some lovely diagrams.

Contributions

Our projects board will guide you on what more needs to be done.

Useful links

AWS Parallel Cluster git repo

Slurm

Toil

Cromwell