# Running Snakemake on clusters and on the cloud

*RomainFeron/workshop-snakemake-sibdays2020 GitHub wiki*
## Cloud execution
Cloud execution is not covered in this workshop; we will only mention that Snakemake has built-in support for cloud execution via Kubernetes, with easy setup for Google Cloud and Amazon Web Services.
For more information, refer to the corresponding section of the official documentation.
## HPC (cluster) execution
Snakemake can use a scheduler (e.g. Slurm, SGE, LSF) to run jobs without any changes to the workflow's implementation. In practice, for complex workflows, additional parameters, in particular resources such as runtime and memory, may have to be specified in rule definitions for the workflow to run smoothly on a cluster.
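As a sketch, runtime and memory requirements can be declared in a rule with the `threads` and `resources` directives. The rule name, tools, and values below are illustrative; `mem_mb` and `runtime` are common resource names by convention, but any resource name defined by the workflow author works:

```
# Illustrative rule: resource values here are examples, not recommendations
rule align_reads:
    input:
        'data/{sample}.fastq.gz'
    output:
        'results/{sample}.bam'
    threads: 4
    resources:
        mem_mb=8000,   # memory in MB; accessible to the submit command as {resources.mem_mb}
        runtime=120    # runtime in minutes; accessible as {resources.runtime}
    shell:
        'bwa mem -t {threads} reference.fa {input} | samtools sort -o {output}'
```

These values do not affect local execution by themselves, but they become available to the cluster submit command, as shown below.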
To execute Snakemake with a scheduler, specify the runtime parameter `--cluster "<submit_command>"`. The following examples show how to execute a basic workflow with some popular schedulers:
- Slurm: `snakemake --cluster "sbatch"`
- Sun Grid Engine (SGE): `snakemake --cluster "qsub"`
- LSF: `snakemake --cluster "bsub <"`
For more complex workflows, you can pass job information to the submit command with the syntax `--cluster "<submit_command> {<rule_keyword>}"`. For instance, you can pass the value of the `threads` directive from a rule definition to a Slurm scheduler with the command `snakemake --cluster "sbatch --cpus-per-task={threads}"`. All keywords / directives from a rule can be passed to the submit command this way.
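Several directives can be forwarded at once. As a sketch, assuming each rule defines `threads` and a `mem_mb` resource (the resource name is a common convention, not a requirement):

```
snakemake --cluster "sbatch --cpus-per-task={threads} --mem={resources.mem_mb}"
```

With this command, each job is submitted to Slurm with the CPU and memory requests taken from its rule definition.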
When running Snakemake with a scheduler, don't forget to specify the maximum number of jobs to run in parallel with the execution parameter `--jobs` (`-j`); otherwise, only one job will be submitted at a time!
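For example, to allow up to 100 jobs to be queued simultaneously on a Slurm cluster (the limit of 100 is arbitrary and should match your cluster's policy):

```
snakemake --cluster "sbatch" --jobs 100
```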
If you often execute complex workflows on a cluster, consider using an execution profile to specify default values for execution parameters and to facilitate resource management when submitting jobs for your workflow.
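As a sketch, a minimal Slurm profile is a `config.yaml` file mapping long option names to default values. The profile name `slurm` and all values below are illustrative; profiles are typically placed under `~/.config/snakemake/<profile_name>/`:

```yaml
# ~/.config/snakemake/slurm/config.yaml (illustrative values)
cluster: "sbatch --cpus-per-task={threads} --mem={resources.mem_mb}"
jobs: 100
```

The profile is then activated with `snakemake --profile slurm`, which applies these defaults without having to retype them for every run.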