Toil - umccr/aws_parallel_cluster GitHub Wiki

Toil

This guide assumes that you have read the shared file systems page and the Slurm page. You may also wish to read the official Toil documentation for more information.

The variables initialised below must stay set across all of the following steps, so run them inside a single screen session.

Create Toil directories on the shared filesystem

Create the following directories in your shared filesystem mount (probably /efs, but it could also be /fsx). Your SHARED_DIR environment variable should already be set in your ~/.bashrc.

TOIL_JOB_STORE="${SHARED_DIR}/toil/job-store"
TOIL_WORKDIR="${SHARED_DIR}/toil/workdir"
TOIL_TMPDIR="${SHARED_DIR}/toil/tmpdir"
TOIL_LOG_DIR="${SHARED_DIR}/toil/logs"
TOIL_OUTPUTS="${SHARED_DIR}/toil/outputs"
mkdir -p "${TOIL_JOB_STORE}"
mkdir -p "${TOIL_WORKDIR}"
mkdir -p "${TOIL_TMPDIR}"
mkdir -p "${TOIL_LOG_DIR}"
mkdir -p "${TOIL_OUTPUTS}"
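Before submitting anything, it can help to confirm that these directories exist and are writable. The following is a minimal sketch of such a check. The function name check_toil_dirs is hypothetical (it is not part of Toil or Slurm), and it assumes the same ${SHARED_DIR}/toil layout created above.

```shell
# Hypothetical pre-flight check: confirm SHARED_DIR is set and that each
# Toil directory created above exists and is writable by the current user.
check_toil_dirs() {
  if [ -z "${SHARED_DIR:-}" ]; then
    echo "SHARED_DIR is not set; check your ~/.bashrc" >&2
    return 1
  fi
  local sub dir
  for sub in job-store workdir tmpdir logs outputs; do
    dir="${SHARED_DIR}/toil/${sub}"
    if [ ! -d "${dir}" ] || [ ! -w "${dir}" ]; then
      echo "Missing or not writable: ${dir}" >&2
      return 1
    fi
  done
  echo "Toil directories OK under ${SHARED_DIR}/toil"
}
```

Run check_toil_dirs in the same screen session; a non-zero exit status points at the first missing or unwritable directory.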

Activate the conda environment

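You must activate the conda environment before submitting, because conda activate does not work in a non-interactive shell such as the one the job runs in. By default, sbatch propagates the submitting shell's environment variables to the job (--export=ALL), so the job inherits the activated environment.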

conda activate toil
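Because the job inherits this environment, a quick check in the submitting shell can catch a forgotten activation before anything is queued. This is a minimal sketch that assumes the environment is named toil and was activated by name, so that conda sets CONDA_DEFAULT_ENV to toil. require_toil_env is a hypothetical helper, not part of conda or Toil.

```shell
# Hypothetical pre-flight check: confirm the "toil" conda environment is
# active and that toil-cwl-runner resolves on PATH in the current shell.
require_toil_env() {
  if [ "${CONDA_DEFAULT_ENV:-}" != "toil" ]; then
    echo "conda env 'toil' is not active (CONDA_DEFAULT_ENV='${CONDA_DEFAULT_ENV:-}')" >&2
    return 1
  fi
  if ! command -v toil-cwl-runner >/dev/null 2>&1; then
    echo "toil-cwl-runner not found on PATH" >&2
    return 1
  fi
  # Print which executable the sbatch job will inherit via PATH.
  echo "Using $(command -v toil-cwl-runner)"
}
```

If you activate the environment by path rather than by name, CONDA_DEFAULT_ENV holds the full path instead, and the name comparison above would need adjusting.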

Running Toil

In the --output and --error paths below, Slurm replaces %j with the job ID.

sbatch --job-name "<name-of-my-workflow>" \
  --output "${TOIL_LOG_DIR}/toil.%j.log" \
  --error "${TOIL_LOG_DIR}/toil.%j.log" \
  --partition "copy-long" \
  --no-requeue \
  --wrap "\
    toil-cwl-runner \
      --jobStore \"${TOIL_JOB_STORE}/\${SLURM_JOB_ID}\" \
      --workDir \"${TOIL_WORKDIR}\" \
      --outdir \"${TOIL_OUTPUTS}\" \
      --writeLogs \"${TOIL_LOG_DIR}\" \
      --batchSystem slurm \
      --disableCaching true \
      --cleanWorkDir=onSuccess \
      \"my-cwl-tool.cwl\" \
      \"my-cwl-tool.input.yaml\""
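To follow a run, you can add sbatch's --parsable option to the command above so that it prints only the job ID (followed by ;cluster on multi-cluster setups) and then derive the log path from it. This is a minimal sketch that assumes the toil.%j.log naming used above. toil_log_for is a hypothetical helper name.

```shell
# Hypothetical helper: map the output of `sbatch --parsable`
# ("<jobid>" or "<jobid>;<cluster>") to the log file written via
# --output "${TOIL_LOG_DIR}/toil.%j.log".
toil_log_for() {
  if [ -z "${TOIL_LOG_DIR:-}" ]; then
    echo "TOIL_LOG_DIR is not set" >&2
    return 1
  fi
  # Strip an optional ";cluster" suffix to leave only the job ID.
  local job_id="${1%%;*}"
  printf '%s/toil.%s.log\n' "${TOIL_LOG_DIR}" "${job_id}"
}
```

For example, capture job_out=$(sbatch --parsable --job-name ...) and then run tail -F "$(toil_log_for "${job_out}")". With GNU tail, -F keeps retrying until Slurm starts the job and creates the log file.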