Configuration - PacificBiosciences/pypeFLOW GitHub Wiki
Job submission options (pypeflow>=2.0.0)
You are probably using pypeFLOW via FALCON. You can learn about FALCON pypeflow-configuration in the FALCON wiki.
The most important configuration is for job-distribution, via one of our "Process-Watchers".
Here, we provide some useful job-submission strings for various grid-computing systems.
"blocking" process-watcher
This process-watcher is the easiest to configure, by far. You have full control over how jobs are submitted, via the submit string.
The following variables will be substituted into your string (based on conventions in PacBio's pbsmrtpipe):
${JOB_SCRIPT} -- the shell command to run (aka CMD)
${JOB_NAME} -- job-name selected by pypeflow (not the id generated by qsub, e.g.)
${JOB_STDOUT} -- path to write stdout (aka STDOUT_FILE)
${JOB_STDERR} -- path to write stderr (aka STDERR_FILE)
${NPROC} -- number of processors per job
${MB} -- maximum MegaBytes of RAM per processor (the total is $(expr ${NPROC} \* ${MB}))
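As a quick sanity check of the memory arithmetic: the total RAM a job may use is NPROC times MB. The values below are hypothetical stand-ins; pypeflow substitutes the real ones into your submit string.

```shell
# Hypothetical values -- pypeflow supplies the real NPROC and MB.
NPROC=4      # processors per job
MB=4000      # MegaBytes of RAM per processor
TOTAL_MB=$(expr ${NPROC} \* ${MB})   # total RAM for the whole job
echo "${TOTAL_MB}"   # 16000
```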
local
To force everything to run locally, just "submit" to "bash":
pwatcher_type = blocking
submit = /bin/bash -c "${JOB_SCRIPT}" > "${JOB_STDOUT}" 2> "${JOB_STDERR}"
Or to combine everything into the top stderr/stdout:
pwatcher_type = blocking
job_queue = bash -C ${CMD}
# By dropping STD*_FILE, we see all output on the console.
# That helps debugging in TravisCI/Bamboo.
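To see what the blocking process-watcher does with that submit string, you can run the same command by hand. The job script below is a hypothetical stand-in for one pypeflow would generate; the final line is exactly what the submit string expands to.

```shell
# Stand-in for a pypeflow-generated job script (hypothetical content).
JOB_SCRIPT=$(mktemp)
printf 'echo hello-from-job\n' > "${JOB_SCRIPT}"
chmod +x "${JOB_SCRIPT}"
JOB_STDOUT=job.stdout
JOB_STDERR=job.stderr

# What the "submit" string expands to:
/bin/bash -c "${JOB_SCRIPT}" > "${JOB_STDOUT}" 2> "${JOB_STDERR}"

cat "${JOB_STDOUT}"   # hello-from-job
```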
SGE/qsub
This should be familiar to anyone who uses qsub regularly.
submit = qsub -S /bin/bash -sync y -V -q myqueue \
-N ${JOB_NAME} \
-o "${JOB_STDOUT}" \
-e "${JOB_STDERR}" \
-pe smp ${NPROC} \
-l h_vmem=${MB}M \
"${JOB_SCRIPT}"
The -sync y makes it a blocking call.
(Note: -l h_vmem= and -l mem= are problematic on some systems. YMMV.)
hermit
submit = hermit qsub \
-N ${JOB_NAME} \
-l nprocs=${NPROC}:mem=${MB} \
-v ${env} \
${JOB_SCRIPT}
LSF
submit = bsub -K -q myqueue -J ${JOB_NAME} -o ${JOB_STDOUT} -e ${JOB_STDERR} ${JOB_SCRIPT}
The -K makes it a blocking call.
PBS
Include -W block=T or -W block=true in your qsub submit string to make it a blocking call.
(If you cannot use blocking mode, then you might be able to rely on qdel ${JOB_NUM}, which we will try to fill in from the result of qsub. This is experimental, as we cannot test PBS ourselves.)
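A full PBS submit string might then look like the following. This is only a sketch: the resource flags (-l nodes=...:ppn=..., -l mem=...) are typical of PBS/Torque but vary between clusters, so check your site's qsub documentation.

```
submit = qsub -S /bin/bash -W block=true -V -q myqueue \
    -N ${JOB_NAME} \
    -o "${JOB_STDOUT}" \
    -e "${JOB_STDERR}" \
    -l nodes=1:ppn=${NPROC} \
    -l mem=${MB}mb \
    "${JOB_SCRIPT}"
```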
Slurm/sbatch
Try using srun instead of sbatch.
submit = srun --wait=0 -p myqueue \
-J ${JOB_NAME} \
-o ${JOB_STDOUT} \
-e ${JOB_STDERR} \
--mem-per-cpu=${MB}M \
--cpus-per-task=${NPROC} \
${JOB_SCRIPT}
Other possible flags (and maybe via sbatch):
--time=3-0
--ntasks 1 --exclusive
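If srun is not suitable on your cluster, sbatch --wait also blocks until the job finishes (in sufficiently recent Slurm releases). A sketch, mirroring the srun example above:

```
submit = sbatch --wait -p myqueue \
    -J ${JOB_NAME} \
    -o ${JOB_STDOUT} \
    -e ${JOB_STDERR} \
    --mem-per-cpu=${MB}M \
    --cpus-per-task=${NPROC} \
    ${JOB_SCRIPT}
```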
Other configurables
Running on a local disk
use_tmpdir = true
# Or if you want a specific root directory,
use_tmpdir = /scratch
But note that running in a tmpdir can hinder debugging, since intermediate files land on the local disk rather than in your run directory. It is better to enable this only as an optimization, after everything else is working.