The job is very slow - pb-dyim/SMRT-Analysis GitHub Wiki
We recommend that SMRT Analysis be installed on a compute cluster with at least the following hardware specs:
1 head node:
• Minimum 16 GB RAM. Larger references such as human may require 32 GB RAM.
• Minimum 250 GB of disk space
3 compute nodes:
• 8 cores per node, with 2 GB RAM per core
• Minimum 250 GB of disk space per node
If SMRT Analysis is running on a single server, the software makes no attempt to load-balance or queue jobs on that server. All jobs are submitted and executed simultaneously, which slows down every other process running on the server. Advise your users to submit SMRT Portal jobs with restraint, preferably one at a time.
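On a single-server installation, a quick way to gauge whether the machine is already saturated before submitting another job is to compare the load average against the core count. A minimal sketch using standard Linux tools (this is not part of SMRT Analysis):

```shell
#!/bin/sh
# Compare the 1-minute load average to the number of CPU cores.
# A load well above the core count means a newly submitted job
# will slow everything down further.
cores=$(nproc)
load=$(cut -d' ' -f1 /proc/loadavg)
echo "1-min load: ${load} on ${cores} cores"
```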
If SMRT Analysis is configured for distributed computing but jobs are still running slowly, you will need to edit the template files for your job management system (JMS). These files are $SEYMOUR_HOME/analysis/etc/cluster/<JMS>/start.tmpl and $SEYMOUR_HOME/analysis/etc/cluster/<JMS>/interactive.tmpl.
See below for more specific suggestions.
- Check which jobs are stuck in the queue. For example, you can use `qstat` if your job management system is SGE.
- The first column of the output is the job ID, and you can find out which node is running that job by executing `qstat -j <job_id>`.
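The two steps above can be combined into one pass. The sketch below assumes SGE's default `qstat` layout (two header lines, then one row per job with the job ID in the first column); verify the column layout against your site's output before relying on it:

```shell
#!/bin/sh
# extract_job_ids: print the first column of qstat output,
# skipping the two header lines of the default SGE layout.
extract_job_ids() {
    awk 'NR > 2 {print $1}'
}

# On a submit host, inspect every job currently in the queue:
#   qstat | extract_job_ids | xargs -r -n 1 qstat -j
```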
If other, perhaps larger, jobs are being submitted to the same queue, change the designated queue (the -q option) to an exclusive environment and monitor the resource usage. For example, the start.tmpl file for Sun Grid Engine (SGE) looks like this:
qsub -pe <your_parallel_environment> ${NPROC} -S /bin/bash -V -q <your new_queue> -N ${JOB_ID} -o ${STDOUT_FILE} -e ${STDERR_FILE} ${EXTRAS} ${CMD}
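For example, if you have created an exclusive queue for SMRT Analysis jobs (the queue name smrtanalysis.q below is a placeholder; substitute your own), the edited start.tmpl might read:

```
qsub -pe <your_parallel_environment> ${NPROC} -S /bin/bash -V -q smrtanalysis.q -N ${JOB_ID} -o ${STDOUT_FILE} -e ${STDERR_FILE} ${EXTRAS} ${CMD}
```

You can then monitor that queue in isolation, for example with `qstat -q smrtanalysis.q`.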
You can also add options to limit the memory usage, for example, using the -M option for the bsub command in LSF:
bsub -q pacbio -g /pacbio/smrtanalysis -J ${JOB_ID} -o ${STDOUT_FILE} -e ${STDERR_FILE} -M 33000000 -n 4 ${CMD}
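If your JMS is SGE rather than LSF, a comparable per-job memory cap can be requested with the h_vmem resource. The 4G value below is illustrative, and your administrator must have h_vmem configured as a consumable resource for the limit to be enforced:

```
qsub -pe <your_parallel_environment> ${NPROC} -l h_vmem=4G -S /bin/bash -V -q <your_queue> -N ${JOB_ID} -o ${STDOUT_FILE} -e ${STDERR_FILE} ${EXTRAS} ${CMD}
```

Note that SGE applies h_vmem per slot, so the effective limit for the job is the requested value multiplied by ${NPROC}.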
If you have a heterogeneous cluster, you can also use the -R option to specify compute nodes that meet certain resource requirements:
bsub -q pacbio -g /pacbio/smrtanalysis -J ${JOB_ID} -o ${STDOUT_FILE} -e ${STDERR_FILE} -M 33000000 -R 'select[type==LINUX64 && mem>=32000 && tmp>=300000] rusage[mem=32000, tmp=250000] span[hosts=1]' -n 4 ${CMD}
Finally, you can add arbitrary commands to the job submission by adding lines to the .tmpl files. In the following example, additional environment variables are defined in a profile script instead of being managed by the parallel environment (-pe) option:
. /path/to/profile
qsub ${NPROC} -S /bin/bash -V -q <your new_queue> -N ${JOB_ID} -o ${STDOUT_FILE} -e ${STDERR_FILE} ${EXTRAS} ${CMD}
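As an illustration, such a profile script might look like the following. The path and contents are hypothetical; adjust them to your installation (SMRT Analysis ships its own environment script under $SEYMOUR_HOME/etc/setup.sh):

```
# Hypothetical /path/to/profile: set up the job environment before
# qsub runs, instead of relying on the -pe configuration.
export SEYMOUR_HOME=/opt/smrtanalysis
. "$SEYMOUR_HOME/etc/setup.sh"
```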