Running an analysis on Albiorix - The-Bioinformatics-Group/Albiorix GitHub Wiki

Programs started on the login node of Albiorix can run for 10 minutes and are then automatically terminated by the system. All long-running analyses must therefore be submitted to the queue system, SGE (short for Sun Grid Engine). SGE then tries to find the requested resources on one of the compute nodes and sends the analysis there. You can interact with SGE in three ways:

  • qlogin
  • Readymade SGE scripts
  • Write your own SGE script

qlogin

This command will give you an interactive session which means that you will be logged on to a shell on one of the compute nodes. You can then start your analysis by typing in the commands you want to run.

[mtop@albiorix ~]$ qlogin
Requested time [in hours]: 4
Your job 16243 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 16243 has been successfully scheduled.
Establishing builtin session to host compute-0-1.local ...
[mtop@compute-0-1 ~]$

This method is useful for running analyses where you want to interact with the computer directly (e.g. on courses), when transferring files to Albiorix or moving data to and from the compute nodes. You can request up to nine hours of wall clock time for a qlogin session.

Readymade SGE scripts

A number of readymade SGE scripts are already installed on Albiorix in the directory /usr/local/bin. Examples of these scripts are mb8.sh, qmb4.sh and beast.sh.

Write your own SGE script

Besides using readymade scripts, writing your own SGE scripts is the recommended way of running analyses on Albiorix. Writing your own scripts has several important benefits, including better utilisation of resources, automatic documentation of your work and easier reproducibility. These scripts are easy to write and require only two things: instructions for SGE and the command you want to run.

Here is an example where I want to run an alignment analysis using the program mafft. First I create an SGE script and name it "mafft_analysis.sge" using a text editor (an SGE script can have any name, but informative names are preferred). The script contains the following:

#$ -cwd
#$ -S /bin/bash

mafft seq.fst > seq.aligned.fst

Rows starting with "#$" are instructions to SGE and the example above instructs the queue system to run the analysis in the "current working directory" (-cwd) and to use the shell Bash to interpret the rest of the commands in the script (-S /bin/bash). The last line contains the command to run on the compute node (mafft...). This script can then be submitted to the queue system with the following command:

[mtop@albiorix ~]$ qsub mafft_analysis.sge
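As a minimal sketch of how submission and monitoring can be combined, the snippet below extracts the job number from the confirmation message that SGE prints (the same "Your job ... has been submitted" message shown in the qlogin example) and then lists your jobs with qstat. The helper name job_id_from_message is hypothetical, and the check for qsub is only there so the snippet fails gracefully when it is run somewhere other than the login node.

```shell
#!/bin/bash
# Hypothetical helper: read an SGE confirmation message on STDIN, e.g.
#   Your job 16244 ("mafft_analysis.sge") has been submitted
# and print only the numeric job ID.
job_id_from_message() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

if command -v qsub >/dev/null 2>&1; then
    # Submit the script and keep the job ID for later use
    job_id=$(qsub mafft_analysis.sge | job_id_from_message)
    echo "Submitted job ${job_id}"
    # List your own pending and running jobs
    qstat -u "$USER"
else
    echo "qsub not found: run this on the Albiorix login node" >&2
fi
```

The job ID captured this way is the same number that appears in the names of the output files SGE creates for the job.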

Two additional SGE instructions are also good to know about: -q for selecting which queue (and compute node) to send the analysis to, and -pe mpich for specifying how many CPU cores to use. We can now extend the SGE script in the previous example like this:

#$ -cwd
#$ -q node0
#$ -pe mpich 4
#$ -S /bin/bash

mafft --thread 4 seq.fst > seq.aligned.fst

Note that the instruction #$ -pe mpich 4 tells SGE to reserve four CPU cores for this analysis, but we also have to instruct mafft to use these cores, and that is done with the --thread 4 option. Hence, only requesting several cores from SGE will not make the program you are running use these resources. Similarly, instructing your program to use more than one CPU core without requesting the additional cores from SGE can have troublesome side effects.
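One way to keep the two numbers in sync is to read the core count from the environment instead of typing it twice. The sketch below assumes the standard SGE variable NSLOTS, which SGE sets to the number of slots granted to a parallel job. The fallback to 1 and the check that mafft is available are additions for illustration and are not part of the original example.

```shell
#$ -cwd
#$ -q node0
#$ -pe mpich 4
#$ -S /bin/bash

# NSLOTS holds the number of slots granted by "-pe mpich 4".
# Default to a single thread if the script is run outside SGE (assumption).
threads="${NSLOTS:-1}"

if command -v mafft >/dev/null 2>&1; then
    # Use exactly as many threads as SGE reserved for this job
    mafft --thread "${threads}" seq.fst > seq.aligned.fst
else
    echo "mafft not found on this node" >&2
fi
```

With this approach, changing the number after -pe mpich is enough: the value passed to --thread follows automatically.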

More advanced usage

Running the analysis as in the example above will create some additional output files in the current working directory:

[mtop@albiorix sge]$ ls -l
total 104
-rw-r--r-- 1 mtop diatom    84 Oct 11 13:12 mafft_analysis.sge
-rw-r--r-- 1 mtop diatom  2443 Oct 11 13:12 mafft_analysis.sge.e16244
-rw-r--r-- 1 mtop diatom  1930 Oct 11 13:12 mafft_analysis.sge.o16244
-rw-r--r-- 1 mtop diatom     0 Oct 11 13:12 mafft_analysis.sge.pe16244
-rw-r--r-- 1 mtop diatom   133 Oct 11 13:12 mafft_analysis.sge.po16244
-rw-r--r-- 1 mtop diatom 53375 Oct 11 13:12 seq.aligned.fst
-rw-r--r-- 1 mtop diatom 24648 Oct 11 13:10 seq.fst
[mtop@albiorix sge]$ 

The number 16244 refers to the SGE job number that is also reported by the command qstat while the analysis is still running. The file mafft_analysis.sge.e16244 contains the standard error stream (STDERR) and mafft_analysis.sge.o16244 contains the standard output stream (STDOUT) from mafft. The files *.pe16244 and *.po16244 contain the same streams from the parallel environment (remember #$ -pe mpich 4) of SGE. These streams can be redirected to other files or automatically discarded by sending them to /dev/null. The example file now looks like this:

[mtop@albiorix sge]$ cat mafft_analysis.sge
#$ -cwd
#$ -q node0
#$ -pe mpich 4
#$ -o mafft.out
#$ -e /dev/null
#$ -S /bin/bash

mafft --thread 4 seq.fst > seq.aligned.fst
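If you keep the STDERR files instead of discarding them, it can be handy to see which jobs actually wrote something to that stream. The following is a small sketch, assuming the default naming scheme described above (script name followed by .e and the job number). The function name nonempty_stderr_files is hypothetical. Note that a non-empty file does not necessarily indicate a failure, since programs such as mafft also write progress messages to STDERR.

```shell
#!/bin/bash
# Hypothetical helper: list non-empty SGE STDERR files (*.e<jobid>)
# in the given directory (default: current directory).
nonempty_stderr_files() {
    local dir="${1:-.}"
    # -size +0 matches files that contain at least one byte
    find "$dir" -maxdepth 1 -type f -name '*.e[0-9]*' -size +0 -print
}

nonempty_stderr_files .
```

Files from the parallel environment (*.pe<jobid>) are not matched by this pattern, because the ".e" must be followed directly by a digit.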

Change already submitted jobs

The qalter command can be used to change already submitted jobs, based on the job ID. For instance, to change the queue that the job with the ID 17645 is submitted to:

qalter -q high_mem 17645

See qalter -h for details.