RAxML on OpenStack - cdoorenweerd/PhylOStack GitHub Wiki

This HOWTO explains how to run the AVX2 PTHREADS variant of the latest RAxML version available on GitHub on an OpenStack instance with Ubuntu 14.04 LTS, including optional searches for rogue taxa with RogueNaRok v1.0. For more information on both packages, see <www.exelixis-lab.org>.

Note: This HOWTO assumes you have installed the PhylOStack and know how to connect via SSH, transfer files and use screen sessions.

Preparing input data

RAxML only requires an alignment file as input, but a file defining partitions can additionally be provided. All the analysis parameters are set in the starting command.

  1. Mandatory: an alignment in relaxed phylip format, named alignment.phy in the examples
  2. Optional: a partitions text file, named partitions.txt in the examples

Your partitions file should look something like this:

DNA, 28SNep_921 = 1-921
DNA, CAD2_415 = 922-1336
DNA, COI-COII_codon3 = 1337-2001\3, 2002-2637\3
DNA, COI_codon12 = 1338-2001\3, 1339-2001\3
DNA, COII_codon12 = 2003-2637\3, 2004-2637\3
DNA, EF1-alpha_Nep = 2638-3119
DNA, Histon3 = 3120-3447
DNA, IDH_MDH = 3448-4170, 4171-4576
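A quick sanity check on a partitions file is to count how many alignment sites each partition covers. The sketch below assumes every line has the form "TYPE, name = a-b, c-d\3" as in the example above; partition_sizes is a hypothetical helper written for this HOWTO, not a RAxML tool.

```shell
# partition_sizes FILE: print "<name> <number of sites>" for each partition line.
partition_sizes() {
    awk -F'=' '
    NF == 2 {
        # Left side is "TYPE, name"; keep only the name.
        name = $1
        sub(/^[^,]*,[ \t]*/, "", name)
        gsub(/[ \t]+$/, "", name)
        total = 0
        n = split($2, ranges, ",")
        for (i = 1; i <= n; i++) {
            r = ranges[i]
            gsub(/[ \t\r]/, "", r)
            step = 1
            # A trailing backslash plus k selects every k-th site of the range.
            p = index(r, "\\")
            if (p > 0) { step = substr(r, p + 1); r = substr(r, 1, p - 1) }
            split(r, ends, "-")
            total += int((ends[2] - ends[1]) / step) + 1
        }
        print name, total
    }' "$1"
}
```

Calling partition_sizes partitions.txt prints one line per partition. In the example file the names 28SNep_921 and CAD2_415 carry the expected lengths, so a mismatch there would point to a typo in the ranges.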

Running an analysis

RAxML is versatile and has many options. Below are some example commands you might want to use. In all cases, specify the input file with -s and the partitions file with -q (omit -q if you have no partitions file), and set -T to the number of vCPU cores to use. The -n parameter sets a label that is added to the names of the result files, which keeps the results of different analyses on the same input files apart. Feel free to change or add other parameters, but consult the manual when doing so. The screen output of each command is redirected to a log file (for example STDOUT.Best.log), where you can follow intermediate results. The bootstrap tree files contain lists of trees in Newick format.
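The handling of the optional -q parameter and the -T core count can be sketched as a small shell function. This is only an illustration: build_raxml_cmd is a hypothetical helper, it prints the command instead of running it, and it assumes nproc is available for detecting the number of cores.

```shell
# build_raxml_cmd ALIGNMENT PARTITIONS LABEL [THREADS]
# Prints a raxmlHPC command line; -q is added only if PARTITIONS is an existing file.
# The body runs in a subshell, so its variables do not leak into the caller.
build_raxml_cmd() (
    aln=$1
    parts=$2
    label=$3
    # Default to the number of available cores, or 1 if nproc is missing.
    threads=${4:-$(nproc 2>/dev/null || echo 1)}
    cmd="raxmlHPC -p 1234 -s $aln"
    if [ -n "$parts" ] && [ -f "$parts" ]; then
        cmd="$cmd -q $parts"
    fi
    echo "$cmd -m GTRCAT -T $threads -n $label"
)
```

For example, build_raxml_cmd alignment.phy partitions.txt Best 8 prints a command with -q partitions.txt only when that file exists in the current folder.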

  • To start an analysis, connect to the instance, open a Screen session and navigate to the folder with the input files. Then start RAxML with one of the following commands:

Best tree

raxmlHPC -D -p 1234 -s alignment.phy -q partitions.txt -m GTRCAT -U -T 8 -n Best > STDOUT.Best.log

This calculates the best maximum likelihood tree. The -D parameter speeds up the search with a convergence criterion based on Robinson-Foulds distances, which is recommended for large datasets (>500 taxa), and the -U parameter reduces RAM consumption for datasets with missing data. The resulting tree file is called RAxML_bestTree.Best, which you can open with e.g. FigTree. Note that if you repeat the search for the best tree, you have to change the seed specified with -p between runs, or the results will be identical.
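Repeated searches with different seeds can be generated with a short loop. The sketch below only prints the commands; the helper name print_seeded_runs, the seed scheme (1234 plus the run number) and the labels Best1, Best2, ... are assumptions made for illustration.

```shell
# print_seeded_runs N: print N best-tree commands, each with its own seed and label.
print_seeded_runs() (
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        seed=$((1234 + i))
        echo "raxmlHPC -D -p $seed -s alignment.phy -q partitions.txt -m GTRCAT -U -T 8 -n Best$i > STDOUT.Best$i.log"
        i=$((i + 1))
    done
)
```

Review the printed commands first; piping the output to sh (print_seeded_runs 3 | sh) would then run the searches one after another.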

Best tree with rapid bootstrap supports

raxmlHPC -f a -p 1234 -s alignment.phy -q partitions.txt -x 1234 -# autoMRE -m GTRCAT -U -T 8 -n BestRapidBS > STDOUT.BestRapidBS.log

This calculates the best maximum likelihood tree, computes rapid bootstrap values, automatically stops bootstrapping according to the extended majority rule (MRE) criterion, and plots the bootstrap values on the best tree. The tree with the combined information is named RAxML_bipartitions.BestRapidBS, which you can open with e.g. FigTree. The bootstrap trees file is also retained and is called RAxML_bootstrap.BestRapidBS.
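Because -# autoMRE decides the number of replicates at run time, it can be handy to check how many bootstrap trees were written. A minimal sketch, assuming the bootstrap file holds one Newick tree per line, each ending in a semicolon; count_trees is a hypothetical helper.

```shell
# count_trees FILE: count the lines that end in a semicolon (one Newick tree each).
count_trees() {
    grep -c ';[[:space:]]*$' "$1"
}
```

For the run above this would be called as count_trees RAxML_bootstrap.BestRapidBS.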

Best tree with standard (non-rapid) bootstrap supports

This cannot be done with a single command. However, the necessary commands can be chained with &&. The shell then executes the commands in sequence and continues only if the previous command was successful, so the whole chain stops as soon as an error occurs. The example below chains three commands into a pipeline that produces a single best tree with standard bootstrap values on the branches. Be sure to change the -s and -q input file parameters in both places where they occur.

raxmlHPC -D -p 1234 -s alignment.phy -q partitions.txt -m GTRCAT -U -T 8 -n Best > STDOUT.Best.log && raxmlHPC -b 1234 -D -p 1324 -s alignment.phy -q partitions.txt -# autoMRE -m GTRCAT -U -T 8 -n Bootstraps > STDOUT.Bootstraps.log && raxmlHPC -f b -t RAxML_bestTree.Best -z RAxML_bootstrap.Bootstraps -m GTRCAT -T 8 -n BestBS > STDOUT.BestBS.log

This first searches for the best tree, which is saved as RAxML_bestTree.Best. Second, it performs a standard (non-rapid) bootstrap search that stops automatically based on the extended majority rule (MRE) criterion; the bootstrap trees file is saved as RAxML_bootstrap.Bootstraps. Third, the bootstrap values are plotted on the best tree and saved as RAxML_bipartitions.BestBS.
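The behaviour of && can be tried out without running RAxML. In the sketch below, step is a hypothetical stand-in for a RAxML call: it announces itself and succeeds only when its second argument is ok.

```shell
# step NAME RESULT: print a message, then succeed only if RESULT is "ok".
step() {
    echo "running $1"
    [ "$2" = ok ]
}

# demo_chain: chain three steps with &&; the second one fails on purpose.
demo_chain() {
    step one ok && step two fail && step three ok
    echo "exit status: $?"
}
```

Calling demo_chain prints "running one", "running two" and "exit status: 1". The third step never starts, which is what happens to the remaining RAxML commands when an earlier one fails.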

Bootstrap consensus

It may be useful to analyse the bootstrap results separately, i.e. without plotting them on an ML best tree. The following command can be chained directly to a bootstrap run command with &&, or used on bootstrap tree files obtained earlier. The bootstrap trees input file is specified with -z and the consensus type with -J, where -J STRICT results in a strict consensus, -J T_50 results in a 50% majority rule consensus [the cut-off percentage can be varied between 50 and 100] and -J MRE creates an extended majority rule consensus. Example:

raxmlHPC -J STRICT -z RAxML_bootstrap.Bootstraps -m GTRCAT -n STRICT > STDOUT.consensus.log

Identify rogue taxa with RogueNaRok

RogueNaRok is installed with the RAxML installation script and can be run directly on RAxML bootstrap result files, using the command RNR. The bootstrap trees file is specified with -i. The analysis type is specified with -c, where -c 50 uses a 50% majority rule (MR) threshold, -c 100 uses a strict consensus threshold and -c MRE uses the extended majority rule (MRE) consensus. The example below writes a tab-delimited list of rogue taxa to the file RogueNaRok_droppedRogues.MR50.

RNR -i RAxML_bootstrap.Bootstraps -c 50 -T 8 -n MR50