Exabayes on OpenStack - cdoorenweerd/PhylOStack GitHub Wiki

This HOWTO explains how to install and use Exabayes v1.4.1 on OpenStack with Ubuntu 14.04 LTS.

Note: This HOWTO assumes you have an instance with the PhylOStack installed and you know how to connect via SSH, transfer files and use screen sessions.

Preparing input data

Exabayes requires two mandatory files to run:

  1. an alignment file in relaxed phylip format, in the example named alignment.phy
  2. a configuration file in nexus format, in the example named config.nex

and one optional file if you wish to define partitions:

  1. a text file, in the example named exabayespartitions.txt. Note that the format of the partitions file changed in v1.5 compared to previous versions: use forward slashes instead of backslashes.
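
Before copying the files to the instance, it can help to check that the alignment header matches its contents. The function below is a minimal sketch, not part of ExaBayes: check_phylip is a hypothetical helper name, and it assumes a sequential (non-interleaved) relaxed phylip file with one "name sequence" line per taxon and no spaces inside the sequences.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a relaxed phylip alignment before starting a run.
# Assumption: sequential layout, one "name sequence" line per taxon.
check_phylip() {
    awk '
        BEGIN { bad = 0; seen = 0 }
        NR == 1 { ntax = $1 + 0; nsites = $2 + 0; next }   # header: <taxa> <sites>
        NF == 0 { next }                                   # ignore blank lines
        {
            seen++
            if (length($2) != nsites) {
                printf "line %d: %s has %d sites, header says %d\n", NR, $1, length($2), nsites
                bad = 1
            }
        }
        END {
            if (seen != ntax) {
                printf "found %d taxa, header says %d\n", seen, ntax
                bad = 1
            }
            exit bad   # non-zero exit status signals a mismatch
        }
    ' "$1"
}
```

For example, check_phylip alignment.phy && echo "header and sequences agree" prints the message only when the taxon count and all sequence lengths match the header.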

Configuration file

You can use the example below as a template.

 #NEXUS

 [ In this config file most options are commented out.  Remove the square
   brackets to activate an option. ]

 [ In most instances, it is convenient to use scientific notation for
integer variables as well (e.g., write 1e6 for requesting numgen=1000000) ]

 [ all keywords are case-insensitive ]


 [================================================================]
 [                               GENERAL                          ]
 [================================================================]
 begin RUN;
  [ RUN PARAMETERS ]
   numRuns  2                        [ number of independent runs, 2 is good for initial runs, for publication-worthy results use 4 runs. ]
   numGen 1e6                        [ number of generations to run. If a convergence criterion is used, this is the minimum number of generations to run.  ]
   diagFreq 5000             [ check for convergence of multiple independent runs after this many generations ]
   samplingFreq 500                  [ take a sample every n generations ]
   tuneFreq 100              [ tune move parameters every n generations, set to 0, if you want to disable parameter tuning ]
   printFreq 100             [ print a sample to the screen every n generations, set to 0 if you want to disable output info about the state of the chains          ]

  [ parsimonyStart true ]    [ if true: use parsimony trees as starting trees, else: use random starting trees ]
  [ heatedChainsUseSame  false ]   [ should heated chains start with the same tree as the cold chain? default: false]
  [ checkPointInterval   10000 ]       [ defines how often a checkpoint is written ]

  [ MCMCMC ]
   numCoupledChains  4               [ number of chains per independent run (only one is cold)  => must be > 0]
  [ heatFactor 0.1 ]         [ the heat increment added to each hotter chain (this is not the inverse heat)   ]
  [ numSwapsPerGen 1 ]           [ number of swaps attempted on average per generation ]

  [ CONVERGENCE ]
  [ convergenceCriterion mean ]  [ indicates whether a topological convergence criterion should be used. Valid values: none, mean (default), max ]
  [ sdsfIgnoreFreq  0.1 ]        [ ignore clades for which the relative frequency does not exceed this value in any chain (here 10%) ]
   sdsfConvergence 0.02        [ consider runs converged (and stop) once the asdsf or msdsf is below this value (here 2%): 1-5% is considered good, 0.5% can be considered very good convergence ]

  [ BURNIN ]
  [ burninGen 2000 ]             [ exact number of generations that are discarded for diagnostics ]
   burninProportion      0.25        [ discard this proportion of all initial samples as burnin  ]

  [ proposalSets     false ]      [ enables accumulation of proposals into proposalSets. Very IMPORTANT if you run an analysis using multiple unlinked parameters ]
 end;


 begin params;
 [================================================================]
 [ PARAMETERS: definition and linking ]
 [================================================================]

 [ stateFreq = (0-3)   ]    [ state frequencies ]
 [ rateHet   = (0-3)   ]    [ rate heterogeneity among sites ]
 [ revMat    = (0-3)   ]    [ reversible substitution rate matrix ]
 [ brlens    = (0-3)   ]    [ branch lengths are linked across all partitions by default. Unlinking brlens is bugged in 1.4.1, it will crash your run. If you want to unlink brlens, download and compile v1.4.2 https://www.dropbox.com/s/b7x9zrytesfg9af/exabayes-1.4.2.tar.gz?dl=0 or wait for 1.5 ]
 [ aaModel   = (0-3)   ]    [ fixed-rate amino acid substitution matrix ]

 [ syntax for unlinking: ]
 [ brlens = (0-3)      ]    [ linked: one branch length parameter, linked across all 4 partitions ]
 [ brlens = (0,1,2,3)  ]    [ unlinked: 4 branch length parameters (one for each partition)  ]
 [ brlens = (0+1,2,3)  ]    [ the first parameter is linked for partitions 0 and 1 ]
 [ brlens = (0-2,3)    ]    [ first parameter linked across the first 3 partitions, a separate parameter for the last partition ]

 end;


 [================================================================]
 [ PRIOR configuration ]
 [================================================================]
 begin PRIORS;
 [ TOPOLOGY: fixed or uniform  ]
 [   topoPr fixed()      ]
    topoPr uniform(0,0)


 [ BRANCH LENGTHS: uniform, exponential or fixed ]
 [ brlenpr uniform(1e-6,4)     ]
 [ brlenpr fixed()             ]      [ you should have provided a user tree with branch lengths, otherwise 0.1 ]
 [ brlenpr fixed(0.1)          ]      [ fixes branch lengths to 0.1 ]
  brlenpr exponential(100)            [ exponential distribution with rate 100, i.e. mean value 0.01 ]

 [ SUBSTITUTION MATRIX: dirichlet or fixed, 6 values for DNA, 190 for AA (you must have defined a revMat parameter for the respective partitions) ]
 [ order of the substitution rates corresponds to the order in the ExaBayes_parameters* output file ]
 [ revMatPr dirichlet(1,2,3,4,5,6) ]            [ for DNA (rates are relative to each other and will be normalized, s.t. they sum up to 1) ]
 [ revMatPr dirichlet( 1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,
                       1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,
                           1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,
                               1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,
                                   1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,
                                       1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,
                                           1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,
                                               1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,
                                                   1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,
                                                       1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,
                                                           1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,
                                                               1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,
                                                                   1.0,1.0,1.0,1.0,1.0,1.0,1.0,
                                                                       1.0,1.0,1.0,1.0,1.0,1.0,
                                                                           1.0,1.0,1.0,1.0,1.0,
                                                                               1.0,1.0,1.0,1.0,
                                                                                   1.0,1.0,1.0,
                                                                                       1.0,1.0,
                                                                                           1.0 ) ]     [ 190 values for AA ]
 [ revMatPr fixed(1,2,3,4,5,6) ]                [ for DNA, rates will be normalized  ]

 [ STATE FREQUENCIES: dirichlet or fixed, 4 values for DNA, 20 for AA (you must have defined a stateFreq parameter for the respective partitions) ]
 [ order of the frequencies corresponds to the order in the ExaBayes_parameters* output file ]
   statefreqpr  dirichlet(0.25,0.25,0.25,0.25)                                                             [ for DNA, could also be relative rates again ]
 [  statefreqpr dirichlet(1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0) ] [ for AA  ]
 [  stateFreqPr fixed(1,2,3,4)                   ]

 [ RATE HETEROGENEITY: alpha parameter of gamma distribution, exponential, fixed or uniform ]
 [ shapePr exponential(2) ]
 [ shapePr uniform(0.0000001,200) ]
 [ shapePr fixed(1)   ]

 [ fixed-rate MODELS for AMINO ACID partitions, fixed or specified ]
 [ aaPr disc( WAG=3, REMAINDER=1 ) ]            [ high prior belief in WAG, all remaining models have a lower prior belief ]
 [ below a complete list of AA models implemented in ExaBayes ]
 [ aaPr disc(  DAYHOFF=1, DCMUT=1, JTT=1, MTREV=1, WAG=1,
                   RTREV=1, CPREV=1, VT=1, BLOSUM62=1, MTMAM=1,
           LG=1, MTART=1, MTZOA=1, PMB=1, HIVB=1, HIVW=1,
           JTTDCMUT=1, FLU=1) ]
 [ aaPr fixed(WAG) ]                            [ fix the model to WAG ]

 end;


  [================================================================]
  [ PROPOSAL configuration ]
  [================================================================]
 begin proposals;
  [ each proposal has a relative weight, use 0 to deactivate ]

  [ for disabling a category, please set it to fixed in the prior
    section rather than to disable all proposals ]

   [ TOPOLOGY proposals  ]
   [ eTBR         5 ]
   [ eSPR         5 ]
   [ stNNI        5 ]
   [ parsimonySPR 5 ]

   [ BRANCH LENGTHS proposals ]
   [ branchMulti    15 ]
   [ treeLengthMult 2  ]
   [ nodeSlider     5  ]

   [ RATE HETEROGENEITY ]
   [ rateHetMulti   1  ]

   [ SUBSTITUTION RATES  ]
   [ revMatSlider       0.5 ]    [ for DNA ]
   [ revMatDirichlet    0.5 ]    [ for DNA ]
   [ revMatRateDirich    2  ]    [ for AA  ]

   [ STATE FREQUENCIES ]
   [ frequencySlider    0.5  ]
   [ frequencyDirichlet 0.5  ]

   [ fixed-rate AA matrices ]
   [ aaModelJump        1 ]


   [ PARAMETERS for various moves ]
   [ eSprStopProb 0.5   ]            [ stopping probability for eSPR moves ]
   [ etbrStopProb 0.5   ]            [ stopping probability for eTBR moves ]
   [ parsimonyWarp 0.10 ]            [ warp factor for parsimony based moves ]
   [ parsSPRRadius  10  ]            [ rearrangement radius (around current position) for parsimony SPR ]

   [================================================================]

 end ;

Defining partitions

The partitions are defined in a separate text file called exabayespartitions.txt, using a format similar to that of RAxML but with forward slashes instead of backslashes to separate codon positions. For example:

DNA, 28SNep_921 = 1-921
DNA, CAD2_415 = 922-1336
DNA, COI-COII_codon3 = 1337-2001/3, 2002-2637/3
DNA, COI_codon12 = 1338-2001/3, 1339-2001/3
DNA, COII_codon12 = 2003-2637/3, 2004-2637/3
DNA, EF1-alpha_Nep = 2638-3119
DNA, Histon3 = 3120-3447
DNA, IDH_MDH = 3448-4170, 4171-4576

Starting a run

Place the input files in a folder and copy this folder to the instance. Start a Screen session, navigate to the folder with the input files and start a run by typing:

mpirun -np 8 exabayes -f alignment.phy -m DNA -c config.nex -q exabayespartitions.txt -n myrun -s 123 -R 2 > screenoutput.log

Where:

  • -np specifies the number of cores to be used
  • -f specifies the name of your input file
  • -m specifies model (DNA or PROT)
  • -c specifies name of the nexus config file
  • -q specifies the partitions file
  • -n specifies the prefix of the result files (mandatory)
  • -s specifies the seed number (mandatory, but can be $RANDOM)
  • -R specifies the number of independent analyses: if you run n independent analyses, use -R n
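
If you let the shell pick the seed, it is worth writing it down so the run can be repeated. The wrapper below is a minimal sketch under that assumption: run_exabayes, the DRY_RUN switch and the .seed.log file are hypothetical names introduced here, while the exabayes flags and file names are the ones from the command above.

```shell
#!/usr/bin/env bash
# Sketch: start a run with a random seed and record the seed for later.
# Arguments: run name, number of cores (default 8), seed (default $RANDOM).
run_exabayes() {
    local name=$1 cores=${2:-8} seed=${3:-$RANDOM}
    # Keep the seed next to the results so the run can be reproduced.
    echo "seed for ${name}: ${seed}" >> "${name}.seed.log"
    # With DRY_RUN=1 the command is only printed, not executed.
    ${DRY_RUN:+echo} mpirun -np "$cores" exabayes -f alignment.phy -m DNA \
        -c config.nex -q exabayespartitions.txt -n "$name" -s "$seed" -R 2
}
```

Inside the Screen session this could be called as run_exabayes myrun 8 > screenoutput.log. Setting DRY_RUN=1 first shows the full command without starting anything.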

Although the manual makes it sound promising, do not use -S; it will crash your run. If you want to stop the run manually before it reaches your chosen convergence threshold (e.g. 1-5% difference), reattach the Screen session and press CTRL+C.

Run Results

When a run has finished, manually or automatically, you can make a consensus tree of the resulting tree file(s) with the command below. You can also append this command directly after the exabayes command, separated by &&.

consense -f ExaBayes_topologies.run-0.myrun ExaBayes_topologies.run-1.myrun -n myCons

This produces consensus trees in both Newick and NEXUS format, as indicated in the filenames. Use Tracer on your local machine to examine the ESS values in the parameter files (all should be >100, preferably >200). These values are usually reached once the runs have converged below 2%.
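
With more than two runs, listing every topology file by hand gets tedious. The function below is a sketch that collects them with a shell glob; make_consensus and DRY_RUN are hypothetical names, and the file pattern ExaBayes_topologies.run-N.<name> is assumed from the consense command shown above.

```shell
#!/usr/bin/env bash
# Sketch: build one consensus from all topology files of a run name.
# Arguments: run name (as given to exabayes -n), consensus name (consense -n).
make_consensus() {
    local name=$1 out=$2
    # Collect every topology file for this run name (run-0, run-1, ...).
    set -- ExaBayes_topologies.run-*."$name"
    if [ ! -e "$1" ]; then
        echo "no topology files found for run name $name" >&2
        return 1
    fi
    # With DRY_RUN=1 the command is only printed, not executed.
    ${DRY_RUN:+echo} consense -f "$@" -n "$out"
}
```

For the example run, make_consensus myrun myCons expands to the same consense command as above when only run-0 and run-1 exist.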
