11 Historical Biogeography Analysis - PhyloAI/Ortho2Web GitHub Wiki

11.1 Dating Analysis

MCMCtree is a tool within the Phylogenetic Analysis by Maximum Likelihood (PAML) package designed for estimating divergence times on a phylogenetic tree using Bayesian methods. It allows for time calibration with fossil information, handles uncertainty in the molecular clock model, and provides posterior estimates for node ages, taking into account molecular evolution and fossil calibration constraints.

Step 1: Adding fossil calibration point information
The tree file with calibration information should be formatted as follows:

  • The first line lists the number of species and the number of trees, separated by a space.
  • The second line contains the tree in Newick format, with the calibration information attached to the calibrated node, and ends with a semicolon. A fossil constraint can be given as a fixed time point (e.g., @0.7 to specify a precise age) or as an interval with a minimum and a maximum age (e.g., '>.07<.08'). MCMCtree treats such bounds as soft by default, so the interval is a prior constraint on the node age rather than a posterior 95% HPD. For example, with 7 species and a calibration on one node, the file should look like:
7 1
((((A, (B, C)) '>.07<.08', D), (E, F)), G);
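
The tree file above can be written and sanity-checked from the shell. This is a minimal sketch, not part of the original workflow: it assumes the file is named input.tre (the treefile value used in Step 2), and the tip count relies on the fact that a Newick tree has one more tip than it has commas, which only holds when no label or calibration string contains a comma.

```shell
# Write the tree file: header line (species count, tree count), then the calibrated tree.
cat > input.tre <<'EOF'
7 1
((((A, (B, C)) '>.07<.08', D), (E, F)), G);
EOF

# Sanity check: tips = commas + 1 (assumes no commas inside labels or calibrations).
declared=$(awk 'NR == 1 { print $1 }' input.tre)
commas=$(sed -n '2p' input.tre | tr -cd ',' | wc -c)
tips=$((commas + 1))

if [ "$declared" -eq "$tips" ]; then
  echo "OK: header declares $declared species, tree has $tips tips"
else
  echo "Mismatch: header declares $declared species, tree has $tips tips" >&2
fi
```

A mismatch between the header and the tree is worth ruling out before starting a long run.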

Step 2: Running mcmctree with usedata = 3
Set up the mcmctree configuration file to specify usedata = 3. The configuration looks like this:

          seed = -1
       seqfile = input.phy
      treefile = input.tre
      mcmcfile = mcmc.txt
       outfile = out.txt

         ndata = 1
       seqtype = 0    * 0: nucleotides; 1:codons; 2:AAs
       usedata = 3    * 0: no data; 1:seq like; 2:normal approximation; 3:out.BV (in.BV)
         clock = 2    * 1: global clock; 2: independent rates; 3: correlated rates
       RootAge = '<1.0'  * safe constraint on root age, used if no fossil for root.

         model = 7    * 0:JC69, 1:K80, 2:F81, 3:F84, 4:HKY85, 5:T92, 6:TN93, 7:REV (GTR)
         alpha = 0.5    * alpha for gamma rates at sites
         ncatG = 5    * No. categories in discrete gamma

     cleandata = 0    * remove sites with ambiguity data (1:yes, 0:no)?

       BDparas = 1 1 0.1  * birth, death, sampling
   kappa_gamma = 6 2      * gamma prior for kappa
   alpha_gamma = 1 1      * gamma prior for alpha

   rgene_gamma = 2 20 1   * gammaDir prior for rate for genes
  sigma2_gamma = 1 10 1   * gammaDir prior for sigma^2     (for clock=2 or 3)

      finetune = 1: .1 .1 .1 .1 .1 .1 * auto (0 or 1): times, musigma2, rates, mixing, paras, FossilErr

         print = 1   * 0: no mcmc sample; 1: everything except branch rates 2: everything
        burnin = 2000
      sampfreq = 10
       nsample = 20000
mcmctree
# Runs mcmctree with the configuration above (usedata = 3). With no argument, mcmctree reads the control file mcmctree.ctl.
# This run writes a file named out.BV. When the run completes, rename out.BV to in.BV.

Step 3: Running mcmctree with usedata = 2
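Modify the mcmctree configuration file to specify usedata = 2, keep the renamed in.BV file in the working directory, and run mcmctree again. This run uses the normal approximation to the likelihood stored in in.BV to sample the divergence times.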

mcmctree
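
The hand-off between Steps 2 and 3 can be sketched in shell as follows. The sketch assumes the Step 2 control file is named mcmctree.ctl and writes the modified copy to mcmctree_step2.ctl; the second file name and the function name prepare_step3 are hypothetical and chosen only for illustration.

```shell
# Prepare the usedata = 2 run from the files left by the usedata = 3 run.
# Assumed names: mcmctree.ctl (Step 2 control file), mcmctree_step2.ctl (hypothetical copy).
prepare_step3() {
  # The first run must have produced out.BV; the second run reads it as in.BV.
  if [ ! -s out.BV ]; then
    echo "out.BV not found or empty; run Step 2 first" >&2
    return 1
  fi
  mv out.BV in.BV

  # Copy the control file, changing only the value of usedata from 3 to 2.
  sed 's/^\([[:space:]]*usedata[[:space:]]*=[[:space:]]*\)3/\12/' mcmctree.ctl > mcmctree_step2.ctl
}

# Usage, from the directory of the Step 2 run:
#   prepare_step3 && mcmctree mcmctree_step2.ctl
```

Writing a separate control file keeps the Step 2 settings intact, so the first run can be reproduced later.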

Step 4: Checking convergence using Tracer
To check whether the MCMC (Markov Chain Monte Carlo) run has converged, load the mcmc.txt file generated by mcmctree into Tracer. Tracer reports the effective sample size (ESS) for each parameter; an ESS greater than 200 is typically considered adequate. A high ESS alone does not prove convergence, so it is good practice to run the analysis at least twice and confirm that the runs give similar posterior estimates.
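
As a complement to Tracer, the posterior means from two independent runs can be compared directly. The sketch below is an illustration rather than part of the original workflow. It assumes mcmc.txt is a whitespace-delimited table with a single header line, and that two runs with different seeds (seed = -1 in the configuration above) are kept in separate directories. The function names col_means and compare_runs are hypothetical, and no burn-in rows are discarded here.

```shell
# Print the mean of every column of an MCMC sample file.
# Assumes a whitespace-delimited table with one header line.
col_means() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; n = NF; next }
       NF == n { for (i = 1; i <= n; i++) sum[i] += $i; rows++ }
       END     { if (rows > 0) for (i = 1; i <= n; i++) printf "%s\t%.6f\n", name[i], sum[i] / rows }' "$1"
}

# Compare two runs: parameter name, mean in run 1, mean in run 2, absolute difference.
compare_runs() {
  tmpdir=$(mktemp -d) || return 1
  col_means "$1" > "$tmpdir/run1"
  col_means "$2" > "$tmpdir/run2"
  paste "$tmpdir/run1" "$tmpdir/run2" |
    awk -F '\t' '{ d = $2 - $4; if (d < 0) d = -d; printf "%s\t%s\t%s\t%.6f\n", $1, $2, $4, d }'
  rm -rf "$tmpdir"
}

# Usage (run1/ and run2/ are hypothetical directories holding two independent runs):
#   compare_runs run1/mcmc.txt run2/mcmc.txt
```

Large differences between the two columns of means suggest that the chains should be run longer or that the settings need adjusting.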

11.2 Ancestral Area Reconstruction

Reconstruct Ancestral State in Phylogenies (RASP) is a software tool for reconstructing the geographic distributions or other ancestral states of species on a phylogenetic tree. It is widely used in historical biogeography because it lets researchers analyze how species distributions and other traits have changed over evolutionary time, and infer historical processes such as dispersal, vicariance, and extinction that shaped current biodiversity patterns. RASP has a graphical interface and is easy to operate on Windows, so no detailed method description is provided here.

11.3 Diversification Analysis

RevBayes is a software platform for Bayesian inference in evolutionary biology, particularly phylogenetic analysis. Its modular design allows users to build customized models that go beyond standard phylogenetic methods. It is widely used for complex evolutionary questions involving divergence time estimation, trait evolution, biogeography, and diversification.

Step 1: Installation

conda install -c conda-forge cmake
conda install -c conda-forge boost-cpp
conda install -c conda-forge git
git clone --branch development https://github.com/revbayes/revbayes.git 
cd revbayes/projects/cmake 
./build.sh 
./build.sh -mpi true  #For the MPI version

Step 2: Preparing the specific RevBayes configuration file
The configuration file can be downloaded from here.

Step 3: Running

rb mcmc_EBD_king.Rev

Step 4: Visualization

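# In the shell:
conda install -c conda-forge imagemagick
conda install -c conda-forge r-devtools
# In an R session (install.packages is only needed if devtools is not already installed):
install.packages("devtools")
devtools::install_github("cmt2/RevGadgets")

# Back in the shell:
Rscript plot_EBD.R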