BAMM on OpenStack - cdoorenweerd/PhylOStack GitHub Wiki

This HOWTO explains how to use BAMM for a speciation-extinction analyses on a chronogram on OpenStack with Ubuntu 14.04 LTS. For more information, see bamm-project.org

Note: This HOWTO assumes you have installed BAMM and R with the BAMMtools package using the PhylOStack instructions and know how to connect via SSH, transfer files and use screen sessions.

Preparing input data

To run BAMM and obtain output from BAMMtools, four to five input files are required:

  1. An ultrametric tree in newick format, preferably with branch lengths corresponding with age in million years, for example obtained with BEAST. It is advisable to remove outgroups. The tree file is named finaltree_noog.tre in the examples
  2. A BAMMtools R script to set the priors, named BAMMsetpriors.R, see example below
  3. A BAMM configuration text file, named BAMM_configuration.txt, see example below
  4. A BAMMtools R script to analyse the BAMM results and create PDF plots, named BAMMplots.R, see example below
  5. Optionally, a Linux breaks tab delimited text file with sampling fractions for different clades (see http://bamm-project.org/advanced.html#accounting-for-non-random-incomplete-taxon-sampling-in-diversification-studies for details how to format this file)

BAMMsetpriors.R

library(BAMMtools)
tree <- read.tree("finaltree_noog.tre")
is.ultrametric(tree) # check for ultrametric
is.binary.tree(tree) # check if there are no polytomies
min(tree$edge.length) # check all branch lengths are > 0
setBAMMpriors(tree)
q()

BAMM_configuration.txt

Be sure to check the sampling fractions settings.

# BAMM configuration file for speciation/extinction analysis
# ==========================================================
#
# Format
# ------
#
#     - Each option is specified as: option_name = option_value
#     - Comments start with # and go to the end of the line
#     - True is specified with "1" and False with "0"


################################################################################
# GENERAL SETUP AND DATA INPUT
################################################################################

modeltype = speciationextinction
# Specify "speciationextinction" or "trait" analysis

treefile = %%%%
# File name of the phylogenetic tree to be analyzed

runInfoFilename = run_info.txt
# File name to output general information about this run

sampleFromPriorOnly = 0
# Whether to perform analysis sampling from prior only (no likelihoods computed)

runMCMC = 1
# Whether to perform the MCMC simulation. If runMCMC = 0, the program will only
# check whether the data file can be read and the initial likelihood computed

simulatePriorShifts = 1
# Whether to simulate the prior distribution of the number of shift events,
# given the hyperprior on the Poisson rate parameter. This is necessary to
# compute Bayes factors

loadEventData = 0
# Whether to load a previous event data file

eventDataInfile = event_data_in.txt
# File name of the event data file to load, used only if loadEventData = 1

initializeModel = 1
# Whether to initialize (but not run) the MCMC. If initializeModel = 0, the
# program will only ensure that the data files (e.g., treefile) can be read

useGlobalSamplingProbability = 1
# Whether to use a "global" sampling probability. If False (0), expects a file
# name for species-specific sampling probabilities (see sampleProbsFilename)

globalSamplingFraction = 1.0
# The sampling probability. If useGlobalSamplingFraction = 0, this is ignored
# and BAMM looks for a file name with species-specific sampling fractions

sampleProbsFilename = sample_probs.txt
# File name containing species-specific sampling fractions

# seed = 12345
# Seed for the random number generator.
# If not specified (or is -1), a seed is obtained from the system clock

overwrite = 0
# If True (1), the program will overwrite any output files in the current
# directory (if present)


################################################################################
# MCMC SIMULATION SETTINGS & OUTPUT OPTIONS
################################################################################

numberOfGenerations = 5000000
# Number of generations to perform MCMC simulation

mcmcOutfile = mcmc_out.txt
# File name for the MCMC output, which only includes summary information about
# MCMC simulation (e.g., log-likelihoods, log-prior, number of processes)

mcmcWriteFreq = 1000
# Frequency in which to write the MCMC output to a file

eventDataOutfile = event_data.txt
# The raw event data (these are the main results). ALL of the results are
# contained in this file, and all branch-specific speciation rates, shift
# positions, marginal distributions etc can be reconstructed from this output.
# See R package BAMMtools for working with this output

eventDataWriteFreq = 1000
# Frequency in which to write the event data to a file

printFreq = 1000
# Frequency in which to print MCMC status to the screen

acceptanceResetFreq = 1000
# Frequency in which to reset the acceptance rate calculation
# The acceptance rate is output to both the MCMC data file and the screen

# outName = BAMM
# Optional name that will be prefixed on all output files (separated with "_")
# If commented out, no prefix will be used


################################################################################
# OPERATORS: MCMC SCALING OPERATORS
################################################################################

updateLambdaInitScale = 2.0
# Scale parameter for updating the initial speciation rate for each process

updateLambdaShiftScale = 0.1
# Scale parameter for the exponential change parameter for speciation

updateMuInitScale = 2.0
# Scale parameter for updating initial extinction rate for each process

updateEventLocationScale = 0.05
# Scale parameter for updating LOCAL moves of events on the tree
# This defines the width of the sliding window proposal

updateEventRateScale = 4.0
# Scale parameter (proportional shrinking/expanding) for updating
# the rate parameter of the Poisson process


################################################################################
# OPERATORS: MCMC MOVE FREQUENCIES
################################################################################

updateRateEventNumber = 0.1
# Relative frequency of MCMC moves that change the number of events

updateRateEventPosition = 1
# Relative frequency of MCMC moves that change the location of an event on the
# tree

updateRateEventRate = 1
# Relative frequency of MCMC moves that change the rate at which events occur

updateRateLambda0 = 1
# Relative frequency of MCMC moves that change the initial speciation rate
# associated with an event

updateRateLambdaShift = 1
# Relative frequency of MCMC moves that change the exponential shift parameter
# of the speciation rate associated with an event

updateRateMu0 = 1
# Relative frequency of MCMC moves that change the extinction rate for a given
# event

updateRateLambdaTimeMode = 0
# Relative frequency of MCMC moves that flip the time mode
# (time-constant <=> time-variable)

localGlobalMoveRatio = 10.0
# Ratio of local to global moves of events


################################################################################
# INITIAL PARAMETER VALUES
################################################################################

lambdaInit0 = 0.032
# Initial speciation rate (at the root of the tree)

lambdaShift0 = 0
# Initial shift parameter for the root process

muInit0 = 0.005
# Initial value of extinction (at the root)

initialNumberEvents = 0
# Initial number of non-root processes


################################################################################
# METROPOLIS COUPLED MCMC
################################################################################

numberOfChains = 4
# Number of Markov chains to run

deltaT = 0.01
# Temperature increment parameter. This value should be > 0
# The temperature for the i-th chain is computed as 1 / [1 + deltaT * (i - 1)]

swapPeriod = 1000
# Number of generations in which to propose a chain swap

chainSwapFileName = chain_swap.txt
# File name in which to output data about each chain swap proposal.
# The format of each line is [generation],[rank_1],[rank_2],[swap_accepted]
# where [generation] is the generation in which the swap proposal was made,
# [rank_1] and [rank_2] are the chains that were chosen, and [swap_accepted] is
# whether the swap was made. The cold chain has a rank of 1.


################################################################################
# NUMERICAL AND OTHER PARAMETERS
################################################################################

minCladeSizeForShift = 1
# Allows you to constrain location of possible rate-change events to occur
# only on branches with at least this many descendant tips. A value of 1
# allows shifts to occur on all branches.

segLength = 0.02
# Controls the "grain" of the likelihood calculations. Approximates the
# continuous-time change in diversification rates by breaking each branch into
# a constant-rate diversification segments, with each segment given a length
# determined by segLength. segLength is in units of the root-to-tip distance of
# the tree. So, if the segLength parameter is 0.01, and the crown age of your
# tree is 50, the "step size" of the constant rate approximation will be 0.5.
# If the value is greater than the branch length (e.g., you have a branch of
# length < 0.5 in the preceding example) BAMM will not break the branch into
# segments but use the mean rate across the entire branch.


################################################################################
# PRIORS
################################################################################

expectedNumberOfShifts = 1.0
# prior on the number of shifts in diversification
# Suggested values:
#     expectedNumberOfShifts = 1.0 for small trees (< 500 tips)
#     expectedNumberOfShifts = 10 or even 50 for large trees (> 5000 tips)

lambdaIsTimeVariablePrior = 1
# Prior (probability) of the time mode being time-variable (vs. time-constant)

BAMMplots.R

# load libraries
library(BAMMtools)
library(geiger)
library(coda)

# load tree
tree <- read.tree("finaltree_noog.tre")

# ESS after 20% burnin
mcmcout <- read.csv("BAMM_mcmc_out.txt")
burnstart <- floor(0.2*nrow(mcmcout))
postburn <- mcmcout[burnstart:nrow(mcmcout), ]
effectiveSize(postburn$N_shifts)
effectiveSize(postburn$logLik)

# load BAMM main output
edata <- getEventData(tree, eventdata = "BAMM_event_data.txt", burnin=0.2)
summary(edata)

# best scenario plot, dimensions in inches
pdf("BAMM_best.pdf", width=7, height=10)
best <- getBestShiftConfiguration(edata, expectedNumberOfShifts=1, threshold=5)
BAMMtools:::plot.bammdata(best, lwd=1, breaksmethod='quantile', pal="temperature", legend=T)
addBAMMshifts(best, cex=1)
dev.off()

# the 6 most likely scenario's within the 95% credible scenarios
# F value is the relative probability, F values from all scenarios add up to F=1
pdf("BAMM_css.pdf", width=7, height=10)
css <- credibleShiftSet(edata, expectedNumberOfShifts = 1, threshold=5, set.limit=0.95)
BAMMtools:::plot.credibleshiftset(css, plotmax = 6, pal = "temperature", legend=T)
dev.off()

# shift probabilities converted to branch lengths
pdf("BAMM_margprobs.pdf", width=7, height=10)
marg_probs <- marginalShiftProbsTree(edata)
plot.phylo(marg_probs, show.tip.label = F)
dev.off()

# net diversification graph
pdf("BAMM_netdiv.pdf", width=7, height=5)
starttime <- max(branching.times(tree))
plotRateThroughTime(edata, intervalCol="darkgreen", avgCol="darkgreen", start.time=starttime, ylim=c(0,25))
dev.off()

q()

Running BAMM

Place the input files in a folder and copy this folder to the instance. SSH into the instance, navigate to the folder, start a Screen session (if desired) and run BAMM with:

Rscript BAMMsetpriors.R > STDOUT.BAMMpriors.log && cat BAMM_configuration.txt myPriors.txt > BAMM_configurationpriors.txt && bamm -c BAMM_configurationpriors.txt > STDOUT.BAMM.log && Rscript BAMMplots.R > STDOUT.BAMMplots.log

Results

The result will be several PDF files with plots, for details on how to interpret this data or set the scripts to work optimally for your data, please refer to the BAMM website!

⚠️ **GitHub.com Fallback** ⚠️