Control file MSci - Pas-Kapli/bpp-tutorial GitHub Wiki
The control file for the parameter estimation under the MSC or MSCi (A00) is similar to the previous exercise. It differs in three main points: Analysis mode, species&tree, and for the MSCi model the addition of the phiprior.
1. Analysis mode
For this analysis we want both the species tree and the assignment of samples to species to be fixed.
To achieve that we use the following two lines in the control file:
speciesdelimitation = 0
speciestree = 0
2. species&tree
For this section, we would need to provide the species relationships. Under an introgression hypothesis, the species relationships are not described by a purely binary tree since horizontal relationships among lineages are also allowed. The representation of these relationships in newick format can easily get complicated.
We can create the newick tree using bpp itself under the --msci-create
flag and a description of the relationships in a text file.
In the yeast example, we would do that as follows:
tree ((((Scer,Spar),Smik),Skud),Sbay);
# define labels on the ancestral nodes of the tree.
# We define each internal node by providing two tips for which this node is their MRCA.
define A as Scer,Spar
define B as Scer,Smik
define C as Scer,Skud
define R as Scer,Sbay
# Define a hybridization event.
# Here we define that there is a horizontal relationship from the branch "Skud C" to the "Sbay R".
# We name H the node at the receiving lineage(Sbay R) and D the node at the source lineage (Skud C).
# Next, by defining "tau=yes,no" which means that H and D have the same tau, that of H.
hybridization Sbay R, Skud C as H D tau=yes,no phi=0.4
A txt version of the file above is here
wget https://raw.githubusercontent.com/Pas-Kapli/bpp-tutorial/master/A00_Yeast/data/msci_net.txt
To create the network in newick under this description we execute:
bpp --msci-create msci_net.txt
The extended newick representation of the network is:
((((Scer,Spar)A,Smik)B,(H[&phi=0.600000,tau-parent=no],Skud)D)C, (Sbay)H[&phi=0.400000,tau-parent=yes])R;
The annotations in the square brackets [...] are used to define the characteristics of the hybridization events.
Each annotation is relative to the corresponding parent of the node defined on. Here, the first annotation
(reading from left to right) is defined on inner node H and relates to the parent node D.
The attribute tau-parent=no
dictates that parent node D will not have its own tau parameter, but instead, share the same tau
with node H.
Here, is the illustration:
For further information about the extended newick notation check here and here.
We copy the newick tree in the species&tree
parameter and fill in the rest of the information:
species&tree = 5 Scer Spar Smik Skud Sbay
1 1 1 1 1
((((Scer,Spar)A,Smik)B,(H[&phi=0.600000,tau-parent=no],Skud)D)C, (Sbay)H[&phi=0.400000,tau-parent=yes])R;
3. phiprior
The phi parameter specifies a beta prior beta(a, b) for phi, i.e., the introgression probability under the MSci model.
Since we don't have other information regarding the phi parameter for the yeast species we will use a beta distribution that assigns an equal probability to all values from 0 to 1.
phiprior = 1 1
Final control file
Assembling all the information in the control file (check previous pages for the "Input/Output" and "MCMC sampling parameters") here would look like this:
Take a note at the definition of the theta
parameter. The purpose of the analysis is to calculate the MSci parameters, among which is the theta. To achieve that we add the "e" after the alpha and beta of the prior, i.e., "thetaprior = 3 0.04 e"
seed = -1
seqfile = ../data/bpp_seqfile.txt
Imapfile = ../data/Imap.txt
outfile = out.txt
mcmcfile = mcmc.txt
speciesdelimitation = 0 * fixed species delimitation
speciestree = 0 * fixed species tree
species&tree = 5 Scer Spar Smik Skud Sbay
1 1 1 1 1
((((Scer,Spar)A,Smik)B,(H[&phi=0.600000,tau-parent=no],Skud)D)C, (Sbay)H[&phi=0.400000,tau-parent=yes])R;
usedata = 1 * 0: no data (prior); 1:seq like
nloci = 106 * number of data sets in seqfile
cleandata = 0 * remove sites with ambiguity data (1:yes, 0:no)
model = JC69
thetaprior = 3 0.04 e # invgamma(a, b) for theta. With "e" theta parameters are estimated
tauprior = 3 0.2 # invgamma(a, b) for root tau & Dirichlet(a) for other tau's
phiprior = 1 1 # beta(a, b) for phi in the MSci model
finetune = 1
print = 1 0 0 0 * MCMC samples, locusrate, heredityscalars Genetrees
burnin = 8000
sampfreq = 2
nsample = 50000
Download the control file from this link or with wget:
wget https://raw.githubusercontent.com/Pas-Kapli/bpp-tutorial/master/A00_Yeast/data/A00.bpp.ctl