Parameter estimation under the MSC‐I (Baobab dataset, model D) - bpp/bpp-tutorial-geneflow GitHub Wiki

This demo we will infer the parameters for the MSC-I model with the following topology using the Baobabs dataset.

We will use the following folder structure:

.
|-- baobab
|   |-- A00
|   |   |-- baobab.A00.ctl
|   |   `-- baobab.A00.msci.ctl
|   |-- A01
|   |   `-- baobab.A01.ctl
|   |-- baobab.map.txt
|   `-- baobab.phy
`-- bpp
    `-- A00
        |-- r1
        |   `-- baobab.A00.msci.ctl
        `-- r2
            `-- baobab.A00.msci.ctl

8 directories, 7 files

Below are the steps to create the folder structure, download the data files, and copy control files to the folders where we will run BPP:

# create folder structure
mkdir -p bpp/A00/r{1,2}

# download and extract data files
wget https://github.com/bpp/bpp-tutorial-geneflow/raw/main/data/baobab.tar.gz
tar zxvf baobab.tar.gz
rm baobab.tar.gz

# copy control files to the right folders
cp baobab/A00/baobab.A00.msci.ctl bpp/A00/r1
cp baobab/A00/baobab.A00.msci.ctl bpp/A00/r2

You can check if your folder structure resembles the one described above using the commands:

tree --charset=ascii

Constructing a newick tree for the MSC-I model

Create a file called baobab-msci-defs.txt and use the following specification:

tree (((((Amad, Arub)e, Agra)d, Adig)c, Agre)b, Smic)a;
bidirection Adig c, Agre b as x y phi=0.2,0.2

This can be done by typing

cd bpp/A00
echo 'tree (((((Amad, Arub)e, Agra)d, Adig)c, Agre)b, Smic)a;' > baobab-msci-defs.txt
echo 'bidirection Adig c, Agre b as x y phi=0.2,0.2' >> baobab-msci-defs.txt 

and then run bpp with --msci-create:

bpp --msci-create baobab-msci-defs.txt

You should obtain the following newick tree:

(((((Amad,Arub)e,Agra)d,(Adig,y[&phi=0.200000])x)c,(Agre,x[&phi=0.200000])y)b, Smic)a;

For more information on the --msci-create option, see the documentation.

To perform posterior inference, create a control file called baobab.A00.msci.ctl in bpp/A00/ with the following content:

          seed =  -1

       seqfile = ../../../baobab/baobab.phy
      Imapfile = ../../../baobab/baobab.map.txt
       jobname = baobab-msci

 speciesdelimitation = 0
         speciestree = 0

#  speciesmodelprior = 1  # 0: uniform LH; 1:uniform rooted trees; 2: uniformSLH; 3: uniformSRooted

  species&tree = 6  Adig Agra Agre Amad Arub Smic
                    3    2    1    2    2    1
                    (((((Amad,Arub)e,Agra)d,(Adig,y[&phi=0.200000])x)c,(Agre,x[&phi=0.200000])y)b, Smic)a;
         phase = 0 0 0 0 0 0
                  
       usedata = 1    # 0: no data (prior); 1:seq like
         nloci = 100  # number of data sets in seqfile

     cleandata = 0    # remove sites with ambiguity data (1:yes, 0:no)?

    thetaprior = gamma 2 400  # gamma(a, b) for theta
      tauprior = gamma 3 200  # gamma(a, b) for root tau & Dirichlet(a) for other tau's
      phiprior = 1 1          # beta(a,b)

     locusrate = 0
         clock = 1 

      finetune = 1
         print = 1 0 0 0   # MCMC samples, locusrate, heredityscalars, Genetrees
        burnin = 50000
      sampfreq = 10
       nsample = 10000

Then copy this to the two run directories:

cp baobab.A00.msci.ctl r1
cp baobab.A00.msci.ctl r2

Prior distributions

Note that we have the new keyword phiprior in the control file, which is used to assign a Beta prior on the parameter $\varphi$ (introgression probability).

The syntax is

phiprior = alpha beta

where alpha and beta describe the Beta distribution $Beta(\alpha,\beta)$, with mean $\alpha / (\alpha + \beta)$. Below are some examples of a Beta distribution, and an Inverse Gamma distribution which we used to assign a prior on taus and thetas.

We typically use betaprior = 1 1 which is equivalent to the uniform distribution.

Running BPP

Check that the BPP control files in bpp/A00/r1 and bpp/A00/r2 point to the right PHYLIP (sequence alignment) and map files. They should have the following form:

 seqfile = ../../../baobab/baobab.phy
Imapfile = ../../../baobab/baobab.map.txt

You can check by:

cat r1/baobab.A00.msci.ctl
cat r2/baobab.A00.msci.ctl

Finally, run bpp:

cd r1
bpp --cfile baobab.A00.msci.ctl

If you run into problems with core pinning, or BPP exits with the error: "Error while pinning thread to core." then try running BPP with the --no-pin option:

bpp --no-pin --cfile baobab.A00.msci.ctl

We should run the program at least twice and compare the outputs from the two runs to check for convergence.

To do another run:

cd ../r2
bpp --cfile baobab.A00.msci.ctl

Synchronization point: Make sure everybody reached this stage


See a sample output file baobab-msci.txt here.

Screen output

The first part of the output is similar to the output of the A00 run for the MSC model. In addition, it includes a table detailing the node relationships within the species tree, as well as information on nodes participating in hybridization or introgression events. See a sample output file for the MSC-I analysis.

.
.
Starting timer..
Seed: -794801753 (randomly generated)
Parsing species tree... Done
Species tree contains hybridization/introgression events.

Hybridization events: 0
Bidirectional introgressions: 1
Label        Node  Child1  Child2  Parent
Adig            0     N/A     N/A      11
Agra            1     N/A     N/A       9
Agre            2     N/A     N/A      12
Amad            3     N/A     N/A      10
Arub            4     N/A     N/A      10
Smic            5     N/A     N/A       6
a               6       7       5     N/A
b               7       8      12       6
c               8       9      11       7
d               9      10       1       8
e              10       3       4       9
x              11       0      14       8   [tau = 1, phi = 0.800000, prop_tau = 1, has_phi = 0]
y              12       2      13       7   [tau = 1, phi = 0.800000, prop_tau = 0, has_phi = 0]
x              13     N/A     N/A      12   Mirrored bidirection node [Hybrid = x (11)]   [tau = 0, phi = 0.200000, prop_tau = 0, has_phi = 1]  phi_x : y -> x
y              14     N/A     N/A      11   Mirrored bidirection node [Hybrid = y (12)]   [tau = 0, phi = 0.200000, prop_tau = 0, has_phi = 1]  phi_y : x -> y
.
.

The last four lines indicate that nodes x and y are "hybridization" nodes, each having two parent nodes (hence listed twice). For example, the parents of x have indices nodes 8 and 12, which correspond to nodes c and y, respectively. The last two lines also contain the definition of the two introgression probabilities: phi_x : y -> x ($\varphi_x$) and phi_y : x -> y ($\varphi_y$).

Next, BPP summarises the input data and the model:

 Locus | Model | Sequences | Length | Ambiguous sites | Compressed | Base freqs 
-------+-------+-----------+--------+-----------------+------------+------------
     1 |  JC69 |        10 |    960 |              12 |         19 |      Fixed 
     2 |  JC69 |        10 |    933 |              72 |         18 |      Fixed 
.
.

Parsing map file...
Adi001 Adig
Adi002 Adig
Aga001 Agra
Aga002 Agra
Age001 Agre
Ama006 Amad
Ama018 Amad
Aru001 Arub
Aru127 Arub
Smi165 Smic
 Done

Per-locus sequences in data and 'species&tree' tag:
C.File | Data |                Status                | Population
-------+------+--------------------------------------+-----------
     3 |    2 | [OK]                                 | Adig      
     2 |    2 | [OK]                                 | Agra      
     1 |    1 | [OK]                                 | Agre      
     2 |    2 | [OK]                                 | Amad      
     2 |    2 | [OK]                                 | Arub      
     1 |    1 | [OK]                                 | Smic      


Map of populations and ancestors (1 in map indicates ancestor):
   Species   1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
 1 Adig      1  0  0  0  0  0  1  1  1  0   0   1   1   1   0
 2 Agra      0  1  0  0  0  0  1  1  1  1   0   0   0   0   0
 3 Agre      0  0  1  0  0  0  1  1  1  0   0   1   1   0   1
 4 Amad      0  0  0  1  0  0  1  1  1  1   1   0   0   0   0
 5 Arub      0  0  0  0  1  0  1  1  1  1   1   0   0   0   0
 6 Smic      0  0  0  0  0  1  1  0  0  0   0   0   0   0   0
 7 a         0  0  0  0  0  0  1  0  0  0   0   0   0   0   0
 8 b         0  0  0  0  0  0  1  1  0  0   0   0   0   0   0
 9 c         0  0  0  0  0  0  1  1  1  0   0   0   0   0   0
10 d         0  0  0  0  0  0  1  1  1  1   0   0   0   0   0
11 e         0  0  0  0  0  0  1  1  1  1   1   0   0   0   0
12 x         0  0  0  0  0  0  1  1  1  0   0   1   0   0   0
13 y         0  0  0  0  0  0  1  1  0  0   0   0   1   0   0
14 x         0  0  0  0  0  0  1  1  0  0   0   0   1   1   0
15 y         0  0  0  0  0  0  1  1  1  0   0   1   0   0   1

Generating gene trees.... Done

Initial MSC density and log-likelihood of observing data:
log-PG0 = 4253.643308   log-L0 = -474669.864998

[EXPERIMENTAL] - New extended IM rubberband algorithm
Theta proposal: Mixed: Sliding window (0.20) + Inv-G approx Gibbs (0.80))
Linked thetas: none
0:00 taken to read and process data..
Restarting timer...

The next part provides a description of the values printed on screen during the MCMC run. BPP prints many values on screen which we can cluster into three categories:

  • The first column is a progress indicator with the percentage of the the completed MCMC run. Negative progress means we are still in the burn-in.
  • The next block of columns shows the acceptance proportions for each activated MCMC move. In our case we have seven active proposals.
  • The remaining columns indicate information about the chain at that point in the run. Those are running means for three thetas and three taus, the phi parameter, the sum of gene tree log-densities, and the mean log-likelihood.

Since finetune was set to 1, BPP optimizes step-length values during burn-in, such that acceptance proportions (Pjump) are around 30%. To achieve this, BPP typically does four rounds of optimization during the burn-in and prints on screen the Current Pjump (acceptance proportions for each enabled MCMC move at that step), Current finetune (step-length/tuning parameters at that step) and New finetune (the new optimized tuning parameter values).

-*- Terms index -*-

  Prgs: progress of MCMC run (negative progress means burnin)
  Gage: gene-tree age proposal
  Gspr: gene-tree SPR proposal
   th1: species tree theta proposal (tips)   (sliding window)
   th2: species tree theta proposal (inner)  (sliding window)
   thg: species tree theta proposal (inner)  (gibbs sampler)
   tau: species tree tau proposal
   mix: mixing proposal
  phis: MSCi phi parameter proposal (sliding window)
  phig: MSCi phi parameter proposal (gibbs sampler)
theta1: mean theta of node 1
theta2: mean theta of node 2
theta3: mean theta of node 4
  tau1: mean tau of node 6
  tau2: mean tau of node 7
  tau3: mean tau of node 8
  phi1: mean of phi_x : y -> x
  phi2: mean of phi_y : x -> y
log-PG: log-probability of gene trees (MSCi)
 log-L: mean log-L of observing data

     |            Acceptance proportions            |
Prgs | Gage Gspr  th1  th2  thg  tau  mix phis phig | theta1 theta2 theta3    tau1   tau2   tau3    phi1   phi2       log-PG          log-L
-------------------------------------------------------------------------------------------------------------------------------------------
 -45%  0.32 0.41 0.67 0.61 0.94 0.08 0.01 0.99 1.00   0.0059 0.0053 0.0044  0.0134 0.0131 0.0049  0.0966 0.7366   3544.58397  -451106.27456  0:28
 -40%  0.32 0.41 0.68 0.62 0.94 0.08 0.00 0.99 1.00   0.0059 0.0053 0.0045  0.0134 0.0132 0.0049  0.1003 0.7385   3556.31541  -451094.28833  0:57
 -37%  0.32 0.41 0.68 0.63 0.94 0.08 0.00 0.99 1.00   0.0059 0.0053 0.0044  0.0132 0.0129 0.0049  0.1216 0.7313   3589.05710  -451089.16198

                     Gage    Gspr     th1     th2     tau     mix    phis  
Current Pjump:     0.32121 0.41035 0.68302 0.62502 0.08164 0.00472 0.98733
Current finetune:  5.00000 0.00100 0.00050 0.00200 0.00100 0.30000 0.00100
New finetune:      5.41911 0.00148 0.00181 0.00587 0.00025 0.00437 0.09858

=> 'finetune = 1 Gage:5.419111 Gspr:0.001475 th1:0.001805 th2:0.005875 tau:0.000253 mix:0.004365 phis:0.098578'

 -35%  0.32 0.33 0.22 0.29 0.95 0.34 0.64 0.30 1.00   0.0058 0.0053 0.0044  0.0126 0.0117 0.0048  0.2044 0.7045   3550.73115  -451072.03009  1:26
 -30%  0.32 0.33 0.23 0.31 0.95 0.34 0.63 0.31 1.00   0.0058 0.0052 0.0044  0.0124 0.0115 0.0048  0.2127 0.6949   3572.09331  -451069.88734  1:56
 -25%  0.32 0.34 0.24 0.32 0.95 0.32 0.61 0.30 1.00   0.0059 0.0053 0.0044  0.0130 0.0124 0.0048  0.1662 0.7154   3542.00140  -451074.89713  2:26


                     Gage    Gspr     th1     th2     tau     mix    phis  
Current Pjump:     0.32059 0.33603 0.23604 0.31958 0.31829 0.61232 0.30035
Current finetune:  5.41911 0.00148 0.00181 0.00587 0.00025 0.00437 0.09858
New finetune:      5.85973 0.00169 0.00138 0.00633 0.00027 0.01229 0.09871

=> 'finetune = 1 Gage:5.859731 Gspr:0.001688 th1:0.001377 th2:0.006329 tau:0.000271 mix:0.012285 phis:0.098712'

 -20%  0.32 0.31 0.34 0.41 0.94 0.29 0.37 0.28 1.00   0.0059 0.0052 0.0044  0.0133 0.0131 0.0049  0.1382 0.7374   3565.97303  -451079.93841  2:58
 -15%  0.32 0.31 0.35 0.40 0.94 0.28 0.36 0.28 1.00   0.0059 0.0052 0.0044  0.0134 0.0132 0.0049  0.1275 0.7387   3562.40911  -451079.19182  3:29
 -12%  0.32 0.31 0.35 0.40 0.94 0.28 0.36 0.28 1.00   0.0059 0.0053 0.0044  0.0134 0.0132 0.0049  0.1248 0.7388   3563.90697  -451078.88961

                     Gage    Gspr     th1     th2     tau     mix    phis  
Current Pjump:     0.32201 0.31235 0.35026 0.40306 0.28029 0.36168 0.27962
Current finetune:  5.85973 0.00169 0.00138 0.00633 0.00027 0.01229 0.09871
New finetune:      6.36968 0.00177 0.00166 0.00912 0.00025 0.01539 0.09102

=> 'finetune = 1 Gage:6.369682 Gspr:0.001770 th1:0.001658 th2:0.009116 tau:0.000251 mix:0.015391 phis:0.091021'

 -10%  0.32 0.30 0.22 0.25 0.94 0.29 0.30 0.32 1.00   0.0059 0.0053 0.0044  0.0135 0.0133 0.0049  0.1307 0.7398   3546.67042  -451078.93730  4:00
  -5%  0.32 0.30 0.24 0.25 0.95 0.32 0.30 0.33 1.00   0.0059 0.0053 0.0044  0.0129 0.0123 0.0049  0.1748 0.7178   3564.90472  -451074.85603  4:31
   0%  0.32 0.30 0.24 0.25 0.95 0.32 0.30 0.33 1.00   0.0058 0.0053 0.0044  0.0128 0.0124 0.0048  0.1743 0.7211   3569.36430  -451074.64288  5:00


                     Gage    Gspr     th1     th2     tau     mix    phis  
Current Pjump:     0.32087 0.30239 0.24451 0.25478 0.32167 0.30064 0.32558
Current finetune:  6.36968 0.00177 0.00166 0.00912 0.00025 0.01539 0.09102
New finetune:      6.89478 0.00179 0.00132 0.00757 0.00027 0.01543 0.10026

=> 'finetune = 1 Gage:6.894777 Gspr:0.001786 th1:0.001315 th2:0.007569 tau:0.000272 mix:0.015429 phis:0.100258'

   5%  0.32 0.30 0.33 0.34 0.95 0.30 0.30 0.28 1.00   0.0059 0.0053 0.0045  0.0125 0.0122 0.0049  0.1758 0.7203   3533.79472  -451071.55092  5:28
  10%  0.32 0.30 0.34 0.34 0.95 0.30 0.30 0.28 1.00   0.0059 0.0053 0.0045  0.0128 0.0124 0.0049  0.1636 0.7210   3499.55395  -451073.70877  5:57
  15%  0.32 0.30 0.34 0.34 0.95 0.30 0.30 0.28 1.00   0.0058 0.0053 0.0045  0.0126 0.0122 0.0048  0.1770 0.7150   3555.84297  -451071.94304  6:26
  20%  0.32 0.30 0.35 0.34 0.95 0.31 0.30 0.29 1.00   0.0058 0.0052 0.0044  0.0126 0.0120 0.0048  0.1876 0.7104   3570.93935  -451070.94204  6:54
  25%  0.32 0.30 0.35 0.34 0.95 0.31 0.30 0.29 1.00   0.0058 0.0052 0.0044  0.0126 0.0120 0.0048  0.1900 0.7098   3520.54856  -451070.92273  7:23
  30%  0.32 0.30 0.35 0.34 0.95 0.31 0.30 0.29 1.00   0.0058 0.0052 0.0044  0.0127 0.0122 0.0048  0.1792 0.7135   3540.97213  -451072.34089  7:51
  35%  0.32 0.30 0.35 0.34 0.95 0.30 0.29 0.29 1.00   0.0058 0.0052 0.0044  0.0129 0.0124 0.0048  0.1680 0.7177   3588.96851  -451073.57006  8:19
  40%  0.32 0.30 0.35 0.34 0.95 0.30 0.29 0.29 1.00   0.0058 0.0052 0.0044  0.0129 0.0125 0.0048  0.1646 0.7198   3529.21983  -451074.14537  8:48
  45%  0.32 0.30 0.36 0.34 0.95 0.30 0.29 0.29 1.00   0.0058 0.0052 0.0044  0.0130 0.0125 0.0048  0.1623 0.7217   3562.01332  -451074.62358  9:17
  50%  0.32 0.30 0.36 0.34 0.95 0.30 0.29 0.29 1.00   0.0058 0.0052 0.0044  0.0130 0.0126 0.0048  0.1577 0.7231   3572.54096  -451074.85311  9:46
  55%  0.32 0.30 0.36 0.34 0.95 0.30 0.29 0.29 1.00   0.0058 0.0052 0.0044  0.0130 0.0126 0.0048  0.1586 0.7232   3574.79113  -451074.82793  10:17
  60%  0.32 0.30 0.36 0.34 0.95 0.29 0.29 0.29 1.00   0.0059 0.0052 0.0044  0.0130 0.0127 0.0048  0.1557 0.7244   3545.94588  -451075.25304  10:47
  65%  0.32 0.30 0.36 0.34 0.95 0.29 0.29 0.29 1.00   0.0059 0.0052 0.0044  0.0130 0.0126 0.0048  0.1553 0.7240   3514.09940  -451075.13946  11:17
  70%  0.32 0.30 0.36 0.34 0.95 0.30 0.29 0.29 1.00   0.0058 0.0052 0.0044  0.0130 0.0126 0.0048  0.1576 0.7234   3577.00182  -451075.02462  11:46
  75%  0.32 0.30 0.36 0.34 0.95 0.29 0.29 0.29 1.00   0.0059 0.0052 0.0044  0.0131 0.0127 0.0048  0.1561 0.7251   3567.12449  -451075.71562  12:15
  80%  0.32 0.30 0.36 0.34 0.95 0.29 0.29 0.29 1.00   0.0059 0.0052 0.0044  0.0131 0.0127 0.0048  0.1554 0.7250   3571.62652  -451075.78153  12:44
  85%  0.32 0.30 0.36 0.34 0.95 0.30 0.29 0.29 1.00   0.0059 0.0052 0.0044  0.0131 0.0127 0.0048  0.1574 0.7243   3558.26288  -451075.57401  13:12
  90%  0.32 0.30 0.36 0.34 0.95 0.30 0.29 0.29 1.00   0.0059 0.0052 0.0044  0.0130 0.0126 0.0048  0.1576 0.7239   3524.88323  -451075.56414  13:40
  95%  0.32 0.30 0.36 0.34 0.95 0.30 0.29 0.29 1.00   0.0059 0.0052 0.0044  0.0130 0.0126 0.0048  0.1580 0.7239   3582.15262  -451075.61879  14:08
 100%  0.32 0.30 0.36 0.34 0.95 0.30 0.29 0.29 1.00   0.0059 0.0052 0.0044  0.0131 0.0127 0.0048  0.1567 0.7247   3557.13007  -451075.81724  14:36

14:36 spent in MCMC

After MCMC sampling is complete, the remaining part provides posterior summaries:

Node-Index  Node-Type  Node-Label
---------------------------------
1           Tip        Adig
2           Tip        Agra
3           Tip        Agre
4           Tip        Amad
5           Tip        Arub
6           Tip        Smic
7           Root       a
8           Inner      b
9           Inner      c
10          Inner      d
11          Inner      e
12          Hybrid     x (parental node index 9)
13          Hybrid     y (parental node index 8)
14          Hybrid     x (parental node index 13)
15          Hybrid     y (parental node index 12)

Resolving potential unidentifiability for BDI x <-> y
lnL0: -10.508458
Round  0,  0 points moved...
  lnL = -10.508458

   0 points reflected
Printing processed sample into baobab-msci.mcmc.txt.processed


  param       mean     median      S.D       min       max       2.5%     97.5%    2.5%HPD   97.5%HPD      ESS*        Eff*      rho1  
---------------------------------------------------------------------------------------------------------------------------------------
   theta:1  0.005849  0.005800   0.000688  0.003578  0.009940  0.004641  0.007330  0.004569  0.007224   9986.208979  0.998621  0.014684
   theta:2  0.005262  0.005215   0.000649  0.003183  0.008292  0.004127  0.006665  0.004067  0.006568  10237.355673  1.023736  0.005371
   theta:4  0.004445  0.004406   0.000539  0.002895  0.007007  0.003496  0.005620  0.003415  0.005499  10082.301232  1.008230  -0.001069
   theta:5  0.006219  0.006150   0.000803  0.003896  0.010532  0.004825  0.007965  0.004685  0.007780  10098.420255  1.009842  0.009405
   theta:7  0.033757  0.033638   0.002843  0.024151  0.047001  0.028521  0.039647  0.028136  0.039215    131.103801  0.013110  0.258467
   theta:8  0.009387  0.008101   0.005724  0.000156  0.039408  0.001808  0.023530  0.000921  0.021010    248.338498  0.024834  0.425168
   theta:9  0.014915  0.014825   0.001591  0.009557  0.022386  0.012089  0.018263  0.012059  0.018214    110.487477  0.011049  0.543591
  theta:10  0.016262  0.016032   0.003535  0.005511  0.040398  0.009931  0.023750  0.009703  0.023410   3046.377211  0.304638  0.217857
  theta:11  0.007273  0.006540   0.004001  0.000257  0.029966  0.001575  0.016789  0.000632  0.014871   6941.576652  0.694158  0.105135
  theta:12  0.005666  0.004968   0.003340  0.000181  0.026425  0.001125  0.013845  0.000652  0.012241   8162.401110  0.816240  0.054259
  theta:13  0.017956  0.017792   0.006189  0.000150  0.048453  0.005088  0.030793  0.005086  0.030781    294.195517  0.029420  0.357363
     tau:7  0.013076  0.013069   0.000846  0.010836  0.015727  0.011621  0.014661  0.011559  0.014593     34.775156  0.003478  0.956230
     tau:8  0.012689  0.012720   0.001054  0.009072  0.015581  0.010683  0.014504  0.010775  0.014572     29.469507  0.002947  0.968117
     tau:9  0.004843  0.004844   0.000161  0.004238  0.005421  0.004523  0.005158  0.004524  0.005158   1031.370566  0.103137  0.716672
    tau:10  0.003627  0.003625   0.000190  0.002908  0.004330  0.003261  0.004007  0.003243  0.003986   2343.988014  0.234399  0.503285
    tau:11  0.003360  0.003369   0.000198  0.002581  0.003992  0.002949  0.003728  0.002964  0.003740   2666.929425  0.266693  0.518095
    tau:12  0.004605  0.004619   0.000216  0.003559  0.005317  0.004139  0.004986  0.004138  0.004984   1687.610350  0.168761  0.503691
    tau:13  0.004605  0.004619   0.000216  0.003559  0.005317  0.004139  0.004986  0.004138  0.004984   1687.610350  0.168761  0.503691
phi:14<-13  0.156107  0.154812   0.072394  0.000037  0.428450  0.014930  0.297696  0.000037  0.276364     65.489330  0.006549  0.700277
phi:15<-12  0.724406  0.727535   0.057785  0.463846  0.924433  0.604892  0.830219  0.613397  0.835491    252.719212  0.025272  0.234032

       lnL  -451075.675359  -451075.327500  24.651807  -451176.021000  -450985.951000  -451124.549000  -451028.660000  -451123.709000  -451028.027000    539.392134  0.053939  0.190325


Species tree network:
(((((Amad,Arub)e,Agra)d,(Adig,y[&phi=0.724406])x)c,(Agre,x[&phi=0.156107])y)b, Smic)a;

Species tree network with attributes (thetas and tau 95% HPD), and branch lengths (tau difference):
(((((Amad: 0.003360, Arub: 0.003360)e[&height_95%_HPD={0.00296400, 0.00374000}, theta=0.0072728]: 0.000267, Agra: 0.003627)d[&height_95%_HPD={0.00324300, 0.00398600}, theta=0.0162619]: 0.001216, (Adig: 0.004605, y)x[&height_95%_HPD={0.00413800, 0.00498400}, theta=0.0056665]: 0.000238)c[&height_95%_HPD={0.00452400, 0.00515800}, theta=0.0149149]: 0.007847, (Agre: 0.004605, x)y[&height_95%_HPD={0.00413800, 0.00498400}, theta=0.0179562]: 0.008084)b[&height_95%_HPD={0.01077500, 0.01457200}, theta=0.0093870]: 0.000387, Smic: 0.013076)a[&height_95%_HPD={0.01155900, 0.01459300}, theta=0.0337574];

Species tree network with taus and thetas:
(((((Amad #0.004445, Arub #0.006219)e:0.003360 #0.007273, Agra #0.005262)d:0.003627 #0.016262, (Adig #0.005849, y)x[&phi=0.843893]:0.004605 #0.005666)c:0.004843 #0.014915, (Agre , x)y[&phi=0.275594]:0.004605 #0.017956)b:0.012689 #0.009387, Smic )a:0.013076 #0.033757;

FigTree tree is in baobab-msci.FakeTree.tre

Summarizing parameter estimates using file baobab-msci.conditional_a1b1.txt ...

  param       mean      S.D       2.5%     97.5%    2.5%HPD   97.5%HPD    Effu      Effy       c    
----------------------------------------------------------------------------------------------------
   theta:1  0.005837  0.000665  0.004676  0.007270  0.004559  0.007132  1.495599  0.646056  8.279819
   theta:2  0.005243  0.000623  0.004154  0.006586  0.004059  0.006466  4.675128  0.930942  7.157320
   theta:4  0.004434  0.000520  0.003526  0.005555  0.003431  0.005442  4.564332  0.933249  6.776672
   theta:5  0.006211  0.000773  0.004871  0.007891  0.004760  0.007742  5.790023  0.976600  6.722669
   theta:7  0.033760  0.002817  0.028521  0.039517  0.028183  0.039156  0.013061  0.012950  2.933831
   theta:8  0.009071  0.005620  0.001846  0.023237  0.000773  0.020682  0.021123  0.021046  1.210392
   theta:9  0.014908  0.001567  0.012056  0.018224  0.011818  0.017973  0.009805  0.009779  1.389252
  theta:10  0.016253  0.003510  0.009978  0.023764  0.009529  0.023230  0.350036  0.317527  1.413411
  theta:11  0.007117  0.003867  0.001592  0.016295  0.000824  0.014646  0.722229  0.633418  1.240902
  theta:12  0.005370  0.003086  0.001125  0.012869  0.000469  0.011292  1.015977  0.781320  1.419672
  theta:13  0.017865  0.005923  0.004883  0.030107  0.005298  0.030481  0.028692  0.028314  1.870696
phi:14<-13  0.156806  0.072199  0.014482  0.297885  0.000223  0.276942  0.006234  0.006227  1.251353
phi:15<-12  0.724741  0.057088  0.606228  0.829097  0.608968  0.831381  0.020109  0.019873  2.440936

Once the run is finished, there will be five output files:

  1. baobab-msci.txt: main output file containing the above information
  2. baobab-msci.mcmc.txt: MCMC sample of model parameters
  3. baobab-msci.conditional_a1b1.txt: MCMC sample of $a$ and $b$ parameters of the gamma distributions in Gibbs sampling, used for generating final posterior summaries.
  4. baobab-msci.FakeTree.tre: fitted model, which can be visualised in FigTree; see figure below
  5. baobab-msci.SeedUsed: a random seed used by BPP

From the posterior estimates, we have detected some gene flow between Adig and Agre in both directions, with $\varphi_{y \rightarrow x} = 0.157\ (0.000,\ 0.277)$ from Agre to Adig (phi:14<-13) and $\varphi_{x \rightarrow y} = 0.725\ (0.609,\ 0.831)$ in the opposite direction at time $\tau_x = \tau_y = 0.00460\ (0.00414,\ 0.00498)$ (tau:12 and tau:13). Numbers in the parentheses are the 95% HPD interval.

Convergence check

Try completing two runs and check the convergence of MCMC. A helpful tool is Tracer, but BPP provides some helpful summary statistics too. Here, we provide a python script (scatter-params.py) to help plot posteriors for theta and tau parameters. If we are satisfied, we can use one MCMC analysis from BPP or combine them to report parameter estimates. The plotting script can be executed from the command line with

# download script
wget https://raw.githubusercontent.com/bpp/bpp-tutorial-geneflow/main/third-day/scripts/scatter-params.py

# add execute permissions
chmod +x scatter-params.py

Then run script on your two mcmc files:

./scatter-params.py <mcmc1.txt> <mcmc2.txt>

The script will produce three plots, one for thetas (thetas.pdf), one for taus (taus.pdf) and one for phis (phis.pdf).

Additional option: Theta models

It is possible to constrain theta parameters in the model using the keyword thetamodel. For example, we can force theta to remain unchanged after introgression. See section 5.3 in the BPP manual

Testing for gene flow using Bayes factors

The method is described here.

We will use a custom script savage-dickey.py to parse the MCMC sample in baobab-msci.mcmc.txt.

First, get the script:

wget https://raw.githubusercontent.com/bpp/bpp-tutorial-geneflow/main/fourth-day/scripts/savage-dickey.py
chmod +x savage-dickey.py

The script requires pandas and scipy.

Usage:

./savage-dickey.py
 usage: savage-dickey.py cutoff column beta_prior_a beta_prior_b mcmcfile
 example: savage-dickey.py 0.01 "phi_x<-w" 1 1 mcmc.model2.txt

Let's test whether each of the estimated introgression probabilities is significantly greater than zero. To simplify the notation, let $\varphi_y$ denote $\varphi_{y \rightarrow x}$ for $\text{Agre} \rightarrow \text{Adig}$ introgression. Consider testing $H_0: \varphi_y = 0$ against $H_1: \varphi_y &gt; 0$. We can use the posterior MCMC sample of $\varphi_y$ to approximate the Bayes factor

$$ B_{10,\varepsilon} = \frac{\mathbf{P}(\phi)}{\mathbf{P}(\phi | X)} $$

where $\mathbf{P}(\phi)$ is the probability of the null region ${\phi : \varphi_y &lt; \varepsilon}$ under the prior distribution while $\mathbf{P}(\phi | X)$ is an analogous probability under the posterior distribution.

First, try $\varepsilon = 0.01$:

./savage-dickey.py 0.01 "phi:14<-13:x<-y" 1 1 r1/baobab-msci.mcmc.txt
Cutoff: 0.01
Prior: Beta(1.0,1.0)
Parameter: phi:14<-13:x<-y
MCMC file: r1/baobab-msci.mcmc.txt

Prior: 0.01
Posterior: 0.0163
Bayes factor: 0.6134969325153375

If we use a smaller threshold for the null set, $\varepsilon = 0.001$, we get:

./savage-dickey.py 0.001 "phi:14<-13:x<-y" 1 1 r1/baobab-msci.mcmc.txt
Cutoff: 0.001
Prior: Beta(1.0,1.0)
Parameter: phi:14<-13:x<-y
MCMC file: r1/baobab-msci.mcmc.txt

Prior: 0.001
Posterior: 0.0022
Bayes factor: 0.45454545454545453

In both cases, the Bayes factor falls between 0.01 and 100, so the test is indeterminate, and there is no strong evidence for introgression from Agre to Adig.

Similarly, we can test the other direction, Adig $\rightarrow$ Agre:

./savage-dickey.py 0.01 "phi:15<-12:y<-x" 1 1 r1/baobab-msci.mcmc.txt 
Cutoff: 0.01
Prior: Beta(1.0,1.0)
Parameter: phi:15<-12:y<-x
MCMC file: r1/baobab-msci.mcmc.txt

savage-dickey.py:19: RuntimeWarning: divide by zero encountered in scalar divide
  bf = prior / posterior
Prior: 0.01
Posterior: 0.0
Bayes factor: inf
./savage-dickey.py 0.001 "phi:15<-12:y<-x" 1 1 r1/baobab-msci.mcmc.txt
Cutoff: 0.001
Prior: Beta(1.0,1.0)
Parameter: phi:15<-12:y<-x
MCMC file: r1/baobab-msci.mcmc.txt

savage-dickey.py:19: RuntimeWarning: divide by zero encountered in scalar divide
  bf = prior / posterior
Prior: 0.001
Posterior: 0.0
Bayes factor: inf

Both values of $\varepsilon$ (0.01 and 0.001) give the Bayes factor of infinity because all values of $\varphi_{x \rightarrow y}$ in the posterior sample are outside of the null set, giving an estimate $P(\phi | X) = 0$. We conclude that there is overwhelming evidence for introgression from Adig into Agre.

⚠️ **GitHub.com Fallback** ⚠️