Parameter estimation under the MSC‐I (Baobab dataset, model D) - bpp/bpp-tutorial-geneflow GitHub Wiki
This demo we will infer the parameters for the MSC-I model with the following topology using the Baobabs dataset.
We will use the following folder structure:
.
|-- baobab
| |-- A00
| | |-- baobab.A00.ctl
| | `-- baobab.A00.msci.ctl
| |-- A01
| | `-- baobab.A01.ctl
| |-- baobab.map.txt
| `-- baobab.phy
`-- bpp
`-- A00
|-- r1
| `-- baobab.A00.msci.ctl
`-- r2
`-- baobab.A00.msci.ctl
8 directories, 7 files
Below are the steps to create the folder structure, download the data files, and copy control files to the folders where we will run BPP:
# create folder structure
mkdir -p bpp/A00/r{1,2}
# download and extract data files
wget https://github.com/bpp/bpp-tutorial-geneflow/raw/main/data/baobab.tar.gz
tar zxvf baobab.tar.gz
rm baobab.tar.gz
# copy control files to the right folders
cp baobab/A00/baobab.A00.msci.ctl bpp/A00/r1
cp baobab/A00/baobab.A00.msci.ctl bpp/A00/r2
You can check if your folder structure resembles the one described above using the commands:
tree --charset=ascii
Create a file called baobab-msci-defs.txt
and use the following specification:
tree (((((Amad, Arub)e, Agra)d, Adig)c, Agre)b, Smic)a;
bidirection Adig c, Agre b as x y phi=0.2,0.2
This can be done by typing
cd bpp/A00
echo 'tree (((((Amad, Arub)e, Agra)d, Adig)c, Agre)b, Smic)a;' > baobab-msci-defs.txt
echo 'bidirection Adig c, Agre b as x y phi=0.2,0.2' >> baobab-msci-defs.txt
and then run bpp
with --msci-create
:
bpp --msci-create baobab-msci-defs.txt
You should obtain the following newick tree:
(((((Amad,Arub)e,Agra)d,(Adig,y[&phi=0.200000])x)c,(Agre,x[&phi=0.200000])y)b, Smic)a;
For more information on the --msci-create
option, see the documentation.
To perform posterior inference, create a control file called baobab.A00.msci.ctl
in bpp/A00/
with the following content:
seed = -1
seqfile = ../../../baobab/baobab.phy
Imapfile = ../../../baobab/baobab.map.txt
jobname = baobab-msci
speciesdelimitation = 0
speciestree = 0
# speciesmodelprior = 1 # 0: uniform LH; 1:uniform rooted trees; 2: uniformSLH; 3: uniformSRooted
species&tree = 6 Adig Agra Agre Amad Arub Smic
3 2 1 2 2 1
(((((Amad,Arub)e,Agra)d,(Adig,y[&phi=0.200000])x)c,(Agre,x[&phi=0.200000])y)b, Smic)a;
phase = 0 0 0 0 0 0
usedata = 1 # 0: no data (prior); 1:seq like
nloci = 100 # number of data sets in seqfile
cleandata = 0 # remove sites with ambiguity data (1:yes, 0:no)?
thetaprior = gamma 2 400 # gamma(a, b) for theta
tauprior = gamma 3 200 # gamma(a, b) for root tau & Dirichlet(a) for other tau's
phiprior = 1 1 # beta(a,b)
locusrate = 0
clock = 1
finetune = 1
print = 1 0 0 0 # MCMC samples, locusrate, heredityscalars, Genetrees
burnin = 50000
sampfreq = 10
nsample = 10000
Then copy this to the two run directories:
cp baobab.A00.msci.ctl r1
cp baobab.A00.msci.ctl r2
Note that we have the new keyword phiprior
in the control file, which is used to assign a Beta prior on the parameter
The syntax is
phiprior = alpha beta
where alpha
and beta
describe the Beta distribution
We typically use betaprior = 1 1
which is equivalent to the uniform distribution.
Check that the BPP control files in bpp/A00/r1
and bpp/A00/r2
point to the right PHYLIP (sequence alignment) and map files. They should have the following form:
seqfile = ../../../baobab/baobab.phy
Imapfile = ../../../baobab/baobab.map.txt
You can check by:
cat r1/baobab.A00.msci.ctl
cat r2/baobab.A00.msci.ctl
Finally, run bpp:
cd r1
bpp --cfile baobab.A00.msci.ctl
If you run into problems with core pinning, or BPP exits with the error: "Error while pinning thread to core." then try running BPP with the --no-pin
option:
bpp --no-pin --cfile baobab.A00.msci.ctl
We should run the program at least twice and compare the outputs from the two runs to check for convergence.
To do another run:
cd ../r2
bpp --cfile baobab.A00.msci.ctl
See a sample output file baobab-msci.txt
here.
The first part of the output is similar to the output of the A00 run for the MSC model. In addition, it includes a table detailing the node relationships within the species tree, as well as information on nodes participating in hybridization or introgression events. See a sample output file for the MSC-I analysis.
.
.
Starting timer..
Seed: -794801753 (randomly generated)
Parsing species tree... Done
Species tree contains hybridization/introgression events.
Hybridization events: 0
Bidirectional introgressions: 1
Label Node Child1 Child2 Parent
Adig 0 N/A N/A 11
Agra 1 N/A N/A 9
Agre 2 N/A N/A 12
Amad 3 N/A N/A 10
Arub 4 N/A N/A 10
Smic 5 N/A N/A 6
a 6 7 5 N/A
b 7 8 12 6
c 8 9 11 7
d 9 10 1 8
e 10 3 4 9
x 11 0 14 8 [tau = 1, phi = 0.800000, prop_tau = 1, has_phi = 0]
y 12 2 13 7 [tau = 1, phi = 0.800000, prop_tau = 0, has_phi = 0]
x 13 N/A N/A 12 Mirrored bidirection node [Hybrid = x (11)] [tau = 0, phi = 0.200000, prop_tau = 0, has_phi = 1] phi_x : y -> x
y 14 N/A N/A 11 Mirrored bidirection node [Hybrid = y (12)] [tau = 0, phi = 0.200000, prop_tau = 0, has_phi = 1] phi_y : x -> y
.
.
The last four lines indicate that nodes x
and y
are "hybridization" nodes, each having two parent nodes (hence listed twice). For example, the parents of x
have indices nodes 8 and 12, which correspond to nodes c
and y
, respectively. The last two lines also contain the definition of the two introgression probabilities: phi_x : y -> x
(phi_y : x -> y
(
Next, BPP summarises the input data and the model:
Locus | Model | Sequences | Length | Ambiguous sites | Compressed | Base freqs
-------+-------+-----------+--------+-----------------+------------+------------
1 | JC69 | 10 | 960 | 12 | 19 | Fixed
2 | JC69 | 10 | 933 | 72 | 18 | Fixed
.
.
Parsing map file...
Adi001 Adig
Adi002 Adig
Aga001 Agra
Aga002 Agra
Age001 Agre
Ama006 Amad
Ama018 Amad
Aru001 Arub
Aru127 Arub
Smi165 Smic
Done
Per-locus sequences in data and 'species&tree' tag:
C.File | Data | Status | Population
-------+------+--------------------------------------+-----------
3 | 2 | [OK] | Adig
2 | 2 | [OK] | Agra
1 | 1 | [OK] | Agre
2 | 2 | [OK] | Amad
2 | 2 | [OK] | Arub
1 | 1 | [OK] | Smic
Map of populations and ancestors (1 in map indicates ancestor):
Species 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 Adig 1 0 0 0 0 0 1 1 1 0 0 1 1 1 0
2 Agra 0 1 0 0 0 0 1 1 1 1 0 0 0 0 0
3 Agre 0 0 1 0 0 0 1 1 1 0 0 1 1 0 1
4 Amad 0 0 0 1 0 0 1 1 1 1 1 0 0 0 0
5 Arub 0 0 0 0 1 0 1 1 1 1 1 0 0 0 0
6 Smic 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
7 a 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
8 b 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
9 c 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0
10 d 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0
11 e 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0
12 x 0 0 0 0 0 0 1 1 1 0 0 1 0 0 0
13 y 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0
14 x 0 0 0 0 0 0 1 1 0 0 0 0 1 1 0
15 y 0 0 0 0 0 0 1 1 1 0 0 1 0 0 1
Generating gene trees.... Done
Initial MSC density and log-likelihood of observing data:
log-PG0 = 4253.643308 log-L0 = -474669.864998
[EXPERIMENTAL] - New extended IM rubberband algorithm
Theta proposal: Mixed: Sliding window (0.20) + Inv-G approx Gibbs (0.80))
Linked thetas: none
0:00 taken to read and process data..
Restarting timer...
The next part provides a description of the values printed on screen during the MCMC run. BPP prints many values on screen which we can cluster into three categories:
- The first column is a progress indicator with the percentage of the the completed MCMC run. Negative progress means we are still in the burn-in.
- The next block of columns shows the acceptance proportions for each activated MCMC move. In our case we have seven active proposals.
- The remaining columns indicate information about the chain at that point in the run. Those are running means for three thetas and three taus, the phi parameter, the sum of gene tree log-densities, and the mean log-likelihood.
Since finetune was set to 1, BPP optimizes step-length values during burn-in, such that acceptance proportions (Pjump) are around 30%. To achieve this, BPP typically does four rounds of optimization during the burn-in and prints on screen the Current Pjump
(acceptance proportions for each enabled MCMC move at that step), Current finetune
(step-length/tuning parameters at that step) and New finetune
(the new optimized tuning parameter values).
-*- Terms index -*-
Prgs: progress of MCMC run (negative progress means burnin)
Gage: gene-tree age proposal
Gspr: gene-tree SPR proposal
th1: species tree theta proposal (tips) (sliding window)
th2: species tree theta proposal (inner) (sliding window)
thg: species tree theta proposal (inner) (gibbs sampler)
tau: species tree tau proposal
mix: mixing proposal
phis: MSCi phi parameter proposal (sliding window)
phig: MSCi phi parameter proposal (gibbs sampler)
theta1: mean theta of node 1
theta2: mean theta of node 2
theta3: mean theta of node 4
tau1: mean tau of node 6
tau2: mean tau of node 7
tau3: mean tau of node 8
phi1: mean of phi_x : y -> x
phi2: mean of phi_y : x -> y
log-PG: log-probability of gene trees (MSCi)
log-L: mean log-L of observing data
| Acceptance proportions |
Prgs | Gage Gspr th1 th2 thg tau mix phis phig | theta1 theta2 theta3 tau1 tau2 tau3 phi1 phi2 log-PG log-L
-------------------------------------------------------------------------------------------------------------------------------------------
-45% 0.32 0.41 0.67 0.61 0.94 0.08 0.01 0.99 1.00 0.0059 0.0053 0.0044 0.0134 0.0131 0.0049 0.0966 0.7366 3544.58397 -451106.27456 0:28
-40% 0.32 0.41 0.68 0.62 0.94 0.08 0.00 0.99 1.00 0.0059 0.0053 0.0045 0.0134 0.0132 0.0049 0.1003 0.7385 3556.31541 -451094.28833 0:57
-37% 0.32 0.41 0.68 0.63 0.94 0.08 0.00 0.99 1.00 0.0059 0.0053 0.0044 0.0132 0.0129 0.0049 0.1216 0.7313 3589.05710 -451089.16198
Gage Gspr th1 th2 tau mix phis
Current Pjump: 0.32121 0.41035 0.68302 0.62502 0.08164 0.00472 0.98733
Current finetune: 5.00000 0.00100 0.00050 0.00200 0.00100 0.30000 0.00100
New finetune: 5.41911 0.00148 0.00181 0.00587 0.00025 0.00437 0.09858
=> 'finetune = 1 Gage:5.419111 Gspr:0.001475 th1:0.001805 th2:0.005875 tau:0.000253 mix:0.004365 phis:0.098578'
-35% 0.32 0.33 0.22 0.29 0.95 0.34 0.64 0.30 1.00 0.0058 0.0053 0.0044 0.0126 0.0117 0.0048 0.2044 0.7045 3550.73115 -451072.03009 1:26
-30% 0.32 0.33 0.23 0.31 0.95 0.34 0.63 0.31 1.00 0.0058 0.0052 0.0044 0.0124 0.0115 0.0048 0.2127 0.6949 3572.09331 -451069.88734 1:56
-25% 0.32 0.34 0.24 0.32 0.95 0.32 0.61 0.30 1.00 0.0059 0.0053 0.0044 0.0130 0.0124 0.0048 0.1662 0.7154 3542.00140 -451074.89713 2:26
Gage Gspr th1 th2 tau mix phis
Current Pjump: 0.32059 0.33603 0.23604 0.31958 0.31829 0.61232 0.30035
Current finetune: 5.41911 0.00148 0.00181 0.00587 0.00025 0.00437 0.09858
New finetune: 5.85973 0.00169 0.00138 0.00633 0.00027 0.01229 0.09871
=> 'finetune = 1 Gage:5.859731 Gspr:0.001688 th1:0.001377 th2:0.006329 tau:0.000271 mix:0.012285 phis:0.098712'
-20% 0.32 0.31 0.34 0.41 0.94 0.29 0.37 0.28 1.00 0.0059 0.0052 0.0044 0.0133 0.0131 0.0049 0.1382 0.7374 3565.97303 -451079.93841 2:58
-15% 0.32 0.31 0.35 0.40 0.94 0.28 0.36 0.28 1.00 0.0059 0.0052 0.0044 0.0134 0.0132 0.0049 0.1275 0.7387 3562.40911 -451079.19182 3:29
-12% 0.32 0.31 0.35 0.40 0.94 0.28 0.36 0.28 1.00 0.0059 0.0053 0.0044 0.0134 0.0132 0.0049 0.1248 0.7388 3563.90697 -451078.88961
Gage Gspr th1 th2 tau mix phis
Current Pjump: 0.32201 0.31235 0.35026 0.40306 0.28029 0.36168 0.27962
Current finetune: 5.85973 0.00169 0.00138 0.00633 0.00027 0.01229 0.09871
New finetune: 6.36968 0.00177 0.00166 0.00912 0.00025 0.01539 0.09102
=> 'finetune = 1 Gage:6.369682 Gspr:0.001770 th1:0.001658 th2:0.009116 tau:0.000251 mix:0.015391 phis:0.091021'
-10% 0.32 0.30 0.22 0.25 0.94 0.29 0.30 0.32 1.00 0.0059 0.0053 0.0044 0.0135 0.0133 0.0049 0.1307 0.7398 3546.67042 -451078.93730 4:00
-5% 0.32 0.30 0.24 0.25 0.95 0.32 0.30 0.33 1.00 0.0059 0.0053 0.0044 0.0129 0.0123 0.0049 0.1748 0.7178 3564.90472 -451074.85603 4:31
0% 0.32 0.30 0.24 0.25 0.95 0.32 0.30 0.33 1.00 0.0058 0.0053 0.0044 0.0128 0.0124 0.0048 0.1743 0.7211 3569.36430 -451074.64288 5:00
Gage Gspr th1 th2 tau mix phis
Current Pjump: 0.32087 0.30239 0.24451 0.25478 0.32167 0.30064 0.32558
Current finetune: 6.36968 0.00177 0.00166 0.00912 0.00025 0.01539 0.09102
New finetune: 6.89478 0.00179 0.00132 0.00757 0.00027 0.01543 0.10026
=> 'finetune = 1 Gage:6.894777 Gspr:0.001786 th1:0.001315 th2:0.007569 tau:0.000272 mix:0.015429 phis:0.100258'
5% 0.32 0.30 0.33 0.34 0.95 0.30 0.30 0.28 1.00 0.0059 0.0053 0.0045 0.0125 0.0122 0.0049 0.1758 0.7203 3533.79472 -451071.55092 5:28
10% 0.32 0.30 0.34 0.34 0.95 0.30 0.30 0.28 1.00 0.0059 0.0053 0.0045 0.0128 0.0124 0.0049 0.1636 0.7210 3499.55395 -451073.70877 5:57
15% 0.32 0.30 0.34 0.34 0.95 0.30 0.30 0.28 1.00 0.0058 0.0053 0.0045 0.0126 0.0122 0.0048 0.1770 0.7150 3555.84297 -451071.94304 6:26
20% 0.32 0.30 0.35 0.34 0.95 0.31 0.30 0.29 1.00 0.0058 0.0052 0.0044 0.0126 0.0120 0.0048 0.1876 0.7104 3570.93935 -451070.94204 6:54
25% 0.32 0.30 0.35 0.34 0.95 0.31 0.30 0.29 1.00 0.0058 0.0052 0.0044 0.0126 0.0120 0.0048 0.1900 0.7098 3520.54856 -451070.92273 7:23
30% 0.32 0.30 0.35 0.34 0.95 0.31 0.30 0.29 1.00 0.0058 0.0052 0.0044 0.0127 0.0122 0.0048 0.1792 0.7135 3540.97213 -451072.34089 7:51
35% 0.32 0.30 0.35 0.34 0.95 0.30 0.29 0.29 1.00 0.0058 0.0052 0.0044 0.0129 0.0124 0.0048 0.1680 0.7177 3588.96851 -451073.57006 8:19
40% 0.32 0.30 0.35 0.34 0.95 0.30 0.29 0.29 1.00 0.0058 0.0052 0.0044 0.0129 0.0125 0.0048 0.1646 0.7198 3529.21983 -451074.14537 8:48
45% 0.32 0.30 0.36 0.34 0.95 0.30 0.29 0.29 1.00 0.0058 0.0052 0.0044 0.0130 0.0125 0.0048 0.1623 0.7217 3562.01332 -451074.62358 9:17
50% 0.32 0.30 0.36 0.34 0.95 0.30 0.29 0.29 1.00 0.0058 0.0052 0.0044 0.0130 0.0126 0.0048 0.1577 0.7231 3572.54096 -451074.85311 9:46
55% 0.32 0.30 0.36 0.34 0.95 0.30 0.29 0.29 1.00 0.0058 0.0052 0.0044 0.0130 0.0126 0.0048 0.1586 0.7232 3574.79113 -451074.82793 10:17
60% 0.32 0.30 0.36 0.34 0.95 0.29 0.29 0.29 1.00 0.0059 0.0052 0.0044 0.0130 0.0127 0.0048 0.1557 0.7244 3545.94588 -451075.25304 10:47
65% 0.32 0.30 0.36 0.34 0.95 0.29 0.29 0.29 1.00 0.0059 0.0052 0.0044 0.0130 0.0126 0.0048 0.1553 0.7240 3514.09940 -451075.13946 11:17
70% 0.32 0.30 0.36 0.34 0.95 0.30 0.29 0.29 1.00 0.0058 0.0052 0.0044 0.0130 0.0126 0.0048 0.1576 0.7234 3577.00182 -451075.02462 11:46
75% 0.32 0.30 0.36 0.34 0.95 0.29 0.29 0.29 1.00 0.0059 0.0052 0.0044 0.0131 0.0127 0.0048 0.1561 0.7251 3567.12449 -451075.71562 12:15
80% 0.32 0.30 0.36 0.34 0.95 0.29 0.29 0.29 1.00 0.0059 0.0052 0.0044 0.0131 0.0127 0.0048 0.1554 0.7250 3571.62652 -451075.78153 12:44
85% 0.32 0.30 0.36 0.34 0.95 0.30 0.29 0.29 1.00 0.0059 0.0052 0.0044 0.0131 0.0127 0.0048 0.1574 0.7243 3558.26288 -451075.57401 13:12
90% 0.32 0.30 0.36 0.34 0.95 0.30 0.29 0.29 1.00 0.0059 0.0052 0.0044 0.0130 0.0126 0.0048 0.1576 0.7239 3524.88323 -451075.56414 13:40
95% 0.32 0.30 0.36 0.34 0.95 0.30 0.29 0.29 1.00 0.0059 0.0052 0.0044 0.0130 0.0126 0.0048 0.1580 0.7239 3582.15262 -451075.61879 14:08
100% 0.32 0.30 0.36 0.34 0.95 0.30 0.29 0.29 1.00 0.0059 0.0052 0.0044 0.0131 0.0127 0.0048 0.1567 0.7247 3557.13007 -451075.81724 14:36
14:36 spent in MCMC
After MCMC sampling is complete, the remaining part provides posterior summaries:
Node-Index Node-Type Node-Label
---------------------------------
1 Tip Adig
2 Tip Agra
3 Tip Agre
4 Tip Amad
5 Tip Arub
6 Tip Smic
7 Root a
8 Inner b
9 Inner c
10 Inner d
11 Inner e
12 Hybrid x (parental node index 9)
13 Hybrid y (parental node index 8)
14 Hybrid x (parental node index 13)
15 Hybrid y (parental node index 12)
Resolving potential unidentifiability for BDI x <-> y
lnL0: -10.508458
Round 0, 0 points moved...
lnL = -10.508458
0 points reflected
Printing processed sample into baobab-msci.mcmc.txt.processed
param mean median S.D min max 2.5% 97.5% 2.5%HPD 97.5%HPD ESS* Eff* rho1
---------------------------------------------------------------------------------------------------------------------------------------
theta:1 0.005849 0.005800 0.000688 0.003578 0.009940 0.004641 0.007330 0.004569 0.007224 9986.208979 0.998621 0.014684
theta:2 0.005262 0.005215 0.000649 0.003183 0.008292 0.004127 0.006665 0.004067 0.006568 10237.355673 1.023736 0.005371
theta:4 0.004445 0.004406 0.000539 0.002895 0.007007 0.003496 0.005620 0.003415 0.005499 10082.301232 1.008230 -0.001069
theta:5 0.006219 0.006150 0.000803 0.003896 0.010532 0.004825 0.007965 0.004685 0.007780 10098.420255 1.009842 0.009405
theta:7 0.033757 0.033638 0.002843 0.024151 0.047001 0.028521 0.039647 0.028136 0.039215 131.103801 0.013110 0.258467
theta:8 0.009387 0.008101 0.005724 0.000156 0.039408 0.001808 0.023530 0.000921 0.021010 248.338498 0.024834 0.425168
theta:9 0.014915 0.014825 0.001591 0.009557 0.022386 0.012089 0.018263 0.012059 0.018214 110.487477 0.011049 0.543591
theta:10 0.016262 0.016032 0.003535 0.005511 0.040398 0.009931 0.023750 0.009703 0.023410 3046.377211 0.304638 0.217857
theta:11 0.007273 0.006540 0.004001 0.000257 0.029966 0.001575 0.016789 0.000632 0.014871 6941.576652 0.694158 0.105135
theta:12 0.005666 0.004968 0.003340 0.000181 0.026425 0.001125 0.013845 0.000652 0.012241 8162.401110 0.816240 0.054259
theta:13 0.017956 0.017792 0.006189 0.000150 0.048453 0.005088 0.030793 0.005086 0.030781 294.195517 0.029420 0.357363
tau:7 0.013076 0.013069 0.000846 0.010836 0.015727 0.011621 0.014661 0.011559 0.014593 34.775156 0.003478 0.956230
tau:8 0.012689 0.012720 0.001054 0.009072 0.015581 0.010683 0.014504 0.010775 0.014572 29.469507 0.002947 0.968117
tau:9 0.004843 0.004844 0.000161 0.004238 0.005421 0.004523 0.005158 0.004524 0.005158 1031.370566 0.103137 0.716672
tau:10 0.003627 0.003625 0.000190 0.002908 0.004330 0.003261 0.004007 0.003243 0.003986 2343.988014 0.234399 0.503285
tau:11 0.003360 0.003369 0.000198 0.002581 0.003992 0.002949 0.003728 0.002964 0.003740 2666.929425 0.266693 0.518095
tau:12 0.004605 0.004619 0.000216 0.003559 0.005317 0.004139 0.004986 0.004138 0.004984 1687.610350 0.168761 0.503691
tau:13 0.004605 0.004619 0.000216 0.003559 0.005317 0.004139 0.004986 0.004138 0.004984 1687.610350 0.168761 0.503691
phi:14<-13 0.156107 0.154812 0.072394 0.000037 0.428450 0.014930 0.297696 0.000037 0.276364 65.489330 0.006549 0.700277
phi:15<-12 0.724406 0.727535 0.057785 0.463846 0.924433 0.604892 0.830219 0.613397 0.835491 252.719212 0.025272 0.234032
lnL -451075.675359 -451075.327500 24.651807 -451176.021000 -450985.951000 -451124.549000 -451028.660000 -451123.709000 -451028.027000 539.392134 0.053939 0.190325
Species tree network:
(((((Amad,Arub)e,Agra)d,(Adig,y[&phi=0.724406])x)c,(Agre,x[&phi=0.156107])y)b, Smic)a;
Species tree network with attributes (thetas and tau 95% HPD), and branch lengths (tau difference):
(((((Amad: 0.003360, Arub: 0.003360)e[&height_95%_HPD={0.00296400, 0.00374000}, theta=0.0072728]: 0.000267, Agra: 0.003627)d[&height_95%_HPD={0.00324300, 0.00398600}, theta=0.0162619]: 0.001216, (Adig: 0.004605, y)x[&height_95%_HPD={0.00413800, 0.00498400}, theta=0.0056665]: 0.000238)c[&height_95%_HPD={0.00452400, 0.00515800}, theta=0.0149149]: 0.007847, (Agre: 0.004605, x)y[&height_95%_HPD={0.00413800, 0.00498400}, theta=0.0179562]: 0.008084)b[&height_95%_HPD={0.01077500, 0.01457200}, theta=0.0093870]: 0.000387, Smic: 0.013076)a[&height_95%_HPD={0.01155900, 0.01459300}, theta=0.0337574];
Species tree network with taus and thetas:
(((((Amad #0.004445, Arub #0.006219)e:0.003360 #0.007273, Agra #0.005262)d:0.003627 #0.016262, (Adig #0.005849, y)x[&phi=0.843893]:0.004605 #0.005666)c:0.004843 #0.014915, (Agre , x)y[&phi=0.275594]:0.004605 #0.017956)b:0.012689 #0.009387, Smic )a:0.013076 #0.033757;
FigTree tree is in baobab-msci.FakeTree.tre
Summarizing parameter estimates using file baobab-msci.conditional_a1b1.txt ...
param mean S.D 2.5% 97.5% 2.5%HPD 97.5%HPD Effu Effy c
----------------------------------------------------------------------------------------------------
theta:1 0.005837 0.000665 0.004676 0.007270 0.004559 0.007132 1.495599 0.646056 8.279819
theta:2 0.005243 0.000623 0.004154 0.006586 0.004059 0.006466 4.675128 0.930942 7.157320
theta:4 0.004434 0.000520 0.003526 0.005555 0.003431 0.005442 4.564332 0.933249 6.776672
theta:5 0.006211 0.000773 0.004871 0.007891 0.004760 0.007742 5.790023 0.976600 6.722669
theta:7 0.033760 0.002817 0.028521 0.039517 0.028183 0.039156 0.013061 0.012950 2.933831
theta:8 0.009071 0.005620 0.001846 0.023237 0.000773 0.020682 0.021123 0.021046 1.210392
theta:9 0.014908 0.001567 0.012056 0.018224 0.011818 0.017973 0.009805 0.009779 1.389252
theta:10 0.016253 0.003510 0.009978 0.023764 0.009529 0.023230 0.350036 0.317527 1.413411
theta:11 0.007117 0.003867 0.001592 0.016295 0.000824 0.014646 0.722229 0.633418 1.240902
theta:12 0.005370 0.003086 0.001125 0.012869 0.000469 0.011292 1.015977 0.781320 1.419672
theta:13 0.017865 0.005923 0.004883 0.030107 0.005298 0.030481 0.028692 0.028314 1.870696
phi:14<-13 0.156806 0.072199 0.014482 0.297885 0.000223 0.276942 0.006234 0.006227 1.251353
phi:15<-12 0.724741 0.057088 0.606228 0.829097 0.608968 0.831381 0.020109 0.019873 2.440936
Once the run is finished, there will be five output files:
-
baobab-msci.txt
: main output file containing the above information -
baobab-msci.mcmc.txt
: MCMC sample of model parameters -
baobab-msci.conditional_a1b1.txt
: MCMC sample of$a$ and$b$ parameters of the gamma distributions in Gibbs sampling, used for generating final posterior summaries. -
baobab-msci.FakeTree.tre
: fitted model, which can be visualised in FigTree; see figure below -
baobab-msci.SeedUsed
: a random seed used by BPP
From the posterior estimates, we have detected some gene flow between Adig
and Agre
in both directions, with
Agre
to Adig
(phi:14<-13
) and tau:12
and tau:13
). Numbers in the parentheses are the 95% HPD interval.
Try completing two runs and check the convergence of MCMC. A helpful tool is Tracer, but BPP provides some helpful summary statistics too. Here, we provide a python script (scatter-params.py) to help plot posteriors for theta and tau parameters. If we are satisfied, we can use one MCMC analysis from BPP or combine them to report parameter estimates. The plotting script can be executed from the command line with
# download script
wget https://raw.githubusercontent.com/bpp/bpp-tutorial-geneflow/main/third-day/scripts/scatter-params.py
# add execute permissions
chmod +x scatter-params.py
Then run script on your two mcmc files:
./scatter-params.py <mcmc1.txt> <mcmc2.txt>
The script will produce three plots, one for thetas (thetas.pdf
), one for taus (taus.pdf
) and one for phis (phis.pdf
).
It is possible to constrain theta parameters in the model using the keyword thetamodel
. For example, we can force theta to remain unchanged after introgression. See section 5.3 in the BPP manual
The method is described here.
We will use a custom script savage-dickey.py
to parse the MCMC sample in baobab-msci.mcmc.txt
.
First, get the script:
wget https://raw.githubusercontent.com/bpp/bpp-tutorial-geneflow/main/fourth-day/scripts/savage-dickey.py
chmod +x savage-dickey.py
The script requires pandas
and scipy
.
Usage:
./savage-dickey.py
usage: savage-dickey.py cutoff column beta_prior_a beta_prior_b mcmcfile
example: savage-dickey.py 0.01 "phi_x<-w" 1 1 mcmc.model2.txt
Let's test whether each of the estimated introgression probabilities is significantly greater than zero. To simplify the notation, let
where
First, try
./savage-dickey.py 0.01 "phi:14<-13:x<-y" 1 1 r1/baobab-msci.mcmc.txt
Cutoff: 0.01
Prior: Beta(1.0,1.0)
Parameter: phi:14<-13:x<-y
MCMC file: r1/baobab-msci.mcmc.txt
Prior: 0.01
Posterior: 0.0163
Bayes factor: 0.6134969325153375
If we use a smaller threshold for the null set,
./savage-dickey.py 0.001 "phi:14<-13:x<-y" 1 1 r1/baobab-msci.mcmc.txt
Cutoff: 0.001
Prior: Beta(1.0,1.0)
Parameter: phi:14<-13:x<-y
MCMC file: r1/baobab-msci.mcmc.txt
Prior: 0.001
Posterior: 0.0022
Bayes factor: 0.45454545454545453
In both cases, the Bayes factor falls between 0.01 and 100, so the test is indeterminate, and there is no strong evidence for introgression from Agre
to Adig
.
Similarly, we can test the other direction, Adig
Agre
:
./savage-dickey.py 0.01 "phi:15<-12:y<-x" 1 1 r1/baobab-msci.mcmc.txt
Cutoff: 0.01
Prior: Beta(1.0,1.0)
Parameter: phi:15<-12:y<-x
MCMC file: r1/baobab-msci.mcmc.txt
savage-dickey.py:19: RuntimeWarning: divide by zero encountered in scalar divide
bf = prior / posterior
Prior: 0.01
Posterior: 0.0
Bayes factor: inf
./savage-dickey.py 0.001 "phi:15<-12:y<-x" 1 1 r1/baobab-msci.mcmc.txt
Cutoff: 0.001
Prior: Beta(1.0,1.0)
Parameter: phi:15<-12:y<-x
MCMC file: r1/baobab-msci.mcmc.txt
savage-dickey.py:19: RuntimeWarning: divide by zero encountered in scalar divide
bf = prior / posterior
Prior: 0.001
Posterior: 0.0
Bayes factor: inf
Both values of Adig
into Agre
.