Parameter estimation under the MSC‐M (Baobab dataset) - bpp/bpp-tutorial-geneflow GitHub Wiki

This demo we will infer the parameters in the following MSC-M model:

This is based on a species tree obtained from A01 analysis of the Baobab dataset in this tutorial, allowing for gene flow between Adig and Agre in both directions at rate $M_{\text{Agre} \rightarrow \text{Adig}} = N_\text{Adig} m_{\text{Agre} \rightarrow \text{Adig}}$ individuals per generation from Agre to Adig forward in time and $M_{\text{Adig} \rightarrow \text{Agre}} = N_\text{Agre} m_{\text{Adig} \rightarrow \text{Agre}}$ in the opposite direction during the entire time period that these two lineages coexist.

We will use the following folder structure:

.
|-- baobab
|   |-- A00
|   |   |-- baobab.A00.ctl
|   |   `-- baobab.A00.msci.ctl
|   |-- A01
|   |   `-- baobab.A01.ctl
|   |-- baobab.map.txt
|   `-- baobab.phy
`-- bpp
    `-- a00
        |-- r1
        |   `-- baobab.A00.msci.ctl
        `-- r2
            `-- baobab.A00.msci.ctl

8 directories, 7 files

See this tutorial for how to create this directory structure.

Control file

In bpp/a00/, create a control file baobab.A00.mscm.ctl with the following content:

          seed =  -1

       seqfile = ../../../baobab/baobab.phy
      Imapfile = ../../../baobab/baobab.map.txt
       jobname = baobab-mscm

 speciesdelimitation = 0
         speciestree = 0

  species&tree = 6  Adig Agra Agre Amad Arub Smic
                    3    2    1    2    2    1
                    (((((Amad, Arub), Agra), Adig), Agre), Smic);
         phase = 0 0 0 0 0 0
                  
       usedata = 1    # 0: no data (prior); 1:seq like
         nloci = 100  # number of data sets in seqfile

     cleandata = 0    # remove sites with ambiguity data (1:yes, 0:no)?

    thetaprior = gamma 2 400  # gamma(a, b) for theta
      tauprior = gamma 3 200  # gamma(a, b) for root tau & Dirichlet(a) for other tau's
        wprior = 20 1
     migration = 2
                 Adig Agre
                 Agre Adig

     locusrate = 0
         clock = 1 

      finetune = 1
         print = 1 0 0 0   # MCMC samples, locusrate, heredityscalars, Genetrees
        burnin = 50000
      sampfreq = 10
       nsample = 10000

Then copy this to the two run directories:

cp baobab.A00.mscm.ctl r1
cp baobab.A00.mscm.ctl r2

Prior distributions

Notice how the keywords wprior and migration specify the migration model shown in the figure above.

  • The keyword migration = 2 specifies that there are two migration events. The next line Adig Agre means migration from Adig to Agre, and the following line indicates another migration event from Agre to Adig.
  • The line wprior = 20 1 assigns the gamma prior $G(20,1)$, with mean 20, to both migration rates $\varpi_{\text{Agre} \rightarrow \text{Adig}} = m_{\text{Agre} \rightarrow \text{Adig}} / \mu$ and $\varpi_{\text{Adig} \rightarrow \text{Agre}} = m_{\text{Adig} \rightarrow \text{Agre}} / \mu$.

Since BPP v4.8, migration rates are parameterized in terms of mutation-scaled migration rates: $\varpi_{\text{A} \rightarrow \text{B}} = 4 M_{\text{A} \rightarrow \text{B}} / \theta_B = m_{\text{A} \rightarrow \text{B}} / \mu$ where $\mu$ is the mutation rate per site per generation. In the output, BPP provides posterior summaries of both $M$ and $\varpi$ parameters.

It is possible to specify different priors for different migration rates or to allow the migration rate to vary across loci. See Migration model in BPP and BPP manual for more details.

Running BPP

Check that the BPP control files in bpp/a00/r1 and bpp/a00/r2 point to the right PHYLIP (sequence alignment) and map files. They should have the following form:

 seqfile = ../../../baobab/baobab.phy
Imapfile = ../../../baobab/baobab.map.txt
 jobname = baobab-mscm

We're now ready to run BPP:

cd r1
bpp --cfile baobab.A00.mscm.ctl

If you run into problems with core pinning, or BPP exits with the error: "Error while pinning thread to core." then try running BPP with the --no-pin option:

bpp --no-pin --cfile baobab.A00.mscm.ctl

We should run the program at least twice and compare the outputs from the two runs to check for convergence.

To do another run:

cd ../r2
bpp --cfile baobab.A00.mscm.ctl

Synchronization point: Make sure everybody reached this stage


See a sample output file baobab-mscm.txt here.

Screen output

The first part of the output is similar to the output of the A00 run for the MSC model.

bpp v4.8.2_<arch>, <k>GB RAM, <n> cores
https://github.com/bpp/bpp

Detected CPU features: neon
Auto-selected SIMD ISA: NEON


Starting timer..
Seed: 16560695 (randomly generated)
Parsing species tree... Done
Parsing phylip file... Done

 Locus | Model | Sequences | Length | Ambiguous sites | Compressed | Base freqs 
-------+-------+-----------+--------+-----------------+------------+------------
     1 |  JC69 |        10 |    960 |              12 |         19 |      Fixed 
     2 |  JC69 |        10 |    933 |              72 |         18 |      Fixed 

.
.

Parsing map file...
Adi001 Adig
Adi002 Adig
Aga001 Agra
Aga002 Agra
Age001 Agre
Ama006 Amad
Ama018 Amad
Aru001 Arub
Aru127 Arub
Smi165 Smic
 Done

Per-locus sequences in data and 'species&tree' tag:
C.File | Data |                Status                | Population
-------+------+--------------------------------------+-----------
     3 |    2 | [OK]                                 | Adig      
     2 |    2 | [OK]                                 | Agra      
     1 |    1 | [OK]                                 | Agre      
     2 |    2 | [OK]                                 | Amad      
     2 |    2 | [OK]                                 | Arub      
     1 |    1 | [OK]                                 | Smic      


Map of populations and ancestors (1 in map indicates ancestor):
   Species                         1  2  3  4  5  6  7  8  9  10  11
 1 Adig                            1  0  0  0  0  0  1  1  1  0   0
 2 Agra                            0  1  0  0  0  0  1  1  1  1   0
 3 Agre                            0  0  1  0  0  0  1  1  0  0   0
 4 Amad                            0  0  0  1  0  0  1  1  1  1   1
 5 Arub                            0  0  0  0  1  0  1  1  1  1   1
 6 Smic                            0  0  0  0  0  1  1  0  0  0   0
 7 Amad,Arub,Agra,Adig,Agre,Smic   0  0  0  0  0  0  1  0  0  0   0
 8 Amad,Arub,Agra,Adig,Agre        0  0  0  0  0  0  1  1  0  0   0
 9 Amad,Arub,Agra,Adig             0  0  0  0  0  0  1  1  1  0   0
10 Amad,Arub,Agra                  0  0  0  0  0  0  1  1  1  1   0
11 Amad,Arub                       0  0  0  0  0  0  1  1  1  1   1

Generating gene trees.... Done

Initial MSC density and log-likelihood of observing data:
log-PG0 = 4591.757640   log-L0 = -475528.209522

[EXPERIMENTAL] - New extended IM rubberband algorithm
Theta proposal: Mixed: Sliding window (0.20) + Inv-G approx Gibbs (0.80))
Linked thetas: none
0:00 taken to read and process data..
Restarting timer...

.
.

The next part provides a description of the values printed on screen during the MCMC run. BPP prints many values on screen which we can cluster into three categories:

  • The first column is a progress indicator with the percentage of the the completed MCMC run. Negative progress means we are still in the burn-in.
  • The next block of columns shows the acceptance proportions for each activated MCMC move. In our case we have seven active proposals.
  • The remaining columns indicate information about the chain at that point in the run. Those are running means for three thetas and three taus, the phi parameter, the sum of gene tree log-densities, and the mean log-likelihood.

Since finetune was set to 1, BPP optimizes step-length values during burn-in, such that acceptance proportions (Pjump) are around 30%. To achieve this, BPP typically does four rounds of optimization during the burn-in and prints on screen the Current Pjump (acceptance proportions for each enabled MCMC move at that step), Current finetune (step-length/tuning parameters at that step) and New finetune (the new optimized tuning parameter values).

-*- Terms index -*-

  Prgs: progress of MCMC run (negative progress means burnin)
  Gage: gene-tree age proposal
  Gspr: gene-tree SPR proposal
   th1: species tree theta proposal (tips)   (sliding window)
   th2: species tree theta proposal (inner)  (sliding window)
   thg: species tree theta proposal (inner)  (gibbs sampler)
   tau: species tree tau proposal
   mix: mixing proposal
  mrte: migration rates proposal
theta1: mean theta of node 1
theta2: mean theta of node 2
theta3: mean theta of node 3
  tau1: mean tau of node 6
  tau2: mean tau of node 7
  tau3: mean tau of node 8
    W1: mean migration rate Adig -> Agre
    W2: mean migration rate Agre -> Adig
log-PG: log-probability of gene trees (MSC)
 log-L: mean log-L of observing data

     |         Acceptance proportions          |
Prgs | Gage Gspr  th1  th2  thg  tau  mix mrte | theta1 theta2 theta3    tau1   tau2   tau3      W1     W2       log-PG          log-L
--------------------------------------------------------------------------------------------------------------------------------------
 -45%  0.32 0.20 0.56 0.39 0.95 0.08 0.01 1.00   0.0060 0.0053 0.0106  0.0137 0.0063 0.0050  25.1446 17.1438   3598.20493  -451124.00374  0:40
 -40%  0.32 0.20 0.59 0.41 0.95 0.07 0.01 1.00   0.0059 0.0053 0.0103  0.0134 0.0057 0.0050  21.6740 16.9364   3596.51315  -451102.78379  1:19
 -37%  0.32 0.20 0.59 0.42 0.95 0.07 0.01 1.00   0.0058 0.0053 0.0100  0.0132 0.0056 0.0050  20.8140 16.7934   3595.02573  -451096.70374

                     Gage    Gspr     th1     th2     tau     mix     mrte  
Current Pjump:     0.32373 0.20157 0.59289 0.41542 0.07344 0.00608  1.00000
Current finetune:  5.00000 0.00100 0.00050 0.00200 0.00100 0.30000  0.10000
New finetune:      5.46989 0.00064 0.00132 0.00300 0.00023 0.00562 10.00000

=> 'finetune = 1 Gage:5.469887 Gspr:0.000643 th1:0.001319 th2:0.003000 tau:0.000227 mix:0.005623 mrte:10.000000'

 -35%  0.32 0.20 0.34 0.30 0.95 0.35 0.61 1.00   0.0059 0.0053 0.0078  0.0127 0.0052 0.0050  18.7407 14.7667   3576.54572  -451083.97993  1:59
 -30%  0.32 0.20 0.36 0.33 0.95 0.35 0.61 1.00   0.0058 0.0052 0.0089  0.0127 0.0051 0.0050  18.6320 15.6514   3559.91174  -451079.24713  2:33
 -25%  0.32 0.20 0.37 0.34 0.95 0.34 0.61 1.00   0.0058 0.0052 0.0090  0.0128 0.0051 0.0050  18.2468 15.9587   3573.78734  -451078.80836  3:08


                     Gage    Gspr     th1     th2     tau     mix     mrte  
Current Pjump:     0.32425 0.20045 0.36871 0.33725 0.34277 0.61056  1.00000
Current finetune:  5.46989 0.00064 0.00132 0.00300 0.00023 0.00562 10.00000
New finetune:      5.99552 0.00041 0.00169 0.00345 0.00027 0.01573 99.00000

=> 'finetune = 1 Gage:5.995522 Gspr:0.000411 th1:0.001693 th2:0.003447 tau:0.000267 mix:0.015733 mrte:99.000000'

 -20%  0.32 0.20 0.20 0.22 0.95 0.31 0.30 1.00   0.0057 0.0052 0.0104  0.0133 0.0051 0.0050  18.0526 16.9608   3587.78173  -451082.88931  3:43
 -15%  0.32 0.20 0.22 0.24 0.95 0.31 0.30 1.00   0.0057 0.0053 0.0095  0.0129 0.0051 0.0050  18.2294 15.9860   3519.71277  -451079.97448  4:18
 -12%  0.32 0.20 0.23 0.25 0.95 0.30 0.30 1.00   0.0057 0.0053 0.0095  0.0128 0.0051 0.0050  18.0690 16.0099   3673.94059  -451078.28236

                     Gage    Gspr     th1     th2     tau     mix     mrte  
Current Pjump:     0.32395 0.20042 0.22710 0.24801 0.30258 0.29520  1.00000
Current finetune:  5.99552 0.00041 0.00169 0.00345 0.00027 0.01573 99.00000
New finetune:      6.56418 0.00026 0.00124 0.00278 0.00027 0.01544 99.00000

=> 'finetune = 1 Gage:6.564178 Gspr:0.000263 th1:0.001239 th2:0.002778 tau:0.000269 mix:0.015440 mrte:99.000000'

 -10%  0.32 0.20 0.36 0.48 0.95 0.31 0.32 1.00   0.0057 0.0052 0.0100  0.0137 0.0052 0.0050  18.4950 16.9661   3614.10536  -451089.61901  4:53
  -5%  0.32 0.20 0.38 0.47 0.95 0.30 0.31 1.00   0.0057 0.0052 0.0094  0.0133 0.0051 0.0049  18.2355 16.1612   3633.92606  -451083.28954  5:27
   0%  0.32 0.20 0.39 0.47 0.95 0.30 0.31 1.00   0.0057 0.0052 0.0100  0.0132 0.0052 0.0050  18.3117 16.7202   3629.38605  -451082.20684  6:03


                     Gage    Gspr     th1     th2     tau     mix     mrte  
Current Pjump:     0.32385 0.20118 0.38564 0.46643 0.30243 0.30896  1.00000
Current finetune:  6.56418 0.00026 0.00124 0.00278 0.00027 0.01544 99.00000
New finetune:      7.18423 0.00017 0.00168 0.00491 0.00027 0.01598 99.00000

=> 'finetune = 1 Gage:7.184227 Gspr:0.000169 th1:0.001684 th2:0.004905 tau:0.000272 mix:0.015982 mrte:99.000000'

   5%  0.32 0.20 0.26 0.22 0.95 0.30 0.30 1.00   0.0057 0.0052 0.0099  0.0132 0.0052 0.0050  18.5733 16.6822   3586.96479  -451082.88606  6:37
  10%  0.32 0.20 0.27 0.22 0.95 0.30 0.30 1.00   0.0057 0.0053 0.0097  0.0132 0.0051 0.0050  18.2717 16.3810   3673.52821  -451081.88899  7:12
  15%  0.32 0.20 0.28 0.23 0.95 0.30 0.30 1.00   0.0057 0.0053 0.0100  0.0131 0.0051 0.0050  18.1356 16.6024   3623.46559  -451081.63112  7:47
  20%  0.32 0.20 0.28 0.23 0.95 0.30 0.30 1.00   0.0057 0.0053 0.0099  0.0131 0.0051 0.0050  18.0103 16.5082   3540.01349  -451081.50330  8:22
  25%  0.32 0.20 0.29 0.23 0.95 0.30 0.30 1.00   0.0057 0.0053 0.0098  0.0130 0.0051 0.0050  17.9635 16.4938   3583.64457  -451080.37420  8:57
  30%  0.32 0.20 0.29 0.23 0.95 0.30 0.30 1.00   0.0057 0.0053 0.0099  0.0130 0.0051 0.0050  18.0559 16.5730   3587.50046  -451079.78115  9:31
  35%  0.32 0.20 0.29 0.23 0.95 0.30 0.30 1.00   0.0057 0.0053 0.0101  0.0130 0.0051 0.0050  18.1919 16.6538   3605.95245  -451080.42280  10:06
  40%  0.32 0.20 0.29 0.23 0.95 0.30 0.30 1.00   0.0057 0.0053 0.0102  0.0131 0.0051 0.0050  18.2125 16.7362   3642.28068  -451081.22269  10:42
  45%  0.32 0.20 0.29 0.23 0.95 0.30 0.30 1.00   0.0057 0.0053 0.0103  0.0132 0.0051 0.0050  18.2244 16.8610   3622.98962  -451082.14509  11:17
  50%  0.32 0.20 0.29 0.23 0.95 0.30 0.30 1.00   0.0057 0.0053 0.0104  0.0132 0.0051 0.0050  18.1697 16.9448   3612.76091  -451081.97542  11:52
  55%  0.32 0.20 0.30 0.23 0.95 0.30 0.30 1.00   0.0057 0.0053 0.0103  0.0131 0.0051 0.0050  18.1375 16.8522   3582.62215  -451081.53809  12:27
  60%  0.32 0.20 0.30 0.23 0.95 0.30 0.30 1.00   0.0057 0.0053 0.0103  0.0131 0.0051 0.0050  18.1137 16.9154   3665.49092  -451081.59873  13:03
  65%  0.32 0.20 0.30 0.24 0.95 0.30 0.30 1.00   0.0057 0.0053 0.0102  0.0132 0.0051 0.0050  18.1072 16.8596   3565.15538  -451081.75190  13:38
  70%  0.32 0.20 0.30 0.24 0.95 0.30 0.30 1.00   0.0057 0.0053 0.0102  0.0131 0.0051 0.0050  18.1035 16.8799   3592.51785  -451081.37965  14:15
  75%  0.32 0.20 0.30 0.24 0.95 0.30 0.30 1.00   0.0057 0.0053 0.0103  0.0131 0.0051 0.0050  18.1444 16.9280   3703.57830  -451081.26553  14:50
  80%  0.32 0.20 0.30 0.24 0.95 0.30 0.30 1.00   0.0057 0.0053 0.0103  0.0131 0.0051 0.0050  18.1612 16.8930   3610.33227  -451081.36469  15:26
  85%  0.32 0.20 0.30 0.24 0.95 0.30 0.30 1.00   0.0057 0.0053 0.0102  0.0131 0.0051 0.0050  18.1488 16.8215   3605.79561  -451081.55837  16:02
  90%  0.32 0.20 0.30 0.24 0.95 0.30 0.30 1.00   0.0057 0.0053 0.0102  0.0132 0.0051 0.0050  18.1628 16.8232   3617.92105  -451082.01390  16:38
  95%  0.32 0.20 0.30 0.24 0.95 0.30 0.30 1.00   0.0057 0.0053 0.0103  0.0132 0.0051 0.0050  18.1701 16.8685   3560.89046  -451082.08798  17:13
 100%  0.32 0.20 0.30 0.24 0.95 0.30 0.30 1.00   0.0057 0.0053 0.0102  0.0132 0.0051 0.0049  18.1476 16.8146   3569.55277  -451082.20859  17:48

17:48 spent in MCMC

After MCMC sampling is complete, the remaining part provides posterior summaries:

Node-Index  Node-Type  Node-Label
---------------------------------
1           Tip        Adig
2           Tip        Agra
3           Tip        Agre
4           Tip        Amad
5           Tip        Arub
6           Tip        Smic
7           Root       MRCA( Amad,Arub,Agra,Adig,Agre,Smic )
8           Inner      MRCA( Amad,Arub,Agra,Adig,Agre )
9           Inner      MRCA( Amad,Arub,Agra,Adig )
10          Inner      MRCA( Amad,Arub,Agra )
11          Inner      MRCA( Amad,Arub )

 param      mean      median       S.D       min        max       2.5%       97.5%     2.5%HPD   97.5%HPD      ESS*        Eff*      rho1  
-------------------------------------------------------------------------------------------------------------------------------------------
 theta:1   0.005706   0.005650   0.000692  0.003608   0.008585   0.004489   0.007222   0.004441   0.007142  3888.376363  0.388838  0.075820
 theta:2   0.005261   0.005209   0.000645  0.003352   0.008916   0.004139   0.006690   0.004068   0.006569  9368.984328  0.936898  0.028660
 theta:3   0.010190   0.009553   0.004653  0.000348   0.042140   0.002955   0.021054   0.002143   0.019577   439.960590  0.043996  0.256864
 theta:4   0.004447   0.004399   0.000547  0.002911   0.007349   0.003509   0.005646   0.003461   0.005564  9293.261929  0.929326  0.022723
 theta:5   0.006217   0.006154   0.000796  0.003877   0.010032   0.004837   0.007939   0.004763   0.007831  9598.062294  0.959806  -0.002899
 theta:7   0.033533   0.033420   0.002705  0.025028   0.044135   0.028507   0.039016   0.028500   0.039000   427.620773  0.042762  0.222535
 theta:8   0.022606   0.022534   0.001643  0.016342   0.030129   0.019545   0.026066   0.019292   0.025704  3040.588736  0.304059  0.131894
 theta:9   0.006714   0.006079   0.003682  0.000105   0.025906   0.001463   0.015589   0.000540   0.013824  4998.694430  0.499869  0.149050
theta:10   0.015532   0.015338   0.003445  0.002961   0.038897   0.009404   0.022896   0.008981   0.022322  2048.715179  0.204872  0.313470
theta:11   0.007514   0.006882   0.004081  0.000175   0.028616   0.001590   0.017339   0.000656   0.015340  6514.476854  0.651448  0.149342
   tau:7   0.013158   0.013184   0.000723  0.010954   0.015629   0.011711   0.014481   0.011707   0.014471    87.798045  0.008780  0.924202
   tau:8   0.005127   0.005121   0.000224  0.004354   0.006489   0.004713   0.005584   0.004694   0.005560   303.985457  0.030399  0.806413
   tau:9   0.004949   0.004949   0.000185  0.004219   0.005582   0.004578   0.005310   0.004593   0.005318   871.818652  0.087182  0.628823
  tau:10   0.003640   0.003637   0.000193  0.002886   0.004559   0.003276   0.004030   0.003268   0.004020  2580.324968  0.258032  0.522418
  tau:11   0.003353   0.003358   0.000204  0.002564   0.004003   0.002930   0.003737   0.002951   0.003750  2817.458583  0.281746  0.526582
  W:1->3  18.224446  17.905753   4.096706  6.516517  38.524199  11.086492  27.159340  10.681823  26.603726  2201.131848  0.220113  0.171191
  W:3->1  16.820629  16.541676   3.828763  6.526080  33.998102  10.149278  24.950489   9.446857  24.110673   618.686964  0.061869  0.225220
  M:1->3   0.046596   0.041960   0.024542  0.001269   0.206297   0.012107   0.106856   0.005363   0.094606   570.315327  0.057032  0.252136
  M:3->1   0.023915   0.023424   0.005876  0.008872   0.056797   0.013883   0.036779   0.013089   0.035529  1399.916185  0.139992  0.118656

     lnL  -451082.085812  -451081.790000  24.613414  -451170.800000  -450993.751000  -451130.517000  -451034.348000  -451128.879000  -451032.934000   770.235818  0.077024  0.235815


FigTree tree is in baobab-mscm.FigTree.tre

Once the run is finished, there will be several output files:

  • baobab-mscm.txt: main output file containing the above information
  • baobab-mscm.mcmc.txt: MCMC sample of model parameters
  • baobab-mscm.conditional_a1b1.txt: MCMC sample of $a$ and $b$ parameters of the gamma distributions in Gibbs sampling, used for generating final posterior summaries.
  • baobab-mscm.FakeTree.tre: fitted model, which can be visualised in FigTree
  • baobab-mscm.pdf: visualization of the fitted model; see figure below
  • baobab-mscm.SeedUsed: a random seed used by BPP

Convergence check

Try completing two runs and check the convergence of MCMC. A helpful tool is Tracer, but BPP provides some helpful summary statistics too. Here, we provide a python script (scatter-params.py) to help plot posteriors for theta and tau parameters. If we are satisfied, we can use one MCMC analysis from BPP or combine them to report parameter estimates. The plotting script can be executed from the command line with

# download script
wget https://raw.githubusercontent.com/bpp/bpp-tutorial-geneflow/main/third-day/scripts/scatter-params.py

# add execute permissions
chmod +x scatter-params.py

Then run script on your two mcmc files:

./scatter-params.py <mcmc1.txt> <mcmc2.txt>

The script will produce one plot for thetas (thetas.pdf) and one for taus (taus.pdf).

Testing for gene flow using Bayes factors

The estimates of the two migration rates are $w_{\text{Adig} \rightarrow \text{Agre}} = 8.224\ (10.681,\ 26.604)$ (W:1->3:Adig->Agre) and $M_{\text{Agre} \rightarrow \text{Adig}} = 16.820\ (9.447,\ 24.111)$ (W:3->1:Agre->Adig).

Q: Are these values large or small? Do they provide strong evidence for gene flow?

Consider testing $H_0: w_{\text{Adig} \rightarrow \text{Agre}} = 0$ against $H_1: w_{\text{Adig} \rightarrow \text{Agre}} &gt; 0$. We use the posterior MCMC sample of $\varphi_y$ to approximate the Bayes factor

$$ B_{10,\varepsilon} = \frac{\mathbf{P}(\phi)}{\mathbf{P}(\phi | X)} $$

where $\mathbf{P}(\phi)$ is the probability of the null region ${\phi : \varphi_y &lt; \varepsilon}$ under the prior distribution while $\mathbf{P}(\phi | X)$ is an analogous probability under the posterior distribution.

We will use a custom script savage-dickey.py to parse the MCMC sample in baobab-msci.mcmc.txt.

wget https://raw.githubusercontent.com/bpp/bpp-tutorial-geneflow/main/fourth-day/scripts/savage-dickey-gamma.py
chmod +x savage-dickey-gamma.py

Q: What values of $\varepsilon$ should we use?

First, try $\varepsilon$ that gives 1% of the prior tail. This can be calculated in R as:

qgamma(0.01, 20, 1, lower.tail=TRUE)

which gives 11.08213.

./savage-dickey-gamma.py 11.08213 "W:1->3:Adig->Agre" 20 1 r1/baobab-mscm.mcmc.txt
Cutoff: 11.08213
Prior: Gamma(20.0,1.0)
Parameter: W:1->3:Adig->Agre
MCMC file: r4/baobab-mscm-m5.mcmc.txt

Prior: 0.009999994418491292
Posterior: 0.9999
Bayes factor: 0.010000994517943087

If we use a smaller threshold for the null set for the prior lower tail of 0.1%, we get:

./savage-dickey-gamma.py 8.958213 "W:1->3:Adig->Agre" 20 1 r1/baobab-mscm.mcmc.txt
Cutoff: 8.958213
Prior: Gamma(20.0,1.0)
Parameter: W:1->3:Adig->Agre
MCMC file: r4/baobab-mscm-m5.mcmc.txt

Prior: 0.0009999996499439417
Posterior: 0.9995
Bayes factor: 0.0010004998998938885

Similarly, we can test for gene flow in the other direction, Agre $\rightarrow$ Adig:

./savage-dickey-gamma.py 11.08213 "W:3->1:Agre->Adig" 20 1 r1/baobab-mscm.mcmc.txt 
Cutoff: 11.08213
Prior: Gamma(20.0,1.0)
Parameter: W:3->1:Agre->Adig
MCMC file: r4/baobab-mscm-m5.mcmc.txt

Prior: 0.009999994418491292
Posterior: 1.0
Bayes factor: 0.009999994418491292
./savage-dickey-gamma.py 8.958213 "W:3->1:Agre->Adig" 20 1 r1/baobab-mscm.mcmc.txt 
Cutoff: 8.958213
Prior: Gamma(20.0,1.0)
Parameter: W:3->1:Agre->Adig
MCMC file: r4/baobab-mscm-m5.mcmc.txt

Prior: 0.0009999996499439417
Posterior: 1.0
Bayes factor: 0.0009999996499439417

Q: Notice the posterior means of $w$ are similar to the prior mean. How robust are these estimates to the prior specification?

⚠️ **GitHub.com Fallback** ⚠️