Output files MSci

Running BPP

The following command executes BPP with the A00.bpp.ctl control file.

$ bpp --cfile A00.bpp.ctl

Output files:

1. Posterior sample of parameters.

In this analysis, the species relationships are fixed; therefore, the posterior samples stored in the "mcmc.txt" file contain only the theta, tau and phi parameters for the yeast phylogeny.

Below is what the first few lines of the file look like:

Gen	theta_6R	theta_7H	theta_8C	theta_9B	theta_10A	theta_11D	tau_6R		tau_7H		tau_8C		tau_9B		tau_10A		tau_11D		phi_H		lnL
2	0.025386	0.039072	0.025316	0.014104	0.007243	0.008033	0.089481	0.068105	0.079800	0.060613	0.038491	0.068105	0.345577	-393529.027
4	0.025482	0.033927	0.032226	0.015595	0.007271	0.008063	0.089820	0.069352	0.080102	0.060842	0.038636	0.069352	0.345577	-393538.033
6	0.025482	0.024968	0.022767	0.015595	0.007455	0.008063	0.089820	0.069352	0.079859	0.060842	0.037679	0.069352	0.379608	-393532.595
8	0.017848	0.014851	0.021358	0.015007	0.007506	0.008117	0.090425	0.068810	0.080397	0.062329	0.037933	0.068810	0.379608	-393536.098
10	0.022353	0.016898	0.028849	0.015007	0.007506	0.008117	0.090425	0.068810	0.080397	0.062329	0.037933	0.068810	0.379608	-393554.914
.
.
.
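As a minimal sketch of how such a sample file could be inspected outside BPP, the snippet below parses the rows shown above and averages one column. The function name `read_mcmc` is hypothetical, and the code assumes only that the file is a whitespace-separated table with a single header line, which matches the excerpt but is not a guaranteed description of every "mcmc.txt" file.

```python
import io

# The first rows of the sample shown above, copied as whitespace-separated text.
SAMPLE = """\
Gen theta_6R theta_7H theta_8C theta_9B theta_10A theta_11D tau_6R tau_7H tau_8C tau_9B tau_10A tau_11D phi_H lnL
2 0.025386 0.039072 0.025316 0.014104 0.007243 0.008033 0.089481 0.068105 0.079800 0.060613 0.038491 0.068105 0.345577 -393529.027
4 0.025482 0.033927 0.032226 0.015595 0.007271 0.008063 0.089820 0.069352 0.080102 0.060842 0.038636 0.069352 0.345577 -393538.033
6 0.025482 0.024968 0.022767 0.015595 0.007455 0.008063 0.089820 0.069352 0.079859 0.060842 0.037679 0.069352 0.379608 -393532.595
8 0.017848 0.014851 0.021358 0.015007 0.007506 0.008117 0.090425 0.068810 0.080397 0.062329 0.037933 0.068810 0.379608 -393536.098
10 0.022353 0.016898 0.028849 0.015007 0.007506 0.008117 0.090425 0.068810 0.080397 0.062329 0.037933 0.068810 0.379608 -393554.914
"""


def read_mcmc(handle):
    """Return (column_names, rows); each row maps a column name to a float."""
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank or truncated lines
        rows.append({name: float(value) for name, value in zip(header, fields)})
    return header, rows


header, rows = read_mcmc(io.StringIO(SAMPLE))
phi = [row["phi_H"] for row in rows]
print(len(rows), "samples read; mean phi_H over these rows:", sum(phi) / len(phi))
```

To read a real file, `io.StringIO(SAMPLE)` could be replaced by `open("mcmc.txt")`. Five rows are far too few to estimate anything; the point is only to show the layout of the file.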

2. General output and summary file.

This file is often named "out.txt" and contains the information printed to the screen while the program runs.

Before the MCMC sampling

The first part of the output contains information about the data read from the input files, as shown in the species tree inference example.

During the MCMC sampling

Performance statistics and current estimates of some parameters are printed, which can help in evaluating the efficiency of the run.

At the end of the MCMC sampling

A summary of the posterior sample of the estimated parameters is provided at the end of the "out.txt" file.

The most important information is the mean and the standard deviation of each sampled parameter.

The ESS (effective sample size) values help us assess whether the analysis suffers from mixing problems. The larger this number is for a parameter, the more efficient the MCMC sampling has been (i.e., covering a wider range of possible parameter combinations).

          theta_6R	theta_7H	theta_8C	theta_9B	theta_10A	theta_11D	tau_6R		tau_7H		tau_8C		tau_9B		tau_10A		tau_11D		phi_H		lnL
mean      0.016326	0.019114	0.027967	0.016614	0.009500	0.012635	0.093424	0.066804	0.079418	0.061169	0.037566	0.066804	0.302642	-393524.547195
median    0.016136	0.014790	0.027404	0.016392	0.009413	0.011993	0.093396	0.066851	0.079430	0.061180	0.037570	0.066851	0.302498	-393524.089000
S.D       0.003385	0.014772	0.005271	0.002719	0.001535	0.004188	0.002626	0.001440	0.001064	0.001016	0.000735	0.001440	0.069993	20.239520
min       0.005629	0.002527	0.010902	0.008135	0.004881	0.003198	0.083993	0.059519	0.074746	0.056280	0.033970	0.059519	0.056362	-393608.277000
max       0.036216	0.175607	0.081364	0.033302	0.018394	0.041609	0.104087	0.072244	0.083916	0.065164	0.040459	0.072244	0.615157	-393441.049000
2.5%      0.010326	0.005514	0.019366	0.011879	0.006785	0.006281	0.088339	0.063888	0.077285	0.059142	0.036090	0.063888	0.167412	-393565.186000
97.5%     0.023465	0.060793	0.039880	0.022565	0.012794	0.022560	0.098652	0.069507	0.081472	0.063127	0.038989	0.069507	0.440236	-393486.102000
2.5%HPD   0.009901	0.003558	0.018545	0.011579	0.006613	0.005402	0.088452	0.063972	0.077350	0.059157	0.036139	0.063972	0.167631	-393565.279000
97.5%HPD  0.022904	0.047034	0.038498	0.022074	0.012568	0.021014	0.098744	0.069578	0.081525	0.063137	0.039029	0.069578	0.440419	-393486.210000
ESS*      1436.575	1640.948	5787.539	9752.434	4751.187	1835.736	682.2608	5128.242	4361.526	7605.001	6151.933	5128.242	1580.856	5115.941100
Eff*      0.014366	0.016409	0.057875	0.097524	0.047512	0.018357	0.006823	0.051282	0.043615	0.076050	0.061519	0.051282	0.015809	0.051159
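The ESS* and Eff* rows of the summary can be checked quickly with a short script. The sketch below copies the two rows from the table above, ranks the parameters from worst to best mixing, and computes the ratio ESS/Eff for each column. Two things here are assumptions rather than statements from BPP: the cutoff of 200 is a commonly used rule of thumb, not a BPP setting, and the reading of Eff* as ESS divided by the number of retained samples is inferred from the numbers only.

```python
# Column names and the ESS* / Eff* rows, copied from the summary table above.
NAMES = ("theta_6R theta_7H theta_8C theta_9B theta_10A theta_11D "
         "tau_6R tau_7H tau_8C tau_9B tau_10A tau_11D phi_H lnL").split()
ESS_ROW = ("1436.575 1640.948 5787.539 9752.434 4751.187 1835.736 682.2608 "
           "5128.242 4361.526 7605.001 6151.933 5128.242 1580.856 5115.941100")
EFF_ROW = ("0.014366 0.016409 0.057875 0.097524 0.047512 0.018357 0.006823 "
           "0.051282 0.043615 0.076050 0.061519 0.051282 0.015809 0.051159")

ess = dict(zip(NAMES, map(float, ESS_ROW.split())))
eff = dict(zip(NAMES, map(float, EFF_ROW.split())))

# Parameters ordered from the smallest ESS (worst mixing) to the largest.
by_ess = sorted(ess, key=ess.get)
worst = by_ess[0]

# Assumed rule-of-thumb cutoff; choose a value suited to your own analysis.
ESS_THRESHOLD = 200
low = [name for name in NAMES if ess[name] < ESS_THRESHOLD]

# If Eff = ESS / N, this ratio should be roughly the same for every column.
implied_n = [ess[name] / eff[name] for name in NAMES]

print("lowest ESS:", worst, ess[worst])
print("parameters below threshold:", low)
print("ESS/Eff ranges from", min(implied_n), "to", max(implied_n))
```

For the table above, the ratio is nearly constant across columns, and the parameter with the lowest ESS is tau_6R, so that is the one to watch when judging whether the chain has run long enough.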

To verify the reliability of the results, the analysis should be repeated at least once more and the posterior estimates of the independent runs compared.
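One simple way to compare two runs is to check whether the posterior means agree to within a fraction of the posterior standard deviation. The function below is a hypothetical helper written for illustration, not part of BPP; the default `tolerance` of 0.5 standard deviations is an arbitrary choice and should be adjusted to the precision you need.

```python
def compare_runs(summary_a, summary_b, tolerance=0.5):
    """Flag parameters whose posterior means differ between two runs.

    summary_a, summary_b: dicts mapping parameter name -> (mean, sd),
        e.g. taken from the "mean" and "S.D" rows of each run's summary.
    tolerance: allowed |mean_a - mean_b| in units of the larger SD
        (an illustrative cutoff, not a BPP setting).
    Returns a list of (name, scaled_difference) for the flagged parameters.
    """
    flagged = []
    for name in sorted(set(summary_a) & set(summary_b)):
        mean_a, sd_a = summary_a[name]
        mean_b, sd_b = summary_b[name]
        scale = max(sd_a, sd_b)
        if scale == 0:
            continue  # parameter did not vary in either run
        diff = abs(mean_a - mean_b) / scale
        if diff > tolerance:
            flagged.append((name, diff))
    return flagged
```

Calling `compare_runs(run1, run2)` with the summaries of two independent runs returns an empty list when all shared parameters agree within the tolerance. Any parameter it returns is a candidate for a longer run or a closer look at the trace.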

Next, learn how to run the analysis faster by using parallel computations.