Output Files Species Tree Inference - Pas-Kapli/bpp-tutorial GitHub Wiki

Running BPP

The following command executes BPP with the A01.bpp.ctl control file.

$ bpp --cfile A01.bpp.ctl

Output files:

1. Posterior sample of trees.

For each sampled tree, the corresponding taus are also printed, while the thetas are printed only if that parameter is sampled. This file is usually named "mcmc.txt", but the name is arbitrary and can be changed by the user.

For the brown frog example, mcmc.txt looks like this:

((K: 0.001502, C: 0.001502): 0.000303, (L: 0.001140, H: 0.001140): 0.000665);
(((K: 0.000144, C: 0.000144): 0.000082, L: 0.000225): 0.000619, H: 0.000844);
(((K: 0.000183, C: 0.000183): 0.000116, L: 0.000299): 0.000822, H: 0.001122);
(((K: 0.000239, C: 0.000239): 0.000151, L: 0.000391): 0.000675, H: 0.001066);
(((K: 0.000216, C: 0.000216): 0.000175, L: 0.000391): 0.000675, H: 0.001066);
(((K: 0.000216, C: 0.000216): 0.000175, L: 0.000391): 0.000675, H: 0.001066);
(((K: 0.000216, C: 0.000216): 0.000175, L: 0.000391): 0.000649, H: 0.001040);
(((K: 0.000216, C: 0.000216): 0.000175, L: 0.000391): 0.000649, H: 0.001040);
(((K: 0.000412, C: 0.000412): 0.000170, L: 0.000582): 0.000359, H: 0.000941);
(((K: 0.000157, C: 0.000157): 0.000244, L: 0.000401): 0.000552, H: 0.000952);
.
.
.
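A quick way to get a feel for the sample is to count how often each tree string occurs. The sketch below is only an illustration: the function name count_topologies is hypothetical, and it assumes one Newick tree per line with branch lengths written as ": 0.001502", as in the excerpt above. If thetas are also printed, the pattern would need to be extended. It compares strings, so the same topology written with a different child order would be counted separately.

```shell
# Hypothetical helper (not part of BPP): tally identical tree strings.
# Assumes one Newick tree per line and branch lengths written as ": 0.001502".
count_topologies() {
    # 1. strip the ": <number>" branch lengths
    # 2. count identical lines (blank lines are skipped)
    # 3. sort by count, most frequent first
    sed -E 's/: *[0-9.]+//g' "$1" |
        awk 'NF { count[$0]++ } END { for (t in count) printf "%d\t%s\n", count[t], t }' |
        sort -rn
}

# Example call, using the file name from this tutorial:
# count_topologies mcmc.txt
```

The summary that bpp itself writes to the general output file remains the reference; this is only a sanity check on the raw sample.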

2. General output and summary file.

This file is often named "out.txt" but, as before, the name is arbitrary. It stores the information that the program prints to the screen while it runs.

Before the MCMC sampling starts, the program writes some information about the input files to "out.txt", namely:

i) the alignment site patterns per locus and their frequencies, and

ii) the information given for the species&tree parameter in the control file.

During the MCMC sampling, the program prints:

iii) the progress of the analysis, performance statistics and current estimates of some parameters, which help in evaluating the efficiency of the run.

At the end of the MCMC sampling, the program prints:

iv) the frequencies of the sampled trees and of individual splits, and

v) the majority-rule consensus tree and the most frequent tree, with posterior probabilities for each node.

The end of the file looks like this:

Species in order:
   1. K
   2. C
   3. L
   4. H

(A) Best trees in the sample (15 distinct trees in all)
    34502  0.34502  0.34502 (((C, K), L), H);
    18432  0.18432  0.52933 (((C, L), K), H);
     8554  0.08554  0.61487 ((C, (K, L)), H);
     8418  0.08418  0.69905 (((C, K), H), L);
     6512  0.06512  0.76417 ((C, (H, K)), L);
     4586  0.04586  0.81003 ((C, L), (H, K));
     4088  0.04088  0.85091 (((C, L), H), K);
     3946  0.03946  0.89037 (((C, H), K), L);
     2913  0.02913  0.91950 (((C, H), L), K);
     2477  0.02477  0.94427 ((C, K), (H, L));
     1794  0.01794  0.96221 (C, ((H, K), L));
     1736  0.01736  0.97957 ((C, (H, L)), K);
      830  0.00830  0.98787 ((C, H), (K, L));
      631  0.00631  0.99418 (C, ((H, L), K));
      582  0.00582  1.00000 (C, (H, (K, L)));

(B) Best splits in the sample of trees (10 splits in all)
 61488 0.614874  1110
 45397 0.453965  1100
 27106 0.271057  0110
 18876 0.188758  1101
 12892 0.128919  1001
  9966 0.099659  1010
  8737 0.087369  0111
  7689 0.076889  0101
  4844 0.048440  0011
  3007 0.030070  1011

(C) Majority-rule consensus tree
((K, C, L) #0.614874, H);

(D) Best tree (or trees from the mastertree file) with support values
(((C, K) #0.453965, L) #0.614874, H);   [P = 0.345017]
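The split patterns in section (B) can be read against the "Species in order" list: each position corresponds to one species, and a 1 marks the species on one side of the split. This reading is consistent with the output above, where pattern 1110 has the same support (0.614874) as the (K, C, L) group in the consensus tree. The sketch below decodes a pattern; the function name decode_split is hypothetical and not part of BPP.

```shell
# Hypothetical helper: list the species marked with 1 in a split pattern.
# Usage: decode_split PATTERN SPECIES...
# The species must be given in the "Species in order" order from the output.
decode_split() {
    pattern=$1
    shift
    members=""
    position=1
    for species in "$@"; do
        # take the character of the pattern that belongs to this species
        bit=$(printf '%s' "$pattern" | cut -c "$position")
        if [ "$bit" = "1" ]; then
            members="${members:+$members, }$species"
        fi
        position=$((position + 1))
    done
    printf '%s\n' "$members"
}

decode_split 1110 K C L H   # prints: K, C, L
decode_split 1100 K C L H   # prints: K, C
```

The counts agree as well: the three trees in (A) that contain the K, C, L group sum to 34502 + 18432 + 8554 = 61488, the count reported for split 1110, and the three trees that contain the (C, K) pair sum to 34502 + 8418 + 2477 = 45397, the count reported for split 1100.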

To visualise the best tree or the majority-rule consensus tree, copy and paste it into a text file, then open the file with SeaView or FigTree.

To assess the reliability of the results we should repeat the analysis at least once.
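One simple check between repeated runs is whether they report the same best tree. The sketch below is an illustration under assumptions: the function names are hypothetical, the directory names in the example call are placeholders for wherever each run's files were stored, and the tree is assumed to sit on the line right after the "(D) Best tree" header, as in the output above. It compares strings, so the same topology written with a different child order would be reported as different.

```shell
# Hypothetical helper: print the best-tree topology from a BPP summary file.
# Assumes the tree is on the line right after the "(D) Best tree" header.
best_topology() {
    awk '/\(D\) Best tree/ { getline; print; exit }' "$1" |
        # drop the "#support" values and the trailing "[P = ...]" annotation
        sed -E 's/ *#[0-9.]+//g; s/ *\[P = [0-9.]+\]//'
}

# Hypothetical helper: report whether two runs agree on the best topology.
same_best_topology() {
    if [ "$(best_topology "$1")" = "$(best_topology "$2")" ]; then
        echo "same best tree"
    else
        echo "different best trees"
    fi
}

# Example call, assuming two runs were stored in run1/ and run2/:
# same_best_topology run1/out.txt run2/out.txt
```

Agreement on the topology is only a first check; the posterior probabilities of the trees and splits should also be similar between runs.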

Next steps:

  1. Re-run bpp a couple of times. Try increasing nsample, samplefreq and burnin to see whether the posterior probability of the best tree increases.

Important: before re-running bpp, store your output files in a separate folder; otherwise the new run will overwrite them. Example commands:

mkdir run1
cp A01.bpp.ctl bpp_seqfile.txt Imap.txt run1/
mv mcmc.txt out.txt SeedUsed run1/

  2. Run Astral for comparison:

First, we need to infer a gene tree for each locus and write all of them to a single file:

cd individual_loci
for i in locus*; do raxml-ng --msa "$i" --model JC --threads 1; done
cat *bestTree > ml_trees.tre

As we saw earlier, this dataset has multiple samples per species. In this case, Astral also requires a file that maps the samples to their species. The format of the file looks like this:

species_name:individual_1,individual_2,...
species_name:individual_1,individual_2,...
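Purely as an illustration of this format, the sketch below builds such a file from a two-column table. It rests on assumptions: the function name imap_to_astral_map is hypothetical, the input is assumed to list one individual and its species per line (individual first, species second), and the individual labels are assumed to be identical to the tip labels in the gene trees, which may not hold for every dataset.

```shell
# Hypothetical helper: turn a two-column "individual species" table into
# the "species:individual_1,individual_2,..." layout.
imap_to_astral_map() {
    awk 'NF >= 2 {
        if (!($2 in members)) {
            order[++n] = $2          # remember the first-seen order of species
            members[$2] = $1
        } else {
            members[$2] = members[$2] "," $1
        }
    }
    END {
        for (i = 1; i <= n; i++) print order[i] ":" members[order[i]]
    }' "$1"
}

# Example call (input and output file names are placeholders):
# imap_to_astral_map my_imap.txt > my_astral_map.txt
```

Species are written in the order in which they first appear, and lines with fewer than two columns are ignored.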

Download the Astral map file into your working folder:

wget https://raw.githubusercontent.com/Pas-Kapli/bpp-tutorial/master/aux/astral_map.txt

Finally, run Astral:

java -jar ~/Share/Astral/astral.5.7.8.jar -i ml_trees.tre -a astral_map.txt -o astral_tree.tre

Next, learn how to estimate model parameters under the MSci model