Output Files Species Tree Inference - Pas-Kapli/bpp-tutorial GitHub Wiki
Running BPP
The following command executes BPP with the A01.bpp.ctl control file.
$ bpp --cfile A01.bpp.ctl
Output files:
1. Posterior sample of trees.
For each sampled tree, the corresponding taus (divergence times) are also printed, whereas the thetas are printed only if that parameter is sampled. This file is usually named "mcmc.txt"; however, the name is arbitrary and can be changed by the user.
For the brown frog example, mcmc.txt would look like this:
((K: 0.001502, C: 0.001502): 0.000303, (L: 0.001140, H: 0.001140): 0.000665);
(((K: 0.000144, C: 0.000144): 0.000082, L: 0.000225): 0.000619, H: 0.000844);
(((K: 0.000183, C: 0.000183): 0.000116, L: 0.000299): 0.000822, H: 0.001122);
(((K: 0.000239, C: 0.000239): 0.000151, L: 0.000391): 0.000675, H: 0.001066);
(((K: 0.000216, C: 0.000216): 0.000175, L: 0.000391): 0.000675, H: 0.001066);
(((K: 0.000216, C: 0.000216): 0.000175, L: 0.000391): 0.000675, H: 0.001066);
(((K: 0.000216, C: 0.000216): 0.000175, L: 0.000391): 0.000649, H: 0.001040);
(((K: 0.000216, C: 0.000216): 0.000175, L: 0.000391): 0.000649, H: 0.001040);
(((K: 0.000412, C: 0.000412): 0.000170, L: 0.000582): 0.000359, H: 0.000941);
(((K: 0.000157, C: 0.000157): 0.000244, L: 0.000401): 0.000552, H: 0.000952);
.
.
.
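As a rough cross-check of the summary that BPP prints itself, the sampled topologies can be tallied directly from this file. The sketch below is not part of BPP. It assumes one Newick tree per line, strips the branch lengths (and any theta labels, assuming they are written as #value), and sorts sister clades so that the same topology always yields the same string. The function names are hypothetical.

```python
import re
from collections import Counter


def canonical_topology(newick):
    """Return the topology only, with sister clades sorted alphabetically."""
    s = newick.strip().rstrip(";")
    s = re.sub(r"#\s*[0-9.eE+-]+", "", s)  # drop theta labels (assumed "#value")
    s = re.sub(r":\s*[0-9.eE+-]+", "", s)  # drop branch lengths (taus)
    s = re.sub(r"\s+", "", s)
    pos = 0

    def parse():
        nonlocal pos
        if s[pos] == "(":
            pos += 1  # consume "("
            children = [parse()]
            while s[pos] == ",":
                pos += 1  # consume ","
                children.append(parse())
            pos += 1  # consume ")"
            return "(" + ",".join(sorted(children)) + ")"
        start = pos
        while pos < len(s) and s[pos] not in "(),":
            pos += 1
        return s[start:pos]

    return parse() + ";"


def topology_frequencies(lines):
    """Return (topology, count, frequency) tuples, most frequent first."""
    counts = Counter(
        canonical_topology(line) for line in lines if line.strip().startswith("(")
    )
    total = sum(counts.values())
    return [(topo, n, n / total) for topo, n in counts.most_common()]


# The first three sampled trees from the listing above.
sample = [
    "((K: 0.001502, C: 0.001502): 0.000303, (L: 0.001140, H: 0.001140): 0.000665);",
    "(((K: 0.000144, C: 0.000144): 0.000082, L: 0.000225): 0.000619, H: 0.000844);",
    "(((K: 0.000183, C: 0.000183): 0.000116, L: 0.000299): 0.000822, H: 0.001122);",
]
for topo, n, freq in topology_frequencies(sample):
    print(topo, n, round(freq, 3))
```

To tally a real run, pass the open file instead of the list, for example topology_frequencies(open("mcmc.txt")). Lines that do not start with "(" are skipped.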
2. General output and summary file.
This file is often named "out.txt" but, as before, the name is arbitrary. It stores the information that is printed to the screen while the program runs.
Before the MCMC sampling, the program prints to "out.txt" some information about the input files, i.e.,
i) the alignment site patterns per locus and their frequencies, and
ii) the information given in the species&tree parameter of the control file.
During the MCMC sampling, it prints
iii) the progress of the analysis, performance statistics and current estimates of some parameters, which help in evaluating the efficiency of the run.
At the end of the MCMC sampling, it prints
iv) the frequencies of the sampled trees and of individual splits, and
v) the majority-rule consensus tree and the most frequent tree, with posterior probabilities for each node.
For the brown frog example, this final summary looks like this:
Species in order:
1. K
2. C
3. L
4. H
(A) Best trees in the sample (15 distinct trees in all)
34502 0.34502 0.34502 (((C, K), L), H);
18432 0.18432 0.52933 (((C, L), K), H);
8554 0.08554 0.61487 ((C, (K, L)), H);
8418 0.08418 0.69905 (((C, K), H), L);
6512 0.06512 0.76417 ((C, (H, K)), L);
4586 0.04586 0.81003 ((C, L), (H, K));
4088 0.04088 0.85091 (((C, L), H), K);
3946 0.03946 0.89037 (((C, H), K), L);
2913 0.02913 0.91950 (((C, H), L), K);
2477 0.02477 0.94427 ((C, K), (H, L));
1794 0.01794 0.96221 (C, ((H, K), L));
1736 0.01736 0.97957 ((C, (H, L)), K);
830 0.00830 0.98787 ((C, H), (K, L));
631 0.00631 0.99418 (C, ((H, L), K));
582 0.00582 1.00000 (C, (H, (K, L)));
(B) Best splits in the sample of trees (10 splits in all)
61488 0.614874 1110
45397 0.453965 1100
27106 0.271057 0110
18876 0.188758 1101
12892 0.128919 1001
9966 0.099659 1010
8737 0.087369 0111
7689 0.076889 0101
4844 0.048440 0011
3007 0.030070 1011
(C) Majority-rule consensus tree
((K, C, L) #0.614874, H);
(D) Best tree (or trees from the mastertree file) with support values
(((C, K) #0.453965, L) #0.614874, H); [P = 0.345017]
To visualise the best tree or the majority-rule consensus tree, copy and paste it into a text file, then open the file with SeaView or FigTree.
To assess the reliability of the results, we should repeat the analysis at least once and check that independent runs give similar results.
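One simple way to compare two runs is to line up the tree probabilities from section (A) of each output file and look at the largest difference. The sketch below assumes the rows have the form "count probability cumulative tree", as in the listing above, and that both runs print a given topology as the same string. The function names are hypothetical, and no particular threshold for "similar" is implied.

```python
def parse_tree_table(lines):
    """Parse 'count probability cumulative tree' rows into {tree: probability}."""
    table = {}
    for line in lines:
        parts = line.split(None, 3)
        # Rows of other sections (species list, splits) have fewer fields.
        if len(parts) == 4 and parts[0].isdigit():
            table[parts[3].strip()] = float(parts[1])
    return table


def max_probability_difference(run_a, run_b):
    """Largest absolute difference in tree probability between two runs.

    A tree missing from one run is treated as having probability 0 there.
    """
    all_trees = set(run_a) | set(run_b)
    return max(
        (abs(run_a.get(t, 0.0) - run_b.get(t, 0.0)) for t in all_trees),
        default=0.0,
    )


# The first two rows of section (A) from the run shown above.
run1 = parse_tree_table([
    "34502 0.34502 0.34502 (((C, K), L), H);",
    "18432 0.18432 0.52933 (((C, L), K), H);",
])
print(run1)
```

With the table of a second run parsed in the same way, max_probability_difference(run1, run2) gives a single number to track as nsample or burnin are increased.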
Next steps:
- Re-run bpp a couple of times; try increasing nsample, sampfreq and burnin to see if the posterior probability of the best tree increases.
!!! Before re-running bpp, store your output files in a separate folder; otherwise the new run will overwrite them. Example commands:
mkdir run1
cp A01.bpp.ctl bpp_seqfile.txt Imap.txt run1/
mv mcmc.txt out.txt SeedUsed run1/
- Run Astral for comparison:
First, we need to infer a gene tree for each locus and write all of them to a single file:
cd individual_loci
for i in locus*; do raxml-ng --msa "$i" --model JC --threads 1; done
cat *bestTree > ml_trees.tre
As we saw earlier, this dataset has multiple samples per species. In this case, Astral also requires a file that maps the samples to the corresponding species. The format of the file looks like this:
species_name:individual_1,individual_2,...
species_name:individual_1,individual_2,...
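For illustration, lines in this format can be generated from a list of (individual, species) pairs. This is a minimal sketch with a hypothetical function name and made-up sample names; the individual names must match the tip labels in ml_trees.tre exactly, and for this tutorial the ready-made map file downloaded below remains the one to use.

```python
from collections import defaultdict


def astral_map_lines(pairs):
    """Group (individual, species) pairs into 'species:ind1,ind2,...' lines."""
    groups = defaultdict(list)
    for individual, species in pairs:
        groups[species].append(individual)
    return [f"{sp}:{','.join(inds)}" for sp, inds in groups.items()]


# Hypothetical sample names, for illustration only.
pairs = [("k1", "K"), ("k2", "K"), ("c1", "C")]
for line in astral_map_lines(pairs):
    print(line)
```

Species appear in the order in which they are first seen in the input.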
Download the Astral map file into your folder:
wget https://raw.githubusercontent.com/Pas-Kapli/bpp-tutorial/master/aux/astral_map.txt
Finally, run Astral:
java -jar ~/Share/Astral/astral.5.7.8.jar -i ml_trees.tre -a astral_map.txt -o astral_tree.tre