Species Tree inference with Astral - bpp/bpp-tutorial-geneflow GitHub Wiki

Astral is a species tree inference program that takes a set of pre-computed, unrooted gene-trees as input. Because Astral is statistically consistent under the multispecies coalescent model, it is often used as an alternative to the concatenation approach, in particular to assess whether the dataset under study might be affected by the anomaly zone.

Download the data:

mkdir baobap-astral
cd baobap-astral

# Download the data into the new folder
wget https://github.com/bpp/bpp-tutorial-geneflow/raw/main/data/baobap-loci.tar.gz
tar -xvzf baobap-loci.tar.gz
rm baobap-loci.tar.gz

Astral runs in two steps:

Step 1: Estimation of the gene-trees.

We saw how to do that on the first day of the workshop:

# This will take about 2 minutes
for i in locus-*; do iqtree2 -m GTR+G -s "$i"; done

# Collect all the ML trees into a single file
cat *treefile > baobap-mltrees.txt

#Create a new folder
mkdir species-tree

# Move your trees into the new folder
mv baobap-mltrees.txt species-tree
cd species-tree

# Alternatively, download the pre-computed tree file into the "species-tree" folder:
wget https://raw.githubusercontent.com/bpp/bpp-tutorial-geneflow/main/data/baobap-mltrees.txt
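Before moving on, it can be worth checking that the combined file holds one tree per locus. The following is a minimal sketch, assuming every Newick tree ends with a semicolon and that no taxon name contains one; `count_trees` is a hypothetical helper name, not part of IQ-TREE or Astral.

```shell
# Hypothetical helper: count the Newick trees in a file by counting the
# terminating semicolons (assumes no semicolons inside taxon names).
count_trees() {
    grep -o ';' "$1" | wc -l | tr -d ' '
}
```

For example, `count_trees baobap-mltrees.txt` should print the same number as there are files matched by `*treefile` in Step 1.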

Step 2: Tree inference with astral.

Input files

Astral requires only one input file: a plain text file containing all the gene-trees in Newick format, like the one we created above. However, if the dataset contains multiple individuals from the same species, you should also supply a "mapping file" that assigns individuals to species. Each line of the mapping file uses one of the following two formats:

species_name [number of individuals] individual_1 individual_2 ...

species_name:individual_1,individual_2,...

For the baobabs, the map file looks like this:

Adig:Adi001,Adi002
Agra:Aga001,Aga002
Agre:Age001
Amad:Ama006,Ama018
Arub:Aru001,Aru127
Smic:Smi165

Download it as follows:

wget https://raw.githubusercontent.com/bpp/bpp-tutorial-geneflow/main/data/baobab.Astral.map.txt
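A mismatch between the tip labels in the gene-trees and the individual names in the map file is an easy mistake to make, so a quick consistency check may help. The sketch below assumes the `species:ind1,ind2` map format and gene-trees without internal node labels; the function names are hypothetical helpers, not Astral features.

```shell
# Hypothetical helpers; assume the "species:ind1,ind2" map format and
# gene-trees without internal node labels.

# List every individual named in the map file, one per line.
# Stray spaces, tabs and carriage returns are stripped.
map_individuals() {
    sed 's/^[^:]*://' "$1" | tr ',' '\n' | tr -d ' \t\r' | grep -v '^$' | sort -u
}

# List every tip label found in a Newick file, one per line.
# A tip label is the text that directly follows "(" or ",".
tree_tips() {
    grep -oE '[(,][^(),:;]+' "$1" | tr -d '(,' | sort -u
}

# Print tip labels that occur in the trees but are absent from the map.
unmapped_tips() {
    map_tmp=$(mktemp)
    map_individuals "$2" > "$map_tmp"
    tree_tips "$1" | grep -vxF -f "$map_tmp"
    rm -f "$map_tmp"
}
```

Running `unmapped_tips baobap-mltrees.txt baobab.Astral.map.txt` should print nothing if every tip in the gene-trees is assigned to a species.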

Running Astral

Having the map file ("baobab.Astral.map.txt") and the input gene-trees ("baobap-mltrees.txt"), we can now run Astral:

astral -i baobap-mltrees.txt -o baobap-astral.tre -a baobab.Astral.map.txt 2> baobap-astral.log

You can visualize the Astral tree on your computer using, e.g., SeaView or FigTree.

Astral Output

Newick tree

The output file of Astral is an unrooted Newick tree, which can be viewed with any tree viewer, such as SeaView or FigTree.

Branch lengths

The branch lengths in the tree are in coalescent units and are a direct measure of the amount of discordance among the gene trees. As such, they are prone to underestimation, because statistical noise in gene tree estimation inflates the apparent discordance. They are meaningful only for internal branches and for terminal branches that lead to species with more than one sampled individual.
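To make the link between coalescent units and discordance concrete: under the multispecies coalescent, for an internal branch of length T coalescent units in a four-taxon tree, the probability that a gene-tree quartet matches the species tree is 1 - (2/3)e^(-T). This standard result is not stated in the tutorial itself and is added here only as an illustration; `concordance` is a hypothetical helper name.

```shell
# Hypothetical helper: expected fraction of gene-tree quartets that match
# the species tree around an internal branch of length T (coalescent
# units), using the four-taxon MSC formula 1 - (2/3) * exp(-T).
concordance() {
    LC_ALL=C awk -v t="$1" 'BEGIN { printf "%.3f\n", 1 - (2 / 3) * exp(-t) }'
}
```

`concordance 0` prints 0.333 (all three quartet topologies are equally likely), whereas `concordance 2` prints 0.910. Short branches therefore imply heavy discordance, and long branches imply little.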

Support values

Branch support values measure the support for a quadripartition (the four clusters around a branch) rather than for a bipartition, which is what support values more commonly refer to.

Q: Is your Astral tree compatible with the tree here?

Next: BPP assumptions
