Concatenation vs. Species Trees - Pas-Kapli/CoME-Tutorials GitHub Wiki
Concatenation assumes that all loci share the same speciation history. This is not always true. Gene trees often reflect alternative evolutionary histories because of processes such as incomplete lineage sorting (ILS), gene flow or the sampling of paralogs. In these cases concatenation approaches are not suitable and can converge on an incorrect topology. One example is the anomaly zone, where the most frequent gene tree topology differs from the species tree. Methods that implement the multispecies coalescent (MSC) model accommodate gene tree incongruence caused by ILS.
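The following sketch shows how much discordance ILS alone can produce. It uses the standard three-taxon result under the multispecies coalescent. Take a species tree ((A,B),C) whose internal branch has length T in coalescent units (generations divided by 2N for a diploid population of effective size N). A gene tree then matches the species tree with probability 1 − (2/3)e^(−T), and each of the two discordant topologies has probability (1/3)e^(−T). The function name is illustrative only.

With three taxa the matching topology is always the most probable one. Anomalous gene trees, and hence the anomaly zone, require at least four taxa.

```python
import math


def three_taxon_gene_tree_probs(T: float) -> dict[str, float]:
    """Gene tree topology probabilities under the MSC for species tree ((A,B),C).

    T is the internal branch length in coalescent units (must be >= 0).
    """
    if T < 0:
        raise ValueError("branch length must be non-negative")
    # A and B fail to coalesce along the internal branch with probability exp(-T);
    # if they fail, all three topologies are equally likely in the ancestral population.
    discordant = math.exp(-T) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # concordant with the species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


for T in (0.1, 1.0, 3.0):
    probs = three_taxon_gene_tree_probs(T)
    print(f"T = {T}: P(concordant) = {probs['((A,B),C)']:.3f}")
```

With a short internal branch (T = 0.1), roughly 60% of loci are expected to carry a gene tree that disagrees with the species tree. Treating these loci as if they shared one history can therefore be misleading.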
Schematic of the concatenation and coalescent paradigms in phylogenetics.
The top of the figure shows a multilocus data set of five species (A–E) and four genes (1–4). The left side shows the classic supermatrix approach. All genes are concatenated into a single supergene, which is then analysed with classical or updated traditional algorithms such as those used so far in these tutorials (RAxML, IQ-TREE and MrBayes). The resulting tree at the lower left is in truth a gene tree. It is nevertheless often called a species tree or phylogeny because it comes from analysing the complete data set.
The centre shows a class of species tree (coalescent) methods that estimate the gene trees and the species tree jointly. They combine multilocus sequence data, priors and a multispecies coalescent likelihood model. Examples include BPP, which we will use in this tutorial, and *BEAST.
The right side shows so-called two-step species tree methods. Gene trees are first estimated with classical approaches. They are then used as input data to estimate a species tree with algorithms such as ASTRAL, which we will also use in this tutorial.
Figure and caption (edited) from Liu et al., 2015
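In practice, the supermatrix approach on the left of the figure joins per-gene alignments column by column. The sketch below is a minimal version under a few assumptions:

- Each locus is already aligned.
- Each locus is given as a dictionary that maps taxon names to sequences. This is a hypothetical in-memory format, not a file format read by RAxML, IQ-TREE or MrBayes.

Taxa missing from a locus are padded with gaps. The 1-based column range of each gene is recorded so that it could later define a per-gene partition.

```python
def concatenate(alignments: list[dict[str, str]], gap: str = "-"):
    """Build a supermatrix from per-locus alignments.

    Returns (supermatrix, partitions): supermatrix maps taxon -> concatenated
    sequence; partitions lists the 1-based inclusive column range of each locus.
    """
    taxa = sorted(set().union(*alignments))
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for locus in alignments:
        lengths = {len(seq) for seq in locus.values()}
        if len(lengths) != 1:
            raise ValueError("each locus must contain aligned sequences of equal length")
        length = lengths.pop()
        for taxon in taxa:
            # A taxon not sampled for this locus is coded as missing data (all gaps).
            pieces[taxon].append(locus.get(taxon, gap * length))
        partitions.append((start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


gene1 = {"A": "ACGT", "B": "ACGA", "C": "ACGG"}
gene2 = {"A": "TT", "B": "TA", "D": "TC"}
matrix, parts = concatenate([gene1, gene2])
for taxon, seq in matrix.items():
    print(taxon, seq)
print(parts)  # [(1, 4), (5, 6)]
```

The single alignment that comes out is then analysed as if every gene shared one tree. This is exactly the assumption that ILS, gene flow or paralogy can violate.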
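Joint methods in the centre of the figure combine a multispecies coalescent density with the sequence likelihood and the priors. That density can be written population by population. The sketch below covers a single population and assumes one common parameterisation: θ = 4Nμ (the one BPP uses), with times measured in expected substitutions per site. While j lineages are present, any particular pair coalesces at rate 2/θ, and the total coalescence rate is j(j−1)/θ.

The function name and argument layout are hypothetical and do not mirror BPP's input files or internal code. The log density of a full gene tree would be the sum of this quantity over all populations of the species tree.

```python
import math


def log_coalescent_density(theta: float, m: int, waiting_times: list[float], tail: float) -> float:
    """Log density of the coalescent events inside one species-tree population.

    theta: population size parameter (4*N*mu); times in expected substitutions per site.
    m: number of gene lineages entering the population at its lower end.
    waiting_times: successive intervals that each end in a coalescent event
        (the first interval has m lineages, the next m - 1, and so on).
    tail: time from the last coalescent event to the upper end of the population;
        for the root population pass math.inf.
    """
    log_f = 0.0
    j = m
    for t in waiting_times:
        if j < 2:
            raise ValueError("fewer than two lineages cannot coalesce")
        # Density that a specific pair coalesces after waiting t with j lineages present.
        log_f += math.log(2.0 / theta) - j * (j - 1) * t / theta
        j -= 1
    if j > 1:
        # Probability that the remaining j lineages do not coalesce before the population ends.
        log_f -= j * (j - 1) * tail / theta
    return log_f


# Root population: 3 lineages enter and coalesce after waiting 0.002 and then 0.004.
print(log_coalescent_density(theta=0.01, m=3, waiting_times=[0.002, 0.004], tail=math.inf))  # ≈ 8.597
```

This is a density over continuous coalescent times, not a probability, so its logarithm can be positive.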
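Two-step methods on the right of the figure, such as ASTRAL, summarise the input gene trees rather than the sequences. ASTRAL looks for the species tree that shares the largest number of induced quartet trees with the gene trees. The sketch below computes only that objective (the quartet score) for a hand-picked list of candidate trees. It is not ASTRAL's search algorithm.

For brevity, each unrooted tree is represented by its non-trivial splits: the taxon set on one side of each internal branch. All names are hypothetical.

```python
from itertools import combinations


def induced_quartet(splits, quartet):
    """Quartet topology induced by an unrooted tree on four taxa, or None if unresolved.

    splits: iterable of taxon sets, each one side of an internal branch of the tree.
    """
    four = set(quartet)
    for side in splits:
        inside = four & side
        if len(inside) == 2:
            # The branch separates two quartet taxa from the other two, e.g. AB|CD.
            return frozenset({frozenset(inside), frozenset(four - inside)})
    return None


def quartet_score(species_splits, gene_trees, taxa):
    """Number of (gene tree, quartet) pairs whose topology agrees with the species tree."""
    score = 0
    for quartet in combinations(sorted(taxa), 4):
        target = induced_quartet(species_splits, quartet)
        if target is None:
            continue
        score += sum(induced_quartet(g, quartet) == target for g in gene_trees)
    return score


taxa = {"A", "B", "C", "D", "E"}
gene_trees = [
    [{"A", "B"}, {"D", "E"}],
    [{"A", "B"}, {"D", "E"}],
    [{"A", "C"}, {"D", "E"}],  # a discordant gene tree, e.g. due to ILS
]
candidates = {
    "((A,B),C,(D,E))": [{"A", "B"}, {"D", "E"}],
    "((A,C),B,(D,E))": [{"A", "C"}, {"D", "E"}],
}
for name, splits in candidates.items():
    print(name, quartet_score(splits, gene_trees, taxa))  # 13 for the first, 11 for the second
best = max(candidates, key=lambda name: quartet_score(candidates[name], gene_trees, taxa))
print("best:", best)
```

The candidate that agrees with the majority of gene trees receives the higher score. ASTRAL itself does not score a hand-picked list. It uses dynamic programming over a constrained set of bipartitions to find the highest-scoring tree.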