Introgression model (MSC‐I) - bpp/bpp-tutorial-geneflow GitHub Wiki

The multispecies coalescent with introgression (MSC-I) model allows episodic or pulse introgression between species or populations. At the moment only the A00 analysis is available. In other words, the user has to specify the number of introgression events, their direction, and the populations involved. The program will then estimate the parameters in the MSC-I model using MCMC. We are working on implementing MCMC moves that change the MSC-I model.

Four types of MSC-I models that are implemented are illustrated in the figure below.

  • In model (a), two parental species SH and TH merge to form a hybrid species H at time τ_H.
  • In model (b), there is introgression from species RSA to species THC at time τ_H = τ_S, with introgression probability φ.
  • In model (c), species RSA and RTB come into contact to form hybrid species H at time τ_S = τ_H = τ_T, which evolves into species C, while the two parent species become A and B.
  • In model (d), bidirectional introgression occurs between species RXA and RYB at time τ_X = τ_Y, with introgression probabilities φ_X and φ_Y.

The MSC-I model has three types of parameters:

  • species divergence and hybridization times (τs),
  • population size parameters (θs), and
  • introgression probabilities (φs).

The following illustration shows an example of an MSC-I model with one embedded gene tree.

Specifying an MSC-I model

Even with a single introgression event, writing the extended Newick string for an MSC-I model by hand can be tedious. BPP provides a simple tool, --msci-create, that takes introgression events specified on a binary species tree by their source and target branches and prints the extended Newick notation for the resulting MSC-I model. The command reads a text file that defines the model, including:

  • A binary rooted species tree with labels on its internal nodes.
  • A list of introgression events specified on the tree's edges.
  • Details regarding the presence of tau parameters at the source introgression node to differentiate among models.

For more information, refer to the --msci-create documentation.

bpp --msci-create msci.txt

The following msci.txt provides the instructions for constructing model (a).

tree (A,(B,C));
define T as B,C
define R as A,B
hybridization R A, T C as S H tau=yes,yes phi=0.10
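
As a rough sketch, the definition file for model (a) can also be written and passed to --msci-create from a script. The helper names below (write_spec, build_command, run_msci_create) are hypothetical, and the sketch assumes the executable is called bpp and is on the PATH; if it is not found, nothing is run.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Definition of model (a), copied from the msci.txt example above.
MSCI_SPEC = (
    "tree (A,(B,C));\n"
    "define T as B,C\n"
    "define R as A,B\n"
    "hybridization R A, T C as S H tau=yes,yes phi=0.10\n"
)


def write_spec(directory, text=MSCI_SPEC, name="msci.txt"):
    """Write the definition file into `directory` and return its path."""
    path = Path(directory) / name
    path.write_text(text)
    return path


def build_command(spec_path, bpp="bpp"):
    """Argument list for `bpp --msci-create <file>` (binary name is an assumption)."""
    return [bpp, "--msci-create", str(spec_path)]


def run_msci_create(spec_path, bpp="bpp"):
    """Run bpp if it is installed; return its stdout, or None if bpp is not found."""
    if shutil.which(bpp) is None:
        return None
    done = subprocess.run(build_command(spec_path, bpp), capture_output=True, text=True)
    return done.stdout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        spec = write_spec(tmp)
        output = run_msci_create(spec)
        print(output if output is not None else "bpp not found on PATH; nothing was run")
```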

Another way of producing model (a) is to specify the inner node labels directly on the species tree instead of using the define directives:

tree (A,(B,C)T)R;
hybridization R A, T C as S H tau=yes,yes phi=0.10

We can construct model (b) using the following definition:

tree (A,(B,C)T)R;
hybridization R A, T C as S H tau=no,yes phi=0.10

For model (c):

tree (A,(B,C)T)R;
hybridization R A, T C as S H tau=no,no phi=0.4

For model (d):

tree (A,B)R;
bidirection A R, B R as X Y phi=0.1,0.2

Below are the extended Newick representations of the four models:

Model A: ((H[&phi=0.100000,tau-parent=yes],A)S, (B,(C)H[&phi=0.900000,tau-parent=yes])T)R;
Model B: ((H[&phi=0.100000,tau-parent=no],A)S, (B,(C)H[&phi=0.900000,tau-parent=yes])T)R;
Model C: ((H[&phi=0.400000,tau-parent=no],A)S, (B,(C)H[&phi=0.600000,tau-parent=no])T)R;
Model D: ((A,Y[&phi=0.200000])X, (B,X[&phi=0.100000])Y)R;
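
One pattern worth checking in these strings is how the φ values are distributed. In models A to C the hybrid node H is annotated twice and the two φ values sum to 1, which is consistent with a lineage at H going to one parent with probability φ and to the other with probability 1 − φ. In model D each of X and Y carries a single φ annotation. The following sketch extracts the values with a regular expression; it is a throwaway helper for reading the strings above, not a general extended Newick parser, and the function name phi_by_label is made up for illustration.

```python
import re
from collections import defaultdict

# Extended Newick strings copied from the listing above.
MODELS = {
    "A": "((H[&phi=0.100000,tau-parent=yes],A)S, (B,(C)H[&phi=0.900000,tau-parent=yes])T)R;",
    "B": "((H[&phi=0.100000,tau-parent=no],A)S, (B,(C)H[&phi=0.900000,tau-parent=yes])T)R;",
    "C": "((H[&phi=0.400000,tau-parent=no],A)S, (B,(C)H[&phi=0.600000,tau-parent=no])T)R;",
    "D": "((A,Y[&phi=0.200000])X, (B,X[&phi=0.100000])Y)R;",
}

# A node label immediately followed by an [&key=value,...] annotation.
ANNOTATED = re.compile(r"(\w+)\[&([^\]]*)\]")


def phi_by_label(newick):
    """Map each annotated node label to the list of phi values attached to it."""
    values = defaultdict(list)
    for label, attrs in ANNOTATED.findall(newick):
        for item in attrs.split(","):
            key, _, value = item.partition("=")
            if key.strip() == "phi":
                values[label].append(float(value))
    return dict(values)


if __name__ == "__main__":
    for name, newick in MODELS.items():
        print(name, phi_by_label(newick))
```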

If we remove the annotations of the hybrid nodes (which BPP needs), we get a simpler Newick string that makes it easier to see the pattern of how the introgressions are represented:

Model A: ((H,A)S, (B,(C)H)T)R;
Model B: ((H,A)S, (B,(C)H)T)R;
Model C: ((H,A)S, (B,(C)H)T)R;
Model D: ((A,Y)X, (B,X)Y)R;
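
The simplified strings can be reproduced by deleting every bracketed annotation. A minimal sketch, assuming annotations never contain a nested "]" (true for the four strings shown on this page); strip_annotations is a hypothetical helper name:

```python
import re


def strip_annotations(newick):
    """Remove every [...] annotation block from an extended Newick string."""
    return re.sub(r"\[[^\]]*\]", "", newick)


if __name__ == "__main__":
    model_a = "((H[&phi=0.100000,tau-parent=yes],A)S, (B,(C)H[&phi=0.900000,tau-parent=yes])T)R;"
    model_d = "((A,Y[&phi=0.200000])X, (B,X[&phi=0.100000])Y)R;"
    print(strip_annotations(model_a))
    print(strip_annotations(model_d))
```

Models A, B and C all reduce to the same string because they differ only in their tau-parent and phi annotations.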

Alternative way to specify MSC-I models (example)

The idea is to rewrite the MSC-I model as a "binary tree" so that each edge in the MSC-I model appears exactly once in its binary form. Nodes corresponding to hybridization or introgression events are end-points for two edges and thus appear twice in the binary tree: once as a binary node, and once as a unary node. The Newick notation of the "binary tree" corresponds to the MSC-I model. We need to use attributes like tau-parent=X to define the MSC-I model.

The Newick notation for the above example is: (((Arub,z)y,((Amad)z,Agra)e)d,Adig)c;

To specify the MSC-I model we use the tau-parent attributes: (((Arub,z[tau-parent=yes])y,((Amad)z[tau-parent=yes],Agra)e)d,Adig)c;
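
Because a hybridization or introgression node appears twice in this notation, a quick sanity check on a hand-written string is to list the labels that occur exactly twice. The sketch below does this with regular expressions; it assumes labels consist of letters, digits and underscores, and that no ordinary species or internal-node label is accidentally repeated. The function names are illustrative only.

```python
import re
from collections import Counter


def label_counts(newick):
    """Count how often each node label occurs, ignoring [...] annotations."""
    plain = re.sub(r"\[[^\]]*\]", "", newick)
    return Counter(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", plain))


def hybrid_labels(newick):
    """Labels that occur exactly twice, i.e. candidate hybridization nodes."""
    return sorted(label for label, n in label_counts(newick).items() if n == 2)


if __name__ == "__main__":
    example = "(((Arub,z[tau-parent=yes])y,((Amad)z[tau-parent=yes],Agra)e)d,Adig)c;"
    print(hybrid_labels(example))
```

For the example above only z is reported; for the model (d) string ((A,Y)X, (B,X)Y)R; both X and Y are.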

BPP control file

We use the same control file as for the A00 MSC analyses with the following two changes:

  1. We specify the MSC-I model instead of a binary tree for the species&tree tag.

  2. We introduce a new keyword phiprior = a b which specifies a Beta(a,b) prior for the introgression probability parameter.
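
To get a feel for what a given phiprior = a b implies, the mean and variance of a Beta(a,b) distribution can be computed from the standard formulas, mean = a/(a+b) and variance = ab/((a+b)^2 (a+b+1)). The values of a and b used below are arbitrary illustrations, not recommendations from this tutorial.

```python
def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


if __name__ == "__main__":
    # Illustrative (a, b) pairs only; (1, 1) is the uniform distribution on [0, 1].
    for a, b in [(1, 1), (1, 4), (2, 2)]:
        mean, var = beta_mean_var(a, b)
        print(f"phiprior = {a} {b}: mean = {mean:.4f}, variance = {var:.4f}")
```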

Demos

References

  • Flouri T., Jiao X., Rannala B., Yang Z. (2020) A Bayesian Implementation of the Multispecies Coalescent Model with Introgression for Phylogenomic Analysis. Molecular Biology and Evolution, 37(4):1211-1223. doi:10.1093/molbev/msz296