Migration model (MSC‐M) - bpp/bpp-tutorial-geneflow GitHub Wiki

In contrast to the MSC-I model, the MSC-M model, also known as the isolation-with-migration (IM) model, allows continuous migration between populations or species. The following figure illustrates how the MSC-I and MSC-M models differ in model specification, parameters and embedded gene trees.

From left to right: (a) MSC-I model, (b) MSC-M model, (c) the MSC-I model from (a) with an embedded gene tree, (d) the MSC-M model from (b) with an embedded gene tree.

BPP implements two flavours of the MSC-M model:

  1. A basic MSC-M model with gamma priors on the migration rates.
  2. A model of variable migration rates among loci.

We use the same control file as for the A00 MSC model. The binary species tree is specified in Newick format with labels on internal nodes; these labels are used later in the control file to identify the source and target populations involved in migration. Migration on the species tree is specified using the keywords migration and wprior (the latter was called migprior in BPP version 4.7 and earlier).

MSC-M model specification

Suppose we have the following balanced species tree:

species&tree = 4 A B C D
                 2 2 2 2
                 ((A,B)S, (C,D)T)R;

migration = 2
            A C
            S C

wprior = 2 1

The species&tree line specifies the binary species tree. Here S is the common ancestor of A and B, T is the common ancestor of C and D, and R is the root, the common ancestor of A, B, C and D. Not every internal node needs a label, but every node involved in migration must have one. The migration block specifies two directional migration events (the 2 after migration = gives their number): from the source population $A$ to the target population $C$, and from $S$ to $C$, with migration rates $\varpi_{AC}$ and $\varpi_{SC}$.
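
Because migration endpoints are identified by node labels, a quick sanity check before running BPP is to confirm that every source and target named in the migration block is a labelled node of the tree. The sketch below is a hypothetical helper, not part of BPP: it assumes a topology-only Newick string (no branch lengths or other annotations) with alphanumeric labels, and the names `node_labels`, `internal_labels` and `check_routes` are illustrative.

```python
from __future__ import annotations

import re

# Species tree from the control file above (topology only, no branch lengths).
TREE = "((A,B)S, (C,D)T)R;"
# Migration routes as (source, target) pairs, one per line of the migration block.
ROUTES = [("A", "C"), ("S", "C")]


def node_labels(newick: str) -> set[str]:
    """Return all node labels (tips and labelled internal nodes).

    Assumes labels consist of letters, digits or underscores and that the
    tree carries no branch lengths or other numeric annotations.
    """
    return set(re.findall(r"[A-Za-z_]\w*", newick))


def internal_labels(newick: str) -> set[str]:
    """Return labels written directly after a closing parenthesis."""
    return set(re.findall(r"\)\s*([A-Za-z_]\w*)", newick))


def check_routes(newick: str, routes: list[tuple[str, str]]) -> list[str]:
    """Return error messages for routes whose endpoints are not labelled nodes."""
    labels = node_labels(newick)
    errors = []
    for src, dst in routes:
        for name in (src, dst):
            if name not in labels:
                errors.append(f"{src} -> {dst}: unknown population {name!r}")
        if src == dst:
            errors.append(f"{src} -> {dst}: source and target are identical")
    return errors


print(sorted(internal_labels(TREE)))  # ['R', 'S', 'T']
print(check_routes(TREE, ROUTES))     # []
```

A route such as ("U", "C") would be reported because U is not a node of the tree; the same check would flag an internal node that was used in migration but left unlabelled.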

Here, $\varpi_{AC} = m_{AC} / \mu = 4 M_{AC} / \theta_C$ is the mutation-scaled migration rate, where $\mu$ is the mutation rate per site per generation, $\theta_C = 4 N_C \mu$ is the population size parameter of $C$, and $m_{AC}$ is the proportion of individuals in population $C$ that have migrated from population $A$ in each generation. $M_{AC} = m_{AC} N_C$ is then the expected number of individuals in population $C$ that have migrated from population $A$ per generation, in the real-world view with time running forward. The two expressions for $\varpi_{AC}$ agree because $4 M_{AC} / \theta_C = 4 m_{AC} N_C / (4 N_C \mu) = m_{AC} / \mu$.
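
The relation between the scaled and unscaled quantities can be checked numerically. The following sketch uses hypothetical values ($m_{AC} = 2 \times 10^{-8}$, $N_C = 10^5$, $\mu = 10^{-8}$) chosen only for illustration; `migration_quantities` and `expected_migrants` are illustrative names. The second function inverts the relation, $M_{AC} = \varpi_{AC} \theta_C / 4$, which can be handy for turning estimates of $\varpi$ and $\theta$ into an expected number of migrants per generation.

```python
from __future__ import annotations

import math


def migration_quantities(m: float, N: float, mu: float) -> dict[str, float]:
    """Derive M, theta and varpi for one migration route, e.g. A -> C.

    m  -- proportion of individuals in the target population C that are
          immigrants from the source population A, per generation
    N  -- population size of the target population C
    mu -- mutation rate per site per generation
    """
    M = m * N              # expected number of immigrants per generation
    theta = 4.0 * N * mu   # population size parameter of C
    varpi = m / mu         # mutation-scaled migration rate
    # 4M/theta = 4 m N / (4 N mu) = m / mu, so both expressions must agree.
    assert math.isclose(varpi, 4.0 * M / theta)
    return {"M": M, "theta": theta, "varpi": varpi}


def expected_migrants(varpi: float, theta: float) -> float:
    """Invert varpi = 4M / theta to get M = varpi * theta / 4."""
    return varpi * theta / 4.0


# Hypothetical values, for illustration only.
print(migration_quantities(m=2e-8, N=1e5, mu=1e-8))
print(expected_migrants(varpi=2.0, theta=0.004))
```

With these values $\varpi_{AC} = 2$, $\theta_C = 0.004$ and $M_{AC} = 0.002$, i.e. roughly one migrant every 500 generations.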

The keyword wprior specifies a default gamma prior for all migration rates. Here, both $\varpi_{AC}$ and $\varpi_{SC}$ are assigned the gamma prior $G(2,1)$ (shape 2, rate 1), with mean $2/1=2$.
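
The means quoted on this page follow from the shape-rate parameterization, under which $G(\alpha,\beta)$ has mean $\alpha/\beta$ and variance $\alpha/\beta^2$. A minimal Python sketch (function names are illustrative); note that Python's `random.gammavariate(alpha, beta)` takes a scale as its second argument, so a rate $\beta$ must be passed as `1/beta`.

```python
import random


def gamma_mean_var(alpha: float, beta: float) -> tuple:
    """Mean and variance of G(alpha, beta), with shape alpha and RATE beta."""
    return alpha / beta, alpha / beta**2


def sample_gamma(alpha: float, beta: float, rng: random.Random) -> float:
    """Draw from G(alpha, beta); gammavariate expects a scale, i.e. 1 / rate."""
    return rng.gammavariate(alpha, 1.0 / beta)


print(gamma_mean_var(2, 1))    # (2.0, 2.0)  default prior from wprior = 2 1
print(gamma_mean_var(2, 0.5))  # (4.0, 8.0)  a G(2, 0.5) prior, for comparison
```

Passing the rate directly as the second argument of `gammavariate` would silently give a distribution with mean $\alpha\beta$ instead of $\alpha/\beta$, so the conversion is worth keeping explicit.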

Alternatively, a gamma prior can be specified for an individual migration rate by appending its shape and rate parameters to the line that identifies the source and target populations; this per-route prior takes precedence over the default. For example, if we specify

wprior = 2 1
migration = 2
            A C 2 0.5
            S C

the priors are $\varpi_{AC} \sim G(2,0.5)$, with mean $2/0.5 = 4$, and $\varpi_{SC} \sim G(2,1)$, taken from the default prior specified by wprior.
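
The precedence rule can be expressed as a small lookup. This is a hypothetical sketch of the rule as described here, not BPP's own parser: it takes the values after `wprior =` and the route lines of the migration block, uses the per-route shape and rate when both are present, and otherwise falls back to the default. Any further fields on a route line are ignored by this sketch.

```python
def resolve_priors(wprior: str, route_lines: list) -> dict:
    """Return {(source, target): (shape, rate)} for every migration route.

    wprior      -- the text after "wprior =", e.g. "2 1" (default shape and rate)
    route_lines -- one "SOURCE TARGET [shape rate]" line per route
    """
    default = tuple(float(x) for x in wprior.split()[:2])
    priors = {}
    for line in route_lines:
        fields = line.split()
        source, target = fields[0], fields[1]
        if len(fields) >= 4:
            # A per-route prior takes precedence over the default.
            priors[(source, target)] = (float(fields[2]), float(fields[3]))
        else:
            priors[(source, target)] = default
    return priors


priors = resolve_priors("2 1", ["A C 2 0.5", "S C"])
for (src, dst), (shape, rate) in priors.items():
    print(f"{src}->{dst}: G({shape:g}, {rate:g}), mean {shape / rate:g}")
# A->C: G(2, 0.5), mean 4
# S->C: G(2, 1), mean 2
```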

Model of variable migration rates among loci

In this model, the migration rate $\varpi_i$ at locus $i$ varies among loci according to the gamma distribution $\varpi_i \sim G(\alpha_\varpi, \alpha_\varpi / \overline{\varpi})$, which has shape parameter $\alpha_\varpi$ and mean $\overline{\varpi}$; the mean rate $\overline{\varpi}$ is in turn assigned the gamma prior $G(\alpha,\beta)$. The shape $\alpha_\varpi$ controls how much $\varpi_i$ varies among loci: a small $\alpha_\varpi$ (say 0.5 or 1) means highly variable rates among loci, a large $\alpha_\varpi$ means nearly constant rates, and $\alpha_\varpi = \infty$ means that all loci have the same rate.
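
Because the rate parameter of $G(\alpha_\varpi, \alpha_\varpi/\overline{\varpi})$ is $\alpha_\varpi/\overline{\varpi}$, the locus rates have mean $\overline{\varpi}$, variance $\overline{\varpi}^2/\alpha_\varpi$ and coefficient of variation $1/\sqrt{\alpha_\varpi}$, which is why a large $\alpha_\varpi$ corresponds to nearly constant rates. The sketch below computes these moments and checks them by simulation; the function names and the mean rate of 2 are illustrative, and the simulation only draws from the stated gamma distribution, it is not BPP's MCMC.

```python
import math
import random


def locus_rate_moments(mean_rate: float, alpha_w: float) -> tuple:
    """Mean, variance and CV of varpi_i ~ G(alpha_w, alpha_w / mean_rate)."""
    rate = alpha_w / mean_rate
    mean = alpha_w / rate          # = mean_rate
    var = alpha_w / rate**2        # = mean_rate**2 / alpha_w
    cv = math.sqrt(var) / mean     # = 1 / sqrt(alpha_w)
    return mean, var, cv


def sample_locus_rates(mean_rate: float, alpha_w: float, n_loci: int,
                       rng: random.Random) -> list:
    """Draw one migration rate per locus; gammavariate takes scale = 1 / rate."""
    scale = mean_rate / alpha_w
    return [rng.gammavariate(alpha_w, scale) for _ in range(n_loci)]


rng = random.Random(1)
for alpha_w in (0.5, 1.0, 5.0, 100.0):
    rates = sample_locus_rates(2.0, alpha_w, 100_000, rng)
    m = sum(rates) / len(rates)
    sd = math.sqrt(sum((r - m) ** 2 for r in rates) / (len(rates) - 1))
    print(f"alpha_w={alpha_w:g}: theoretical CV "
          f"{locus_rate_moments(2.0, alpha_w)[2]:.3f}, simulated CV {sd / m:.3f}")
```

With $\alpha_\varpi = 0.5$ the coefficient of variation is about 1.41, with $\alpha_\varpi = 5$ about 0.45, and it tends to 0 as $\alpha_\varpi \to \infty$.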

wprior = 2 1
migration = 2
            A C 2 0.5 5
            S C

In the above example, a single rate $\varpi_{SC}$ applies to all loci and has the default prior $G(2,1)$, whereas $\varpi_{AC}$ varies among loci with shape parameter $\alpha_\varpi = 5$ (the fifth field on its line), and the mean rate over loci, $\overline{\varpi}_{AC}$, is assigned the gamma prior $G(2,0.5)$.
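
To see what this prior implies, one can draw from it directly: first $\overline{\varpi}_{AC} \sim G(2, 0.5)$, then one $\varpi_{AC,i} \sim G(5, 5/\overline{\varpi}_{AC})$ per locus, alongside a single $\varpi_{SC} \sim G(2,1)$ shared by all loci. This is an illustrative forward simulation of the prior under those assumptions (the function name and number of loci are hypothetical), not a description of how BPP samples internally.

```python
import random


def draw_from_prior(n_loci: int, rng: random.Random) -> dict:
    """One joint draw from the prior in the example control file.

    random.gammavariate(shape, scale) uses a scale, so a rate b becomes 1 / b.
    """
    w_sc = rng.gammavariate(2.0, 1.0 / 1.0)      # S -> C: G(2, 1), shared by all loci
    mean_ac = rng.gammavariate(2.0, 1.0 / 0.5)   # mean A -> C rate: G(2, 0.5)
    alpha_w = 5.0                                # among-locus shape from "A C 2 0.5 5"
    # Per-locus A -> C rates: G(alpha_w, alpha_w / mean_ac), i.e. scale mean_ac / alpha_w.
    w_ac = [rng.gammavariate(alpha_w, mean_ac / alpha_w) for _ in range(n_loci)]
    return {"w_SC": w_sc, "mean_AC": mean_ac, "w_AC": w_ac}


draw = draw_from_prior(n_loci=5, rng=random.Random(42))
print(draw["w_SC"], draw["mean_AC"], [round(w, 3) for w in draw["w_AC"]])
```

Marginally, each $\varpi_{AC,i}$ still has prior mean 4, but its prior variance is $E[\overline{\varpi}^2]/5 + \mathrm{Var}(\overline{\varpi}) = 24/5 + 8 = 12.8$ rather than the 8 of $G(2,0.5)$, reflecting the extra among-locus variation.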


References

  • Flouri T., Jiao X., Huang J., Rannala B., Yang Z. (2023) Efficient Bayesian inference under the multispecies coalescent with migration. Proceedings of the National Academy of Sciences, 120(44):e2310708120. doi:10.1073/pnas.2310708120