Isolation by Distance (IBD) - UMEcolGenetics/PawPawPulation-Genetics GitHub Wiki

Introduction

So, you want to know if a population exhibits isolation by distance (IBD)?  

In this tutorial, we show how to use a Mantel test to check for a significant relationship between population genetic distances and the geographic distances between sampling sites, using the pawpaw example data [1]. You will learn about several concepts: fixation indices and their use in assessing population genetic differentiation; isolation by distance (IBD) and how to use it as a null hypothesis; and Mantel tests.

We will be using R and the following packages: geosphere, hierfstat, and vegan.

Background Information

Population Genetic Differentiation

Genetic differentiation between populations can be evaluated using fixation indices. Commonly used in studies of population genetics and genetic structure, fixation indices arose as an effort to quantify the inbreeding effect of population subdivision [2,3].

Wright’s original FST index assessed differentiation at a single genetic locus, based on the formula (HT - HS)/HT, where HT is the heterozygosity expected across the whole population of interest and HS is the average heterozygosity expected within its subpopulations. Fixation index values range from 0, indicating that groups have no genetic divergence, to 1, indicating that groups are completely dissimilar, with alternative alleles fixed in each subpopulation [3]. Though meaningful interpretation of FST values varies by taxon, values of 0-0.05 may be interpreted as indicating little, though not negligible, differentiation; 0.05-0.15 moderate differentiation; 0.15-0.25 great differentiation; and values above 0.25 very great differentiation [4]. Negative FST estimates arise when the average heterozygosity within subpopulations is higher than that of the population as a whole; these are generally treated as 0 [5,6].

Nei’s analog, referred to as either Nei’s 1973 [7] FST or GST, extended Wright’s single-locus index to an analysis across multiple loci by using the weighted average diversity of all loci under consideration. As such, GST measures the proportion of total genetic diversity that is attributable to differences among subpopulations. Later modifications in Nei’s 1987 [8] pairwise FST also allowed for comparisons between subpopulations within a larger population.
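To make the (HT - HS)/HT formula concrete, here is a minimal base-R sketch for a single biallelic locus in two equally sized subpopulations. The allele frequencies are made up for illustration and are not taken from the pawpaw data; the variable names are likewise hypothetical.

```r
# Hypothetical frequencies of one allele in two subpopulations
p_sub <- c(0.2, 0.8)

# Expected heterozygosity within each subpopulation (2pq), then averaged: HS
h_s <- mean(2 * p_sub * (1 - p_sub))

# Expected heterozygosity of the pooled population: HT
p_bar <- mean(p_sub)
h_t <- 2 * p_bar * (1 - p_bar)

# Wright's fixation index
fst <- (h_t - h_s) / h_t
fst
```

Moving the two frequencies closer together shrinks the value toward 0, and moving them toward 0 and 1 pushes it toward 1, which matches the range described above.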

Isolation by Distance (IBD)

As first proposed by Wright in 1943 [9], isolation by distance (or IBD) is a pattern of genetic subdivision that results from limited dispersal across a landscape without selection. Under IBD, the allele frequencies of individuals genotyped at neutral loci will vary across space, effectively creating population genetic subdivision. However, the differences will be spatially autocorrelated, i.e. the pattern created by IBD is a continuous gradient of genetic difference that increases with geographical distance.

IBD is often used as a null hypothesis in studies of population genetic structure and landscape genetics when the interest is in the ecological causes of genetic subdivision [10]. Genetic structure patterned by IBD is the result of neutral processes across both the genome and the landscape, in contrast to genetic differences among groups caused by adaptation or discrete landscape barriers [11]. However, both IBD and non-neutral processes can produce population genetic structure simultaneously. When population genetic structure and IBD are both detected, additional data are required to interpret whether neutral or adaptive mechanisms are driving differentiation among groups.

Mantel Tests

We will evaluate the populations’ genetic data and sampling locations for patterns of isolation by distance using a Mantel test. Mantel tests analyze paired distance matrices for correlation and then use permutations to assess significance [12].
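The logic of a Mantel test can be sketched in a few lines of base R. This is an illustrative toy, not the vegan implementation used later in this tutorial: the function name mantel_sketch, its arguments, and the simulated matrices are all assumptions made for the example.

```r
# Toy permutation Mantel test: correlate the lower triangles of two
# distance matrices, then shuffle the site labels of one matrix to
# build a null distribution for the correlation.
mantel_sketch <- function(x, y, n_perm = 999, method = "pearson") {
  x <- as.matrix(x)
  y <- as.matrix(y)
  lower <- lower.tri(x)
  r_obs <- cor(x[lower], y[lower], method = method)
  n <- nrow(x)
  r_perm <- replicate(n_perm, {
    idx <- sample(n)  # permute rows and columns together
    cor(x[idx, idx][lower], y[lower], method = method)
  })
  # One-sided p-value; the observed value counts as one permutation
  p_value <- (sum(r_perm >= r_obs) + 1) / (n_perm + 1)
  list(statistic = r_obs, p_value = p_value)
}

set.seed(1)
pts <- matrix(runif(20), ncol = 2)  # 10 made-up sites
geo_m <- as.matrix(dist(pts))       # "geographic" distances

# "Genetic" distances: geographic distance plus symmetric noise
noise <- matrix(0, 10, 10)
noise[lower.tri(noise)] <- runif(45, 0, 0.1)
gen_m <- geo_m + noise + t(noise)

res <- mantel_sketch(gen_m, geo_m)
res
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations; shuffling whole sites keeps each matrix internally consistent.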

Running the Analysis

We’ll be working in R with a handful of packages. If you’re unfamiliar with R, you may want to start with some beginner R Tutorials.

R packages:

  1. geosphere will allow us to calculate geographical distances between populations based on coordinate data and will output a matrix of these distances.
  2. hierfstat will allow us to calculate genetic distances between populations using fixation indices and will output a matrix of these values.
  3. vegan will allow us to run a Mantel test to check for a significant correlation between the two distance matrices we created with the previous packages.

To install these packages, you’ll use the following code:

install.packages("geosphere")
install.packages("hierfstat")
install.packages("vegan")

And to call the packages:

library(geosphere)
library(hierfstat)
library(vegan)

Input Data & Formatting

For the steps below, the pawpaw data has been formatted into a csv (comma-separated values) file. You could also use an Excel sheet, similar to pawpaw_hierfstat.xlsx. The first column contains population IDs and the subsequent columns contain the multi-locus genotypes. Here is a snapshot of what it looks like in Excel:

We’re also going to use the geographical coordinates for each population as provided in the main publication and the supplemental data. To start, the first row contains headers, the first column contains population IDs and the following columns contain the latitude and longitude values, respectively, for each population. Here’s a snapshot of what it looks like in Excel:

Make sure your data is organized so populations are in the same sequence from top to bottom in both tables!!
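A quick way to check the ordering is to compare the population IDs in the two tables before going further. The sketch below uses small made-up tables; the column names Pop, Latitude, Longitude, and Locus1 are assumptions for illustration, so substitute the names in your own files.

```r
# Toy stand-ins for the genotype and coordinate tables
geno <- data.frame(Pop = c("A", "A", "B", "B", "C"),
                   Locus1 = c(11, 12, 22, 12, 11))
coords <- data.frame(Pop = c("A", "B", "C"),
                     Latitude = c(38.1, 39.5, 41.0),
                     Longitude = c(-84.2, -83.0, -81.7))

# Populations in order of first appearance in the genotype table
geno_order <- unique(as.character(geno$Pop))

# TRUE only if both tables list the same populations in the same order
same_order <- identical(geno_order, as.character(coords$Pop))
same_order
```

If this returns FALSE, reorder one of the tables before computing any distance matrices.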

Read in your data

Use RStudio’s “Import Dataset” feature under the Environment tab, or use the following code:

pawpaw_genoDat <- read.csv("…")
latlong <- readxl::read_excel("…")  # read_excel() comes from the readxl package

Fill in your own directory paths and filenames!!

Generate Genetic Distance Matrices

Now that your files have been read in, we’ll calculate the genetic distances using fixation indices. First, we’ll convert the data to a data frame:

genoDat <- data.frame(pawpaw_genoDat)

The package hierfstat can calculate either Nei’s estimator or Weir and Cockerham’s estimator. Here we show Nei’s estimator, with an option to specify whether your organism is diploid:

fst.Nei <- pairwise.neifst(genoDat, diploid=TRUE)

The output will be a distance matrix of Nei’s FST, with each population pair arranged with their FST value at the intersecting cell. We can look at this matrix in R using the following:

fst.Nei #displays in the console

or

View(fst.Nei) #displays as a separate table

FYI: The function for calculating Weir and Cockerham’s estimator is similar in terms of arguments and output. The command is simply: pairwise.WCfst(genoDat, diploid=TRUE)
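To show the shape of a pairwise matrix like this, here is a base-R sketch that builds one from made-up allele frequencies. The helper pair_fst is hypothetical and applies the plain (HT - HS)/HT formula to a single locus; it is not the estimator hierfstat computes.

```r
# Hypothetical frequencies of one allele in three populations
p <- c(A = 0.2, B = 0.5, C = 0.8)

# FST for one pair of populations at one biallelic locus
pair_fst <- function(p1, p2) {
  freqs <- c(p1, p2)
  h_s <- mean(2 * freqs * (1 - freqs))
  p_bar <- mean(freqs)
  h_t <- 2 * p_bar * (1 - p_bar)
  (h_t - h_s) / h_t
}

# Square, symmetric matrix with one cell per population pair
fst_mat <- outer(p, p, Vectorize(pair_fst))
diag(fst_mat) <- NA  # a population is not compared with itself

# Lower triangle only, as a "dist" object
fst_dist <- as.dist(fst_mat)
fst_mat
```

Each off-diagonal cell holds the value for the pair of populations named by its row and column, and the lower triangle is all that a distance-based test needs.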

Generate Geographic Distance Matrices

Next, we’ll calculate the geographical distance between populations. First, we’ll convert the latitude & longitude data into a data frame, and drop the population ID column:

geo <- data.frame(latlong$Longitude, latlong$Latitude)  # distm() expects longitude first, then latitude

Then we'll calculate the distances. Here I’ve also used the haversine formula (fun = distHaversine), which gives great-circle distances between coordinates and so accounts for the curvature of the Earth; the distances are returned in meters. You will often find that geographical distances are log-transformed to help reduce skewness in non-normal data.

d.geo <- distm(geo, fun = distHaversine)
dist.geo <- as.dist(d.geo)
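For readers curious about what the haversine step does, here is a base-R sketch of the formula together with the optional log transform. The function haversine_m, the site coordinates, and the choice of Earth radius are assumptions for illustration; in practice the geosphere call above does this work.

```r
# Great-circle distance in meters via the haversine formula.
# r = 6378137 m (equatorial radius) is an assumed value for this sketch.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Made-up site coordinates (not the pawpaw sites)
sites <- data.frame(lon = c(-84.2, -83.0, -81.7),
                    lat = c(38.1, 39.5, 41.0))

# Fill a square matrix of pairwise distances
n <- nrow(sites)
d_m <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d_m[i, j] <- haversine_m(sites$lon[i], sites$lat[i],
                             sites$lon[j], sites$lat[j])
  }
}

dist_km <- as.dist(d_m / 1000)  # lower triangle, in kilometers
log_dist_km <- log(dist_km)     # optional log transform
```

The log transform is only defined for positive distances, so it assumes that no two sites share identical coordinates.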

Assessing correlation between distance matrices using a Mantel test

Now that we have our two distance matrices, we can finally run our Mantel test!

The command to run the Mantel test has a few arguments. The first two are our distance matrices, fst.Nei and dist.geo. These are followed by an option to specify the method; here we’ve chosen “spearman” in order to run it as a non-parametric test. We have also set the number of permutations to 9999 and will have NA values removed (via na.rm = TRUE).

fst_geo <- mantel(fst.Nei, dist.geo, method = "spearman", permutations = 9999, na.rm = TRUE)

To see the output of the Mantel test, simply enter the following:

fst_geo

This should display output for the test we just ran. I’ve copied my output below:

Mantel statistic based on Spearman's rank correlation rho 

Call:
	mantel(xdis = fst.Nei, ydis = dist.geo, method = "spearman",      permutations = 9999, na.rm = TRUE) 

Mantel statistic r: 0.08549 
      Significance: 0.0651 

Upper quantiles of permutations (null model):
   90%    95%  97.5%    99% 
0.0719 0.0936 0.1143 0.1355 
Permutation: free
Number of permutations: 9999

Interpretation & Alternatives

The Mantel statistic r is 0.08549 and the significance (p-value) is 0.0651. Because the p-value exceeds the conventional 0.05 threshold (equivalently, the observed r falls below the 95% quantile of the permuted null distribution, 0.0936), we do not detect significant IBD with this test. An interesting exercise with the skills introduced here would be to rerun the test on the anthropogenic and wild populations separately.

Another method to test for IBD was demonstrated in the paper, where the authors estimated IBD using a linear regression of GST / (1 − GST) against the log of geographic distance [13]. In this case, the authors used a Mantel test to assess significance.
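As a rough sketch of that regression approach, the base-R snippet below fits GST / (1 − GST) against log distance. The pairwise values are invented for illustration and are not results from the paper; the variable names are hypothetical.

```r
# Made-up pairwise values, one entry per population pair
gst <- c(0.02, 0.05, 0.04, 0.08, 0.10, 0.07)  # pairwise GST
km <- c(10, 50, 40, 200, 400, 150)            # pairwise distance in km

y <- gst / (1 - gst)  # linearized differentiation
x <- log(km)          # natural log of geographic distance

fit <- lm(y ~ x)
slope <- unname(coef(fit)["x"])
slope
```

A positive slope is the pattern expected under IBD. Because pairwise values share populations and are not independent, the p-value reported by lm() should not be trusted here, which is why a permutation-based Mantel test is used to assess significance.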

References

[1]: Wyatt, G.E., Hamrick, J.L., & Trapnell, D.W. (2021). The role of anthropogenic dispersal in shaping the distribution and genetic composition of a widespread North American tree species. Ecology and Evolution, 11(16): 11515–11532. https://doi.org/10.1002/ece3.7944.

[2]: Wright, S. (1921). Systems of mating. I. The biometric relations between parent and offspring. Genetics, 6(2): 111.

[3]: Hartl, D.L., & Clark, A.G. (2007). Principles of Population Genetics (Sinauer, Sunderland, MA).

[4]: Wright, S. (1978). The relation of livestock breeding to theories of evolution. Journal of Animal Science, 46(5): 1192-1200.

[5]: Hedrick, P.W. (2005). A Standardized Genetic Differentiation Measure. Evolution, 59: 1633-1638.

[6]: Meirmans P.G. & Hedrick P.W. (2011). Assessing population structure: FST and related measures. Molecular Ecology Resources, 11: 5-18.

[7]: Nei M. (1973). Analysis of gene diversity in subdivided populations. Proc. Natl. Acad. Sci. USA, 70: 3321–3323.

[8]: Nei, M. (1987). Molecular Evolutionary Genetics. Columbia University Press.

[9]: Wright, S. (1943). Isolation by distance. Genetics, 28(2): 114.

[10]: Storfer, A., Murphy, M.A., Evans, J.S., Goldberg, C.S., Robinson, S., Spear, S.F., Dezzani, R., Delmelle, E., Vierling, L., & Waits, L.P. (2007). Putting the ‘landscape’ in landscape genetics. Heredity, 98(3): 128-142.

[11]: Manel, S., Schwartz, M.K., Luikart, G., & Taberlet, P. (2003). Landscape genetics: combining landscape ecology and population genetics. Trends in Ecology & Evolution, 18(4): 189-197.

[12]: Mantel, N. (1967). The detection of disease clustering and a generalized regression approach. Cancer Research, 27(2): 209–220.

[13]: Rousset, F. (1997). Genetic differentiation and estimation of gene flow from F-statistics under isolation by distance. Genetics, 145: 1219–1228.
