Analysis of Molecular Variance (AMOVA) - UMEcolGenetics/PawPawPulation-Genetics GitHub Wiki

Introduction to AMOVA

AMOVA (Analysis of MOlecular VAriance) is a method for describing population differentiation from molecular marker data [1]. This tutorial focuses on microsatellite data, but a number of other marker types can be used.

AMOVA is a popular way to calculate F-statistics because it can test for hierarchical population structure when your dataset has three or more populations [1]. The population and subpopulation hierarchy must be known in advance, so if populations or other strata have not been defined, a clustering analysis (e.g., STRUCTURE) must be run before the AMOVA [2]. This is not necessary for a spatial AMOVA [3], but for the analyses in this tutorial, strata must be set before running.

When running an AMOVA, a matrix of squared Euclidean distances between all pairs of individuals is calculated and used to determine the within-group and between-group sums of squares [1,4]. For codominant markers, like the microsatellite data used in this tutorial, this is done locus by locus, with a distance matrix generated for each locus [5].
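To make the distance-to-sums-of-squares step concrete, here is a minimal base R sketch. The allele counts and population labels are made up, and plain Euclidean distance on allele counts is an assumption for illustration; it is not necessarily the exact distance poppr or GenAlEx computes internally.

```r
# Toy allele-count table for one locus (values are hypothetical):
# rows are diploid individuals, columns are alleles, entries are copy numbers.
geno <- matrix(
  c(2, 0, 0,
    1, 1, 0,
    2, 0, 0,
    0, 1, 1,
    0, 0, 2,
    0, 2, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("ind", 1:6), c("a120", "a124", "a128"))
)
pop <- factor(rep(c("A", "B"), each = 3))

# Squared Euclidean distances between all pairs of individuals
d2 <- as.matrix(dist(geno))^2

# Sum of squares from a squared-distance matrix:
# sum of d2 over all pairs (i < j), divided by the number of individuals
ss_from_dist <- function(m) sum(m[upper.tri(m)]) / nrow(m)

ss_total <- ss_from_dist(d2)
ss_within <- sum(sapply(levels(pop), function(p) {
  idx <- pop == p
  ss_from_dist(d2[idx, idx, drop = FALSE])
}))
ss_among <- ss_total - ss_within

c(total = ss_total, within = ss_within, among = ss_among)
```

The among-group sum of squares is obtained by subtraction, which is why only the distance matrix and the group labels are needed.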

An AMOVA compares molecular variance across the different strata (here, populations) and tests whether the population means differ from one another. The null hypothesis is that the means of all populations in your dataset are equal; the alternative hypothesis is that at least one mean differs from the others.

AMOVA using the R package poppr

The R package poppr can be used to run an AMOVA. It requires that a distance matrix be calculated from the data and that the data be divided into stratifications (e.g., populations or subpopulations). If your data contain any populations with only one individual, these must be removed before loading your dataset into R.
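If single-individual populations were not removed before import, the same filtering can be sketched in base R. The data frame and its column names (Ind, Pop) are hypothetical stand-ins for your own file.

```r
# Hypothetical sample table; population B has only one individual
toy <- data.frame(
  Ind = paste0("ind", 1:5),
  Pop = c("A", "A", "B", "C", "C"),
  stringsAsFactors = FALSE
)

# Count individuals per population and keep populations with more than one
pop_sizes <- table(toy$Pop)
keep_pops <- names(pop_sizes)[pop_sizes > 1]
toy_filtered <- toy[toy$Pop %in% keep_pops, ]

toy_filtered
```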

Step 1: Data formatting/loading required packages

To begin, your data must be properly formatted and saved as a .csv file for import into R. The dataset we used can be found here and is called 'pawpawpartial.csv'.

(Figure 1)

pawpawpartial <- read.csv("pawpawpartial.csv")

And you need to install and load poppr into your R environment.

install.packages('poppr') 
library (poppr) 

Step 2: Transforming the data into the proper format for poppr

To use poppr, your data must be a genind or genclone object, or a file in fstat, structure, genetix, genpop, or genalex format. Here, I transform the dataset (pawpawpartial) into a genind object named 'Atriloba'. Two packages are loaded for this step: adegenet, which provides df2genind, and dplyr.

#install.packages("adegenet")	
#install.packages("dplyr")
library(adegenet)
library(dplyr)

# the 'ploidy' argument is set to 2 since this is a diploid species
# the 'sep' argument is '","' since alleles are separated by a comma in each column of the dataset.

Atriloba <- df2genind(pawpawpartial, ploidy = 2, sep = ",")	

Note: ensure at this step that your data properly loaded in information for strata as this will be necessary for running the AMOVA.
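One way to make sure the strata are attached is to pass only the locus columns to df2genind and supply the individual and population labels separately. The sketch below uses a made-up data frame; the column names Ind, Pop, loc1, and loc2 are assumptions, so substitute the names in your own file.

```r
library(adegenet)

# Hypothetical stand-in for the imported CSV: one column per locus,
# with the two alleles separated by a comma
toy <- data.frame(
  Ind  = paste0("ind", 1:6),
  Pop  = rep(c("A", "B"), each = 3),
  loc1 = c("120,124", "120,120", "124,124", "126,126", "126,128", "128,128"),
  loc2 = c("200,202", "202,202", "200,200", "204,204", "200,204", "204,206"),
  stringsAsFactors = FALSE
)

# Convert only the locus columns; labels are passed through their own arguments
toy_gi <- df2genind(
  toy[, c("loc1", "loc2")],
  ploidy = 2,
  sep = ",",
  ind.names = toy$Ind,
  pop = toy$Pop
)

# Attach the stratification used later in a formula such as hier = ~Pop
strata(toy_gi) <- data.frame(Pop = toy$Pop)

toy_gi
head(strata(toy_gi))
```

Printing the object and its strata is a quick check that the population column was not read in as a locus.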

Now, your dataset Atriloba should be a genind object instead of a dataframe. To check:

class(Atriloba)

Step 3: AMOVA using poppr

Next, ensure you have the package ade4 installed and loaded.

install.packages('ade4') 
library (ade4) 

The command poppr.amova is used for generating AMOVAs in poppr. First you must specify your dataset that was transformed in step 2, so for this example this would be the Atriloba genind object. You must also set an argument for hierarchy, which for this dataset would be population.

amovafull <- poppr.amova(
    Atriloba,
    hier = ~Pop,
    clonecorrect = FALSE,
    within = TRUE, 
    dist = NULL,
    squared = TRUE,
    freq = TRUE,
    correction = "quasieuclid",
    sep = "_",
    filter = FALSE,
    threshold = 0,
    algorithm = "farthest_neighbor",
    threads = 1L,
    missing = "loci",
    cutoff = 0.05,
    quiet = FALSE,
    method = "ade4",  # the alternative implementation is "pegas"
    nperm = 0
  )

Note: you may need to adjust the parameters for correction, cutoff, and algorithm depending on your dataset.
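With the ade4 method, significance is tested in a separate step with randtest from ade4. The sketch below follows the pattern in the poppr documentation and uses the Aeut dataset bundled with poppr, only so that it runs on its own. Aeut is AFLP data with a Pop/Subpop hierarchy, not the pawpaw microsatellites, and 99 permutations is an arbitrary small number chosen for speed.

```r
library(poppr)
library(adegenet)
library(ade4)

# Example data shipped with poppr (not the pawpaw data)
data(Aeut)
strata(Aeut) <- other(Aeut)$population_hierarchy[-1]
agc <- as.genclone(Aeut)

# AMOVA with a two-level hierarchy
aeut_amova <- poppr.amova(agc, hier = ~Pop/Subpop, quiet = TRUE)

# Pieces of the ade4 amova object
aeut_amova$results                 # Df, Sum Sq, Mean Sq
aeut_amova$componentsofcovariance  # variance components and percentages
aeut_amova$statphi                 # phi statistics

# Permutation test of the variance components
set.seed(1)
aeut_test <- randtest(aeut_amova, nrepet = 99)
aeut_test
```

For the pawpaw data the same two calls would be applied to the Atriloba object with hier = ~Pop.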

Step 4: Interpreting Results

The results of the above AMOVA are shown in Figure 2 and the components of covariance in Figure 3. Note: because the value recovered for Between samples in this dataset is zero, no Mean Sq value can be computed for Between samples.

To exclude missing data, you can use the poppr function missingno.

Atriloba_nomissing <- missingno(Atriloba, type = "loci", cutoff = 0.05, quiet = FALSE, freq = FALSE)

Here the cutoff is set to 0.05, so loci with more than 5% missing data are removed. This value may need adjusting for your study species.
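To see what a given cutoff removes before committing to it, you can compare locus and individual counts before and after filtering. This sketch uses the nancycats microsatellite example data from adegenet rather than the pawpaw data, so the counts it prints are not the ones you will get.

```r
library(poppr)
library(adegenet)

# Microsatellite example data shipped with adegenet (not the pawpaw data)
data(nancycats)

# Percent missing data per locus and population
miss <- info_table(nancycats, type = "missing", plot = FALSE)

# Drop loci, or alternatively genotypes, with more than 5% missing data
cats_loci <- missingno(nancycats, type = "loci", cutoff = 0.05, quiet = TRUE)
cats_geno <- missingno(nancycats, type = "geno", cutoff = 0.05, quiet = TRUE)

c(loci_before = nLoc(nancycats), loci_after = nLoc(cats_loci))
c(ind_before = nInd(nancycats), ind_after = nInd(cats_geno))
```

Filtering by loci keeps every individual, while filtering by genotypes keeps every locus, so which one is appropriate depends on where the missing data are concentrated.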

AMOVA in GenAlEx

Step 1: File Formatting

AMOVA can be performed in GenAlEx, an Excel add-in [6,7]. To do this, your data must be properly formatted. The first row of your worksheet should list, in this order, the number of loci, the number of individuals, the number of populations, and then the number of individuals in each population. If these values are not correctly specified, your results will be inaccurate, so take care when formatting your file for GenAlEx.

The second row should start with a title for your dataset. After two empty cells, you can list the population names, but this is not necessary here since the file has a column for Population.

Next, create a column for Individuals, a column for Populations, and columns for each of your loci. Since these data were generated with codominant markers, leave an empty cell after each locus name to indicate that two columns correspond to each locus.
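Because typos in the first row are easy to make, it can help to compute the expected values from the data and compare them with what you typed. A minimal base R sketch, assuming a hypothetical table with Ind and Pop columns followed by two allele columns per locus:

```r
# Hypothetical genotype table: two allele columns per locus
toy <- data.frame(
  Ind    = paste0("ind", 1:5),
  Pop    = c("A", "A", "B", "B", "B"),
  loc1_a = c(120, 120, 124, 126, 126),
  loc1_b = c(124, 120, 124, 126, 128),
  loc2_a = c(200, 202, 200, 204, 200),
  loc2_b = c(202, 202, 200, 204, 204),
  stringsAsFactors = FALSE
)

# Two columns per locus for codominant data
locus_cols <- setdiff(names(toy), c("Ind", "Pop"))
n_loci <- length(locus_cols) / 2
n_ind <- nrow(toy)

# Population sizes, in the order the populations appear in the file
pop_sizes <- table(factor(toy$Pop, levels = unique(toy$Pop)))

# Expected first row: loci, individuals, populations, then size of each population
header <- c(n_loci, n_ind, length(pop_sizes), as.integer(pop_sizes))
header
```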

Note: Any populations that only contain one individual must be removed before you can proceed with running an AMOVA in GenAlEx.

Step 2: Calculating AMOVA in GenAlEx

Once your data is properly formatted in Excel, you are ready to calculate the AMOVA. First select the option for Distance-Based and then select AMOVA. Ensure that the values for loci, individuals, and population number are correctly specified at this point. If these values do not match your data, check over your file for any typos or missing specifications.

Next you will select options for running your AMOVA. For our dataset, we have codominant-allelic data, so this option should be selected under For-AMOVA-Fst. We are interested in calculating an AMOVA for the total of all of our populations, not for each locus, so Analysis for Total Only is selected. You may need to alter these settings for different data types and if you are interested in different metrics.

Next you will have the option to select how you want your results and the number of permutations. Here I set this to 999 as is the default, but you may want to change this depending on your data.
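The idea behind the permutations can be sketched in base R: shuffle the population labels many times, recompute the among-population sum of squares each time, and ask how often a shuffled value is at least as large as the observed one. The data below are made up, and this is only an illustration of the principle, not a reproduction of GenAlEx's own permutation scheme.

```r
set.seed(42)

# Hypothetical allele counts for six diploid individuals at one locus
geno <- matrix(
  c(2, 0, 0,
    1, 1, 0,
    2, 0, 0,
    0, 1, 1,
    0, 0, 2,
    0, 2, 0),
  ncol = 3, byrow = TRUE
)
pop <- rep(c("A", "B"), each = 3)

d2 <- as.matrix(dist(geno))^2
ss_from_dist <- function(m) sum(m[upper.tri(m)]) / nrow(m)

# Among-population sum of squares for a given labelling
among_ss <- function(labels) {
  within <- sum(sapply(unique(labels), function(p) {
    idx <- labels == p
    ss_from_dist(d2[idx, idx, drop = FALSE])
  }))
  ss_from_dist(d2) - within
}

observed <- among_ss(pop)

# Null distribution from shuffled population labels
n_perm <- 999
permuted <- replicate(n_perm, among_ss(sample(pop)))

# Small tolerance guards against floating-point ties;
# the observed labelling is counted as one of the permutations
p_value <- (sum(permuted >= observed - 1e-9) + 1) / (n_perm + 1)
p_value
```

With 999 permutations the smallest possible p-value is 0.001, which is one reason to raise the number when you need finer resolution.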

Below is the output for our analyses which shows that for these data, 70% of the molecular variance is within individuals, 19% is among populations, and 11% is among individuals. The vast majority of the molecular variance in this dataset is seen to be within individuals, which indicates that individuals are harboring diverse genetic variation.
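The percentages can be turned into approximate F-statistics using the standard variance-component ratios. This is a rough sketch that treats the rounded percentages above as the shares of total variance, so the values will differ slightly from the F-statistics GenAlEx reports.

```r
# Percent of molecular variance at each level (rounded values from the output)
pct <- c(among_pops = 19, among_inds = 11, within_inds = 70)
v <- pct / sum(pct)

# Among populations relative to the total
F_ST <- v[["among_pops"]]

# Among individuals relative to the variance within populations
F_IS <- v[["among_inds"]] / (v[["among_inds"]] + v[["within_inds"]])

# Everything except the within-individual share, relative to the total
F_IT <- v[["among_pops"]] + v[["among_inds"]]

round(c(F_ST = F_ST, F_IS = F_IS, F_IT = F_IT), 3)
```

A quick consistency check is that (1 - F_IT) equals (1 - F_IS) * (1 - F_ST).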

GenAlEx also provides a summary table for the information shown in the above pie chart. Here, Df stands for degrees of freedom and SS is the sum of squared differences of each observation from the mean.
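For reference, the columns of such a table are related as follows. This sketch assumes the usual layout for codominant data from N diploid individuals in P populations, with the within-individual level included; the symbols N and P are introduced here for illustration.

```latex
\begin{aligned}
\mathrm{Df}_{\text{among populations}} &= P - 1 \\
\mathrm{Df}_{\text{among individuals}} &= N - P \\
\mathrm{Df}_{\text{within individuals}} &= N \\
\mathrm{Df}_{\text{total}} &= 2N - 1 \\
\mathrm{MS} &= \mathrm{SS} / \mathrm{Df}
\end{aligned}
```

The first three degrees of freedom sum to the total, which is a quick way to check that the table matches your sample sizes.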

AMOVA in GenoDive

Step 1: File Formatting

The software GenoDive can be used to calculate AMOVAs and works with several file formats. I chose the Genepop format (.txt) because these files take up little space and tend to open quickly, which is useful for large datasets. GenAlEx can export your data in a number of file formats; selecting the Genepop option converts your data to a text file.

Step 2: AMOVA

Once you have selected and imported your file into GenoDive, select the tab at the top of the screen that says Analysis. The last option under this tab should be for AMOVA, so you should select this option.

Here, I ran a standard AMOVA using the Infinite Allele Model. I left the number of permutations at the default of 999 for this analysis as well.
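The choice of mutation model affects how the distance between two alleles is scored. Under an infinite-allele view, two alleles are simply the same or different; under a stepwise view of the kind discussed for microsatellites in [5], alleles that differ by more repeats are further apart. The sketch below contrasts the two on made-up allele sizes. The repeat length of 2 bp is an assumption, and this is an illustration rather than GenoDive's actual computation.

```r
# Hypothetical allele sizes (bp) at a locus with a 2-bp repeat unit
alleles <- c(120, 122, 128)
repeat_len <- 2

# Infinite-allele style: distance is 0 for identical alleles, 1 otherwise
iam <- outer(alleles, alleles, function(a, b) as.numeric(a != b))

# Stepwise style: squared difference in number of repeats
smm <- outer(alleles, alleles, function(a, b) ((a - b) / repeat_len)^2)

dimnames(iam) <- dimnames(smm) <- list(as.character(alleles), as.character(alleles))

iam
smm
```

In the first matrix all pairs of distinct alleles are equally distant, while in the second the 120/128 pair is much further apart than the 120/122 pair.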

Step 3: Output

After selecting Calculate, a new file will be created in GenoDive with your results.

This analysis showed that 73.2% of genetic variation is Within Individual, 7.4% Among Individual, and 19.4% Among Population.

Interpreting Results

Generally speaking, an AMOVA tells us whether genetic diversity within two pooled populations is significantly different from genetic diversity when each is considered alone [1]. The analysis uses molecular markers to tell you how much of the genetic diversity in your dataset is due to differences between populations, between samples within populations, and/or within samples. If the genetic diversity of the pooled populations differs significantly from that of each population alone, we reject the null hypothesis; if it does not, we fail to reject it.

References

[1]: Excoffier, L., Smouse, P.E., Quattro, J.M. (1992). Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics, 131(2): 479-91. https://doi.org/10.1093/genetics/131.2.479.

[2]: Pritchard, J.K., Stephens, M., Donnelly, P. (2000). Inference of population structure using multilocus genotype data. Genetics, 155(2): 945-59. https://doi.org/10.1093/genetics/155.2.945.

[3]: Dupanloup, I., Schneider, S., Excoffier, L.A. (2002). A simulated annealing approach to define the genetic structure of populations. Mol Ecol., 11(12): 2571-81. https://doi.org/10.1046/j.1365-294x.2002.01650.x.

[4]: Li, C. (1976). Population Genetics. Pacific Grove (CA): Boxwood Press.

[5]: Michalakis, Y., Excoffier, L.A. (1996). Generic estimation of population subdivision using distances between alleles with special reference for microsatellite loci. Genetics, 142(3): 1061-1064. https://doi.org/10.1093/genetics/142.3.1061.

[6]: Peakall, R. & Smouse, P.E. (2006). GENALEX 6: genetic analysis in Excel. Population genetic software for teaching and research. Molecular Ecology Notes, 6: 288-295. https://doi.org/10.1111/j.1471-8286.2005.01155.x.

[7]: Peakall, R. & Smouse, P.E. (2012). GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research--an update. Bioinformatics (Oxford, England), 28(19), 2537–2539. https://doi.org/10.1093/bioinformatics/bts460.

[8]: Mengoni, A., & Bazzicalupo, M. (2002). The statistical treatment of data and the Analysis of MOlecular Variance (AMOVA) in molecular microbial ecology. Annals of Microbiology, 52: 95-101. https://www.semanticscholar.org/paper/The-statistical-treatment-of-data-and-the-Analysis-Mengoni-Bazzicalupo/eb2c43dc6e2bb7b05ca6b171f0592421a5db7ea1.
