Hardy Weinberg Equilibrium (HWE) - UMEcolGenetics/PawPawPulation-Genetics GitHub Wiki

What is Hardy-Weinberg Equilibrium?

The Hardy-Weinberg Equilibrium (HWE) is fundamental to population genetics. It states that, in the absence of outside forces, genotype frequencies in a population remain unchanged between generations [1]. HWE is best explained in the context of a biallelic locus where the frequencies of the two alleles 'A' and 'a' are p and q, with p + q = 1. At this locus there are three possible genotypes: AA, Aa, and aa [2]. Under HWE, the expected frequencies of AA, Aa, and aa are p², 2pq, and q², respectively (so that p² + 2pq + q² = 1), and they should not change in a population without the influence of outside factors. Specifically, HWE makes five assumptions about a population [3,4]:

  1. no mutation
  2. no migration
  3. no selection
  4. random mating
  5. infinite population size

When one or more of the above assumptions are violated, deviations from Hardy-Weinberg equilibrium may be observed [3,4].

This principle has become an instrumental part of population genetics and is used to identify genetic structure in populations and instances of non-random mating.
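The relationship between allele and genotype frequencies described above can be sketched in a few lines of Python. This is a minimal illustration, not part of GenAlEx; the function names `hwe_genotype_frequencies` and `next_generation_p` are hypothetical and chosen only for this example.

```python
def hwe_genotype_frequencies(p):
    """Expected (AA, Aa, aa) frequencies for a biallelic locus,
    where p is the frequency of allele 'A' and q = 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


def next_generation_p(f_AA, f_Aa):
    """Frequency of 'A' among the gametes produced by a population
    with the given genotype frequencies (all HWE assumptions hold)."""
    return f_AA + 0.5 * f_Aa


f_AA, f_Aa, f_aa = hwe_genotype_frequencies(0.7)
print(round(f_AA, 2), round(f_Aa, 2), round(f_aa, 2))

# Under the five assumptions, the allele frequency (and therefore the
# genotype frequencies) is the same in the next generation.
print(round(next_generation_p(f_AA, f_Aa), 2))
```

Because the allele frequency recovered from the genotype frequencies equals the starting value of p, applying the same calculation generation after generation returns the same three genotype frequencies.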

Heterozygosity and the Fixation Index

Information on expected vs. observed heterozygosity will not be discussed in this section but is discussed in the Expected Heterozygosity (He) tutorial.

The Fixation Index (F), also known as the inbreeding coefficient, is a value that ranges from -1 to +1. Values close to zero are expected in systems where random mating is occurring. Positive values indicate a deficit of heterozygotes, as expected under inbreeding or with undetected null alleles. Negative values indicate an excess of heterozygotes, as expected under negative assortative mating or selection for heterozygotes [5].

(Figure 1)
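As a rough illustration of how F behaves, the sketch below assumes the commonly used definition F = (He - Ho) / He = 1 - Ho/He, where Ho and He are observed and expected heterozygosity. The function name `fixation_index` and the example values are hypothetical.

```python
def fixation_index(ho, he):
    """Fixation index, assuming F = 1 - Ho/He.
    ho: observed heterozygosity; he: expected heterozygosity."""
    if he <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - ho / he


# Fewer heterozygotes than expected gives a positive F
print(round(fixation_index(0.30, 0.50), 3))
# Observed matches expected, so F is zero
print(round(fixation_index(0.50, 0.50), 3))
# More heterozygotes than expected gives a negative F
print(round(fixation_index(0.60, 0.50), 3))
```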

F-statistics (FIS, FST, and FIT)

Note: F-statistics are calculated from mean values of Ho and He, where Ho = observed heterozygosity averaged across subpopulations; He = expected heterozygosity averaged across subpopulations; and HT = total expected heterozygosity (calculated as if all the subpopulations were pooled).

  • FIS is the inbreeding coefficient of individuals relative to their subpopulation. This statistic is useful for identifying reductions in heterozygosity in individuals as a result of inbreeding within a subpopulation [5,6].

(Figure 2)

  • FST is the inbreeding coefficient of subpopulations relative to the total metapopulation. FST ranges from 0 to 1. Values of zero indicate that subpopulations are genetically identical and are likely interbreeding freely [5,6]. Values of one indicate that subpopulations share no alleles, and there is likely some sort of barrier to gene flow.

(Figure 3)

  • FIT is the inbreeding coefficient of individuals relative to the total metapopulation. This statistic takes into account patterns of nonrandom mating that may be occurring within subpopulations along with the genetic differentiation among subpopulations [5,6].

(Figure 4)
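The three statistics above can be sketched from the mean heterozygosity values in the note. This example assumes the heterozygosity-based definitions FIS = (He - Ho) / He, FST = (HT - He) / HT, and FIT = (HT - Ho) / HT. The function name `f_statistics` and the input values are hypothetical.

```python
def f_statistics(mean_ho, mean_he, ht):
    """Return (FIS, FST, FIT) from mean observed heterozygosity (Ho),
    mean expected heterozygosity (He), and total expected
    heterozygosity (HT), using the definitions assumed above."""
    if mean_he <= 0 or ht <= 0:
        raise ValueError("He and HT must be positive")
    fis = (mean_he - mean_ho) / mean_he
    fst = (ht - mean_he) / ht
    fit = (ht - mean_ho) / ht
    return fis, fst, fit


fis, fst, fit = f_statistics(mean_ho=0.30, mean_he=0.40, ht=0.50)
print(round(fis, 3), round(fst, 3), round(fit, 3))

# With these definitions the three statistics are linked by
# (1 - FIT) = (1 - FIS) * (1 - FST)
print(abs((1 - fit) - (1 - fis) * (1 - fst)) < 1e-12)
```

The final check is a useful sanity test: if the three values you obtain do not satisfy this relationship, the heterozygosity inputs were probably not computed on the same set of subpopulations.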

Computing HWE and F-statistics using GenAlEx

Step 1: File Formatting

Hardy-Weinberg tests and F-statistics can be performed in GenAlEx via Excel [5,7]. To do this, your data must be properly formatted. The first line of your data frame should list (in this order): number of loci, number of individuals, number of populations, and then the number of individuals in each population.

If these measures are not correctly specified, your results will be inaccurate, so it is important to take care when formatting your file for GenAlEx.

(Figure 5)

The second line of your file should start with a title for your data frame. After two empty cells, you can list the population names here. However, this is not necessary for this file since there is a column for Population.

Next, you will create a column for Individuals, a column for Populations, and columns for each of your loci. Since these data were generated using codominant markers, each locus occupies two columns (one per allele), so you will leave an empty cell after each locus name. An example of correct Excel file format can be found here and is named 'combinedATRILOBAGenalex.xlsx'.
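Because errors in the header rows lead to inaccurate results, it can help to generate them from the data rather than typing the counts by hand. The sketch below builds the three header rows as described in this step. The function name `genalex_header_rows`, the column labels "Individual" and "Population", and the example locus and population names are all placeholders for illustration, not names required by GenAlEx.

```python
from collections import Counter


def genalex_header_rows(title, loci, sample_pops):
    """Build the three header rows described in Step 1.

    title:       title for the data frame
    loci:        list of locus names (codominant: two columns each)
    sample_pops: population label of every individual, in file order
    """
    pop_sizes = Counter(sample_pops)   # keeps first-seen order
    pops = list(pop_sizes)
    # Row 1: no. loci, no. individuals, no. populations, size of each population
    row1 = [len(loci), len(sample_pops), len(pops)] + [pop_sizes[p] for p in pops]
    # Row 2: title, two empty cells, then (optionally) the population names
    row2 = [title, "", ""] + pops
    # Row 3: individual and population columns, then each locus name
    # followed by an empty cell for its second allele column
    row3 = ["Individual", "Population"]
    for locus in loci:
        row3 += [locus, ""]
    return row1, row2, row3


rows = genalex_header_rows("example", ["LocusA", "LocusB"], ["Pop1", "Pop1", "Pop2"])
for row in rows:
    print(row)
```

The rows can then be pasted into Excel or written out with the standard `csv` module before the genotype rows are added.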

Step 2: Frequency Based Analyses

Allele Frequency

In the GenAlEx tab at the top of Excel, there is an option for frequency-based statistics. Select this option, then select the option for Allele Frequency.

(Figure 6)

Verify that the number of loci, sample size, and number of populations are specified correctly before proceeding.

Since we are working with codominant markers, be sure that option is selected. Next, select frequency by population, frequency by locus, Nei’s genetic distance, and pairwise FST.

(Figure 7)

This will give you allele frequencies for each of your populations and loci, as well as Nei’s genetic distance and pairwise FST for all of your populations. If you are interested in determining whether any populations have unique alleles, select the option for Private Alleles List.

(Figure 8)
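To make the allele frequency and private allele outputs concrete, here is a minimal sketch of the underlying counting for a single codominant locus. It is not GenAlEx's implementation. The function names, the tuple layout `(population, allele1, allele2)`, and the allele sizes are hypothetical, and missing data are not handled.

```python
from collections import Counter, defaultdict


def allele_frequencies(genotypes):
    """genotypes: list of (population, allele1, allele2) for one locus.
    Returns {population: {allele: frequency}}."""
    counts = defaultdict(Counter)
    for pop, a1, a2 in genotypes:
        counts[pop].update([a1, a2])      # each diploid individual adds two alleles
    freqs = {}
    for pop, c in counts.items():
        total = sum(c.values())
        freqs[pop] = {allele: n / total for allele, n in c.items()}
    return freqs


def private_alleles(freqs):
    """Alleles found in exactly one population."""
    private = {}
    for pop, f in freqs.items():
        others = set()
        for other_pop, g in freqs.items():
            if other_pop != pop:
                others.update(g)
        private[pop] = sorted(set(f) - others)
    return private


data = [
    ("Pop1", 150, 150), ("Pop1", 150, 154),
    ("Pop2", 154, 154), ("Pop2", 154, 158),
]
freqs = allele_frequencies(data)
print(freqs)
print(private_alleles(freqs))
```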

HWE

Next, to test if your populations are in HWE, select Frequency Based and Hardy Weinberg Disequilibrium.

(Figure 9)

This option will calculate chi-squared values and their significance (p-values) independently for each locus present in your populations.

(Figure 10)

This option also allows for graphs to be generated showing observed versus expected genotype counts.

(Figure 11)
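The logic of the chi-squared test can be sketched for the simplest case, a single biallelic locus in one population, where the test has one degree of freedom. This is an illustration only: loci with more alleles need more genotype classes and more degrees of freedom, which this sketch does not cover. The function name `hwe_chi_square` and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-squared test of HWE for one biallelic locus (1 degree of freedom).
    Returns (chi2, p_value, expected_counts)."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_AA + n_Aa) / (2 * n)       # frequency of allele 'A'
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-squared variable with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, expected


# Observed counts with a deficit of heterozygotes relative to expectation
chi2, p_value, expected = hwe_chi_square(60, 20, 20)
print(round(chi2, 2), p_value < 0.05)
print([round(e, 1) for e in expected])
```

Comparing the observed and expected counts returned here is the same comparison that the GenAlEx graphs display.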

F-statistics

To calculate F-statistics for your data, select Frequency Based, Frequency, and then select the box for Het, Fstat & Poly by Pop.

(Figure 12)

This option generates data for: Sample Size, No. Alleles, No. Effective Alleles, Information Index, Observed Heterozygosity, Expected and Unbiased Expected Heterozygosity, and Fixation Index by locus and by population.

(Figure 13)
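For readers who want to check a few of these values by hand, the sketch below computes the listed statistics for one locus in one population. It assumes the standard definitions: effective alleles Ne = 1 / Σp², Shannon's information index I = -Σ p ln p, He = 1 - Σp², unbiased He = (2N / (2N - 1)) × He, and F = 1 - Ho/He. The function name `locus_summary` and the example genotypes are hypothetical, and missing data are not handled.

```python
import math
from collections import Counter


def locus_summary(genotypes):
    """genotypes: list of (allele1, allele2) for one locus in one population."""
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    counts = Counter(a for g in genotypes for a in g)
    freqs = [c / (2 * n) for c in counts.values()]
    sum_p2 = sum(f * f for f in freqs)
    ho = sum(1 for a, b in genotypes if a != b) / n
    he = 1.0 - sum_p2
    return {
        "N": n,
        "Na": len(counts),                            # number of alleles
        "Ne": 1.0 / sum_p2,                           # effective alleles
        "I": -sum(f * math.log(f) for f in freqs),    # information index
        "Ho": ho,
        "He": he,
        "uHe": (2 * n / (2 * n - 1)) * he,            # unbiased He
        "F": 1.0 - ho / he if he > 0 else float("nan"),
    }


summary = locus_summary([(1, 1), (1, 2), (2, 2), (1, 2)])
for name, value in summary.items():
    print(name, round(value, 3))
```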

Interpreting Results

The above statistics are useful for identifying deviations from HWE and for identifying populations where ecological forces may be affecting genetic make-up. In population genetic studies, they show how genetic diversity is distributed across your sampling range and which populations have lower or higher allelic diversity. These tests are a good place to start when analyzing population genetic data and can help inform which other tests may be useful to apply to your dataset.

References

[1]: Edwards, A.W.F. (2008). G. H. Hardy (1908) and Hardy–Weinberg Equilibrium. Genetics, 179: 1143–1150. https://doi.org/10.1534/genetics.104.92940.

[2]: Graffelman, J., Jain, D., and Weir, B. (2017). A genome-wide study of Hardy–Weinberg equilibrium with next generation sequence data. Hum. Genet., 136: 727–741. https://doi.org/10.1007/s00439-017-1786-7.

[3]: Hardy, G.H. (1908). Mendelian Proportions in a Mixed Population. Science, 28(706): 49–50.

[4]: Weinberg, W. (1908). Über den Nachweis der Vererbung beim Menschen. Jahreshefte des Vereins für vaterländische Naturkunde in Württemberg, 64: 368–382.

[5]: Peakall, R. & Smouse, P.E. (2012). GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research--an update. Bioinformatics (Oxford, England), 28(19), 2537–2539. https://doi.org/10.1093/bioinformatics/bts460.

[6]: Wright, S. (1965). The Interpretation of Population Structure by F-Statistics with Special Regard to Systems of Mating. Evolution, 19(3): 395–420. https://doi.org/10.2307/2406450.

[7]: Peakall, R. & Smouse, P.E. (2006). GENALEX 6: genetic analysis in Excel. Population genetic software for teaching and research. Molecular Ecology Notes, 6: 288–295. https://doi.org/10.1111/j.1471-8286.2005.01155.x.

Additional Resources

Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proc. Natl. Acad. Sci. USA, 70: 3321–3323.

Nei, M. & Roychoudhury, A.K. (1974). Sampling variances of heterozygosity and genetic distance. Genetics, 76: 379–390.
