Expected heterozygosity (He) - UMEcolGenetics/PawPawPulation-Genetics GitHub Wiki

What is expected heterozygosity?

Expected heterozygosity (He or Hexp) is a key statistic in population genetic analyses. It describes the proportion of heterozygous genotypes that would be expected under Hardy-Weinberg equilibrium [1]. It is also used as the gene diversity of a locus: the probability that two alleles drawn at random from a population are different. In a two-allele system, He = 2pq, where p and q are the frequencies of the two alleles at a locus.
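As a quick check on the "two randomly drawn alleles differ" interpretation, the sketch below enumerates every ordered pair of alleles and adds up the probability of the pairs that differ. The function name and the frequencies (p = 7/10, q = 3/10) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

def prob_alleles_differ(freqs):
    """Probability that two independently drawn alleles are different.

    freqs maps an allele label to its frequency; frequencies must sum to 1.
    """
    if abs(sum(freqs.values()) - 1) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Sum p_a * p_b over all ordered pairs (a, b) with a != b.
    return sum(
        freqs[a] * freqs[b]
        for a, b in product(freqs, repeat=2)
        if a != b
    )

# Hypothetical two-allele locus, using exact fractions to avoid rounding.
p, q = Fraction(7, 10), Fraction(3, 10)
he = prob_alleles_differ({"A": p, "a": q})
print(he, 2 * p * q)  # both equal 21/50, i.e. 0.42
```

The two ordered pairs (A, a) and (a, A) each contribute pq, which is where the factor of 2 in 2pq comes from.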

For systems with more than two alleles at a locus, this measure can be determined by:

(Figure 1)

where pᵢ is the frequency of the i-th of k alleles. Here, p₁, p₂, p₃, etc. correspond to what you might normally think of as p, q, r, s, and so on.
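A minimal sketch of the multi-allele calculation, assuming the formula referred to above is the standard form He = 1 − Σ pᵢ² (one minus the sum of squared allele frequencies). The function name and example frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Return He = 1 - sum(p_i ** 2) for a list of allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)

# Two alleles: the result matches 2pq = 2 * 0.7 * 0.3.
print(round(expected_heterozygosity([0.7, 0.3]), 4))       # 0.42

# Three alleles (hypothetical frequencies).
print(round(expected_heterozygosity([0.5, 0.3, 0.2]), 4))  # 0.62
```

With only two alleles, 1 − p² − q² simplifies to 2pq because p + q = 1, so the general form agrees with the two-allele case.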

To calculate gene diversity over several loci, the following formula may be used:

(Figure 2)

where the first summation is over the l-th of m loci [1].
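The multi-locus calculation can be sketched as follows, assuming the formula takes the usual form H = 1 − (1/m) Σₗ Σᵢ pₗᵢ², which is the same as averaging the single-locus values over the m loci. The function name and frequencies are hypothetical.

```python
def mean_gene_diversity(loci):
    """Average gene diversity over loci.

    loci is a list with one entry per locus; each entry is the list of
    allele frequencies at that locus.
    """
    if not loci:
        raise ValueError("at least one locus is required")
    m = len(loci)
    # Outer sum runs over loci, inner sum over the alleles at each locus.
    sum_of_squares = sum(sum(p * p for p in freqs) for freqs in loci)
    return 1.0 - sum_of_squares / m

# Three hypothetical loci; the third is fixed for one allele (He = 0).
loci = [[0.7, 0.3], [0.5, 0.3, 0.2], [1.0]]
print(round(mean_gene_diversity(loci), 4))  # 0.3467, the mean of 0.42, 0.62 and 0.0
```

Monomorphic loci contribute zero, so including them lowers the average; whether to include them is a decision to report alongside the value.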

When interpreting and comparing observed values for heterozygosity to calculated values for He, it is important to recall that for a population to be in Hardy-Weinberg equilibrium, the following assumptions are made [2,3]:

  1. no mutation
  2. no migration
  3. no selection
  4. random mating
  5. infinite population size

A population is said to be at Hardy–Weinberg equilibrium when, after a generation of random mating, the genotype frequencies p², 2pq, and q² do not change. When one or more of the above assumptions are violated, deviations from Hardy–Weinberg equilibrium may be observed [2,3].
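The constancy of genotype frequencies can be illustrated with a small sketch for a two-allele locus. It assumes all five conditions listed above hold, and the starting frequency p = 0.7 and the function names are hypothetical.

```python
def hwe_genotype_freqs(p):
    """Genotype frequencies after random mating for allele frequency p."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}

def allele_freq_from_genotypes(genotypes):
    """Frequency of allele A: all of AA plus half of Aa."""
    return genotypes["AA"] + 0.5 * genotypes["Aa"]

p0 = 0.7
gen1 = hwe_genotype_freqs(p0)            # first generation of random mating
p1 = allele_freq_from_genotypes(gen1)    # allele frequency passed on
gen2 = hwe_genotype_freqs(p1)            # second generation

print({g: round(f, 4) for g, f in gen1.items()})
print({g: round(f, 4) for g, f in gen2.items()})
```

Both generations show the same genotype frequencies because the allele frequency recovered from the first generation is again 0.7; mutation, migration, selection, non-random mating, or drift would break this step.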

Calculating He using Genepop On The Web

One option for calculating He is the software Genepop on the Web [4,5], which can be accessed at https://genepop.curtin.edu.au/.

To use this software, your data must be in text file format (.txt). If needed, the conversion can be done with the GenAlEx plugin for Excel [6,7]. GenAlEx can export your data in a number of different file formats, and selecting the Genepop option will convert your data file to text file format.
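If you are not using GenAlEx, the sketch below shows roughly what a Genepop-format text file looks like and one way to build it. It assumes the common layout of a title line, one locus name per line, a "Pop" line before each population, and one line per individual with three-digit allele codes (000 for missing data). The function name, sample names, locus names, and genotypes are all hypothetical; check the Genepop documentation before relying on this layout.

```python
def to_genepop(title, loci, populations):
    """Build Genepop-style text.

    populations is a list of populations; each population is a list of
    (individual_name, genotypes) pairs, where genotypes holds one
    (allele1, allele2) tuple of integer allele codes per locus.
    """
    lines = [title]
    lines.extend(loci)  # one locus name per line
    for individuals in populations:
        lines.append("Pop")  # marks the start of each population
        for name, genotypes in individuals:
            if len(genotypes) != len(loci):
                raise ValueError(f"{name}: expected {len(loci)} genotypes")
            # Each diploid genotype becomes six digits, e.g. (1, 2) -> 001002.
            codes = " ".join(f"{a:03d}{b:03d}" for a, b in genotypes)
            lines.append(f"{name} , {codes}")
    return "\n".join(lines) + "\n"

# Hypothetical data: two populations, two loci; (0, 0) marks a missing genotype.
populations = [
    [("popA_1", [(1, 2), (3, 3)]), ("popA_2", [(1, 1), (0, 0)])],
    [("popB_1", [(2, 2), (3, 4)])],
]
text = to_genepop("Hypothetical example data", ["LocA", "LocB"], populations)
print(text)
```

The returned string can be saved as a .txt file (for example genepop.txt) before uploading.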

Once your data is properly formatted (e.g., genepop.txt), follow the link above to the Genepop website. To calculate He, select option 5: Allele frequencies, etc. Be sure to select the appropriate ploidy level for your data set; our sample dataset is diploid.

(Figure 3)

Next, you must select how you would like to receive your results. I recommend selecting to have them emailed, as this provides you with a copy of your results for later reference.

(Figure 4)

Finally, you must choose your datafile and select “Submit data”.

(Figure 5)

Note: This option on Genepop also calculates additional statistics, such as allele frequencies and FIS.

Interpreting results

Once you have submitted your data, Genepop will return a file with results calculated for each population and locus, in the order they appear in your submitted file. The data below are for population BCB at locus 108 and show that this subpopulation has an excess of heterozygotes at this locus compared with what would be expected if the population were in Hardy-Weinberg equilibrium (HWE).
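To make the comparison between observed and expected heterozygosity concrete, here is a minimal sketch for one locus in one population. It uses the simple estimators Ho = heterozygotes / n, He = 1 − Σ pᵢ², and FIS = 1 − Ho/He; these are assumptions for illustration, not Genepop's own method, and Genepop's estimates can differ somewhat (for example through sample-size corrections). The genotypes are hypothetical and are not the BCB data.

```python
from collections import Counter

def heterozygosity_summary(genotypes):
    """Return (Ho, He, FIS) for a list of diploid (allele1, allele2) genotypes."""
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    # Allele frequencies from the 2n allele copies in the sample.
    allele_counts = Counter(allele for g in genotypes for allele in g)
    he = 1.0 - sum((c / (2 * n)) ** 2 for c in allele_counts.values())
    # Observed proportion of heterozygous individuals.
    ho = sum(1 for a, b in genotypes if a != b) / n
    fis = 1.0 - ho / he if he > 0 else float("nan")
    return ho, he, fis

# Hypothetical sample: 8 heterozygotes and 1 of each homozygote.
genotypes = [(1, 2)] * 8 + [(1, 1)] + [(2, 2)]
ho, he, fis = heterozygosity_summary(genotypes)
print(round(ho, 3), round(he, 3), round(fis, 3))  # 0.8 0.5 -0.6
```

An observed value above the expected value (negative FIS) indicates an excess of heterozygotes, while a positive FIS indicates a deficit.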

(Figure 6)

References

[1]: Nei, M. (1973). Analysis of gene diversity in subdivided populations. Proc. Natl. Acad. Sci. USA, 70: 3321–3323.

[2]: Hardy, G. H. (1908). Mendelian Proportions in a Mixed Population. Science, 28(706): 49–50.

[3]: Weinberg, W. (1908). Über den Nachweis der Vererbung beim Menschen. Jahreshefte des Vereins für vaterländische Naturkunde in Württemberg, 64: 368–382.

[4]: Raymond, M. & Rousset, F. (1995). GENEPOP (version 1.2): population genetics software for exact tests and ecumenicism. J. Heredity, 86: 248–249.

[5]: Rousset, F. (2008). Genepop'007: a complete reimplementation of the Genepop software for Windows and Linux. Mol. Ecol. Resources, 8: 103-106.

[6]: Peakall, R. & Smouse, P.E. (2006). GENALEX 6: genetic analysis in Excel. Population genetic software for teaching and research. Molecular Ecology Notes, 6: 288-295. https://doi.org/10.1111/j.1471-8286.2005.01155.x.

[7]: Peakall, R. & Smouse, P.E. (2012). GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research--an update. Bioinformatics (Oxford, England), 28(19), 2537–2539. https://doi.org/10.1093/bioinformatics/bts460.

Additional Resources

Nei M. & Roychoudhury A.K. (1974). Sampling variances of heterozygosity and genetic distance. Genetics, 76: 379–390.
