Variation and allele frequency spectrum - AndersenLab/Genetic-Analysis GitHub Wiki

Lecture 11

The draft of the human genome was announced in 2000. Genetic samples from nine diverse humans were used to compile the "human genome".

This one genome is known as a "reference genome" to which each individual human genome sequence can be compared. This comparison results in a list of variants between the "reference" and the sample you are testing. However, many classes of variants are found when comparing an individual genome to the reference genome - it is not just single nucleotide variants (SNVs).
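As a toy illustration of the comparison described above, the sketch below lists single nucleotide differences between a reference sequence and a sample sequence. It assumes the two sequences are already aligned and have the same length, so it can only find SNVs. Insertions, deletions, and larger structural variants need a real alignment and variant-calling workflow. The function name `list_snvs` and the example sequences are made up for this illustration.

```python
def list_snvs(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for every mismatch.

    Positions are 1-based. Both sequences are assumed to be pre-aligned
    and of equal length (an assumption of this toy example).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


# Hypothetical reference and sample sequences differing at one site.
variants = list_snvs("ACGTACGT", "ACGAACGT")
```

Here `variants` holds one entry: the sample carries an A where the reference has a T at position 4.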

Types of variation:

  • Rare (less than 1% of the population); this number will change as more people are genotyped
  • Intermediate (1-5% of the population)
  • Common (more than 5% of the population)
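The three classes above can be sketched as a small binning function. This sketch assumes the percentages refer to the frequency of the variant allele among all chromosomes sampled (two per diploid individual); the notes do not say whether they mean allele frequency or the fraction of people carrying the variant. The function names are made up for this illustration.

```python
def allele_frequency(alt_allele_count: int, n_individuals: int) -> float:
    """Frequency of the variant allele among 2N chromosomes (diploid samples)."""
    if n_individuals <= 0:
        raise ValueError("need at least one individual")
    return alt_allele_count / (2 * n_individuals)


def frequency_class(frequency: float) -> str:
    """Bin a frequency into the rare / intermediate / common classes."""
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    if frequency < 0.01:
        return "rare"          # less than 1%
    if frequency <= 0.05:
        return "intermediate"  # 1-5%
    return "common"            # more than 5%


# Hypothetical counts: 3 copies of the variant allele seen in 1000 people.
example_class = frequency_class(allele_frequency(3, 1000))
```

Because the frequency is estimated from the people genotyped so far, the same variant can move between classes as the sample grows.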

Although we cannot ethically generate knock-outs of genes in humans, each of us carries more than 100 rare loss-of-function variants!

What can we learn by studying genetic variation in the human population?

  1. Studying the genetic variants shared among individuals allows us to identify ancestral lineage!

Homo sapiens interbred with Homo neanderthalensis (Neanderthals) and Denisovans at certain points after the out-of-Africa migration. Because of these interbreeding events, we each have some proportion of our genome that is not of Homo sapiens origin.

Because the "birthplace of man" was in Africa, African populations have had the most time to accumulate variation in their DNA and therefore African populations have large amounts of genetic diversity.

With each migration over history, a small subset of individuals populated a new location. These "bottlenecks" result in a decrease in diversity among the individuals in the new population. Over time, new mutations will accumulate and these variants will be "private" to individuals in that migrated population (people in the initial group will not have these same new mutations).
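One way to put numbers on the loss of diversity after a bottleneck is the standard population-genetics result for an idealized (Wright-Fisher) diploid population of constant size N with no new mutation: expected heterozygosity shrinks by a factor of (1 - 1/(2N)) each generation. The sketch below applies that formula. The model and the example numbers are assumptions for illustration and are not taken from the lecture.

```python
def expected_heterozygosity(h0: float, population_size: int, generations: int) -> float:
    """Expected heterozygosity after drift in an idealized diploid population.

    Uses H_t = H_0 * (1 - 1/(2N))**t, which ignores new mutation,
    migration, and selection.
    """
    if population_size <= 0 or generations < 0:
        raise ValueError("population_size must be positive and generations non-negative")
    return h0 * (1.0 - 1.0 / (2.0 * population_size)) ** generations


# Hypothetical comparison: a small founding group versus a large population,
# both starting from the same diversity and followed for 10 generations.
small_founder_group = expected_heterozygosity(0.5, population_size=10, generations=10)
large_population = expected_heterozygosity(0.5, population_size=10_000, generations=10)
```

The small founding group loses a large share of its starting diversity in ten generations, while the large population loses almost none.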

  2. We can use genetic diversity to predict phenotypes like disease!

By finding genetic variants shared by individuals that are affected by the same disease, we can identify at-risk individuals based on their genotype.

The number of individuals you need to sequence to identify a causal variant depends on two things: the frequency of the disease-causing allele in the population and the size of the effect that allele has on the disease phenotype.
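The frequency half of this relationship can be sketched with a simple calculation: how many people must be sequenced to see an allele at least once? Assuming unrelated diploid individuals whose 2n chromosomes are independent draws, the chance of seeing an allele of frequency p at least once is 1 - (1 - p)^(2n). This covers only detecting that the allele exists. Linking it to a disease also depends on effect size and needs a proper power analysis, which is not shown here. The function names and the 95% target are made up for this illustration.

```python
import math


def prob_allele_observed(allele_frequency: float, n_individuals: int) -> float:
    """Probability of seeing the allele at least once in n diploid individuals."""
    return 1.0 - (1.0 - allele_frequency) ** (2 * n_individuals)


def individuals_needed(allele_frequency: float, target_probability: float = 0.95) -> int:
    """Smallest n for which the allele is seen at least once with the target probability."""
    if not 0.0 < allele_frequency < 1.0:
        raise ValueError("allele_frequency must be strictly between 0 and 1")
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target_probability must be strictly between 0 and 1")
    n = math.log(1.0 - target_probability) / (2.0 * math.log(1.0 - allele_frequency))
    return math.ceil(n)


# Hypothetical comparison of a common (5%) and a rare (0.1%) allele.
needed_for_common = individuals_needed(0.05)
needed_for_rare = individuals_needed(0.001)
```

Rarer alleles need far larger samples just to be observed, before any question of effect size comes in.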

Mendelian loci have a large effect on phenotype and are very rare in the population.

Common disease common variant loci have a large effect on phenotype and are common in the population.

Impenetrable loci have a small effect on phenotype and are very rare in the population.

Infinitesimal loci have a small effect on phenotype and are common in the population.

Goldilocks loci have a moderate effect on phenotype and occur at a low frequency in the population.

We can use family pedigrees, trios (mother, father, child groups), or large populations of individuals to try to find these disease-causing variants.

Further Resources: