Linkage disequilibrium and pop. structure - AndersenLab/Genetic-Analysis GitHub Wiki

Lecture 14

Difference between Linkage Mapping and Association Mapping:

  • Linkage mapping uses families to study rare disease-causing alleles that have a large effect on the phenotype of interest.
  • Association mapping uses a population of individuals to study common disease-causing alleles that generally have a small effect on the phenotype of interest.
    • In aggregate, many common, small-effect loci found through association mapping can have a large effect on the phenotype, even though each individual locus has a small effect.
    • In linkage mapping we can follow the disease-causing allele through the generations of a family. In association mapping we only have a group of people with the disease and a group of people without it, and we use statistical tests to assess whether a particular allele is associated with the disease.
      • For this to work we need a large sample size.
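
As a rough illustration of the kind of statistical test meant here, the sketch below computes a Pearson chi-square statistic on a 2x2 table of allele counts in cases versus controls. The function name `allelic_chi_square` and the counts are hypothetical, chosen only for illustration. Real association studies typically use more elaborate models and correct for multiple testing.

```python
def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 allele-count table.

    Assumes every row and column total is non-zero.
    """
    observed = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_totals = [sum(row) for row in observed]
    col_totals = [case_alt + control_alt, case_ref + control_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele and disease status were independent
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical allele counts: 100 case chromosomes and 100 control chromosomes
chi2 = allelic_chi_square(case_alt=60, case_ref=40, control_alt=40, control_ref=60)
print(chi2)  # 8.0, above 3.84 (the 5% critical value for 1 degree of freedom)
```

With small samples the same allele-frequency difference gives a much smaller statistic, which is one way to see why a large sample size is needed.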

Linkage disequilibrium (LD) is the non-random association of alleles at different loci

  • If alleles at two loci are in high LD, the loci are usually thought to be tightly linked
    • LD makes genotyping easier and cheaper: genotyping one marker in an LD block is usually enough to infer the alleles at the other loci in that block

Calculating haplotype frequencies in a population:

Let's say we have two linked loci that each have two alleles (A or a, and B or b):

  • pA = frequency of A in the population (proportion of gametes with A)
  • pa = 1 - pA
  • pB = frequency of B in the population (proportion of gametes with B)
  • pb = 1 - pB
  • pAB = frequency of the AB haplotype (proportion of gametes with A and B)

If the alleles at these two loci associate independently (i.e. there is NO linkage disequilibrium), we expect the frequency of the AB haplotype to be pA * pB. If pAB is not equal to pA * pB, the alleles are non-randomly associated (linkage disequilibrium is observed), which usually suggests that the two loci are linked.

We commonly use the squared correlation, r², to measure LD. This value ranges from 0 (equilibrium, alleles associate independently) to 1 (complete disequilibrium, alleles perfectly correlated).

  • r² = (pAB - pA * pB)² / (pA * (1 - pA) * pB * (1 - pB))
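
A minimal sketch of this formula in Python may help to check hand calculations. The function name `r_squared` and the example frequencies are assumptions made for illustration. The sketch assumes both loci are polymorphic (pA and pB strictly between 0 and 1), since otherwise the denominator is zero.

```python
def r_squared(p_ab, p_a, p_b):
    """LD between two biallelic loci as the squared correlation.

    p_ab: frequency of the AB haplotype
    p_a:  frequency of allele A
    p_b:  frequency of allele B
    """
    # Difference between observed and expected AB haplotype frequency
    d = p_ab - p_a * p_b
    return d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))


# A always travels with B: pAB = pA = pB = 0.5, so r squared is 1
print(r_squared(0.5, 0.5, 0.5))

# A and B associate independently: pAB = pA * pB = 0.25, so r squared is 0
print(r_squared(0.25, 0.5, 0.5))
```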

REMEMBER: When calculating LD, you are counting haplotypes, not individuals (each diploid individual has two chromosomes, so two haplotypes).

  1. Count the number of each haplotype (AB, Ab, aB, ab) in the population of individuals
  2. Convert the counts to haplotype frequencies (the sum of all haplotype frequencies should equal 1!)
  3. Convert to frequencies of alleles (Remember: p(A) + p(a) = 1 and p(B) + p(b) = 1)
    • pA = p(AB) + p(Ab)
    • pa = 1 - pA
    • pB = p(AB) + p(aB)
    • pb = 1 - pB
  4. Plug and chug! Substitute pAB, pA and pB into the LD formula above.
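
The four steps above can be sketched in Python as follows. The function name `ld_from_haplotypes`, the two-letter haplotype labels, and the sample of 10 haplotypes (from 5 hypothetical diploid individuals) are all assumptions made for illustration, and the haplotypes are assumed to be already phased.

```python
from collections import Counter


def ld_from_haplotypes(haplotypes):
    """Compute r squared from a list of haplotype labels: "AB", "Ab", "aB", "ab"."""
    # Step 1: count each haplotype (a Counter returns 0 for labels it never saw)
    counts = Counter(haplotypes)
    total = sum(counts.values())

    # Step 2: convert counts to haplotype frequencies (these sum to 1)
    freq = {h: counts[h] / total for h in ("AB", "Ab", "aB", "ab")}

    # Step 3: convert haplotype frequencies to allele frequencies
    p_a = freq["AB"] + freq["Ab"]
    p_b = freq["AB"] + freq["aB"]

    # Step 4: plug into the r squared formula
    d = freq["AB"] - p_a * p_b
    return d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical sample: 5 diploid individuals, so 10 haplotypes
sample = ["AB"] * 4 + ["Ab"] * 1 + ["aB"] * 1 + ["ab"] * 4
r2 = ld_from_haplotypes(sample)
print(round(r2, 2))  # 0.36
```

Here pAB = 0.4 and pA = pB = 0.5, so pAB - pA * pB = 0.15 and r² = 0.0225 / 0.0625 = 0.36.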

Thought question: would you choose to do association mapping in African populations (less LD) or Asian populations (more LD)?
