Linkage disequilibrium and pop. structure - AndersenLab/Genetic-Analysis GitHub Wiki
Lecture 14
Difference between Linkage Mapping and Association Mapping:
- Linkage mapping uses families to study rare disease-causing alleles that have a large effect on the phenotype of interest.
- Association mapping uses a population of individuals to study common disease-causing alleles that generally have a small effect on the phenotype of interest.
- In aggregate, the many common, small-effect loci found through association mapping can have a large effect on the phenotype, even though each individual locus has a small effect.
- Whereas in linkage mapping we can follow the disease-causing allele through the generations of a family, in association mapping we only have a group of people with the disease (cases) and a group without it (controls). We then use statistical tests to assess whether a particular allele is associated with the disease.
- For this to work, we need a large sample size.
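As a rough illustration of the "statistical tests" mentioned above, one common choice (not necessarily the one used in lecture; logistic regression is another) is a chi-square test on a 2×2 table of allele counts in cases versus controls. The function name and the counts below are hypothetical. The sketch also shows why sample size matters: the same allele-frequency difference (0.30 in cases vs 0.25 in controls) is not significant with 100 cases and 100 controls but is highly significant with ten times as many.

```python
import math

def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 table
    of allele counts: rows = cases/controls, columns = alt/ref allele.
    Hypothetical helper for illustration."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele and disease status are independent.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical allele counts (each diploid person contributes two alleles).
small = allelic_chi_square(60, 140, 50, 150)      # 100 cases, 100 controls
large = allelic_chi_square(600, 1400, 500, 1500)  # 1,000 cases, 1,000 controls
print(f"small sample: chi2={small[0]:.2f}, p={small[1]:.3g}")
print(f"large sample: chi2={large[0]:.2f}, p={large[1]:.3g}")
```

With these made-up counts the small study gives p > 0.05, while the large study gives p < 0.001.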
Linkage disequilibrium (LD) is the non-random association of alleles at different loci.
- If two alleles are in high LD, they are likely tightly linked (physically close on the same chromosome).
- LD makes genotyping easier and cheaper: because alleles within an LD block tend to be inherited together, we only need to genotype one marker per LD block to infer the individual's genotype across the whole block.
Calculating haplotype frequencies in a population
Let's say we have two linked loci that each have two alleles (A or a, and B or b):
- pA = frequency of A in the population (proportion of gametes with A)
- pa = 1 - pA
- pB = frequency of B in the population (proportion of gametes with B)
- pb = 1 - pB
- pAB = frequency of the AB haplotype (proportion of gametes with A and B)
If the two loci are NOT LINKED (i.e., their alleles assort independently), we expect A and B to occur together on the same gamete with probability pA * pB. If pAB differs from pA * pB, the alleles are non-randomly associated (linkage disequilibrium is observed), and the loci are likely linked.
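A worked check with hypothetical frequencies (pA = 0.6, pB = 0.5, observed pAB = 0.45; these numbers are made up for illustration). The difference pAB - pA * pB is often written D; a nonzero D indicates LD:

```math
\begin{aligned}
\text{expected under independence: } p_A \, p_B &= 0.6 \times 0.5 = 0.30 \\
\text{observed: } p_{AB} &= 0.45 \\
D = p_{AB} - p_A \, p_B &= 0.45 - 0.30 = 0.15 \neq 0
\end{aligned}
```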
We commonly use r² (the squared correlation between alleles at the two loci) to measure LD. It ranges from 0 (linkage equilibrium: no association) to 1 (complete LD: the alleles are perfectly associated).
- $r^2 = \dfrac{(p_{AB} - p_A p_B)^2}{p_A (1 - p_A)\, p_B (1 - p_B)}$
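The r² formula translates directly into a few lines of Python. This is a minimal sketch; the function name `r_squared` is hypothetical, and it assumes both loci are polymorphic (otherwise the denominator is zero and r² is undefined).

```python
def r_squared(p_AB: float, p_A: float, p_B: float) -> float:
    """r^2 between two biallelic loci, given the AB haplotype frequency (p_AB)
    and the frequencies of allele A (p_A) and allele B (p_B)."""
    d = p_AB - p_A * p_B                      # deviation from independence
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d ** 2 / denom

print(r_squared(0.25, 0.5, 0.5))            # independent loci -> 0.0
print(r_squared(0.50, 0.5, 0.5))            # AB and ab only, perfect LD -> 1.0
print(round(r_squared(0.45, 0.6, 0.5), 3))  # -> 0.375
```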
REMEMBER: When calculating LD, you count haplotypes, not individuals (each diploid individual carries two copies of each chromosome, so it contributes two haplotypes).
- Count the number of each haplotype (AB, Ab, aB, ab) in the population.
- Convert the counts to haplotype frequencies by dividing by the total number of haplotypes (2N for N diploid individuals); the four haplotype frequencies should sum to 1!
- Convert haplotype frequencies to allele frequencies (remember: p(A) + p(a) = 1 and p(B) + p(b) = 1):
- pA = p(AB) + p(Ab)
- pa = 1 - pA
- pB = p(AB) + p(aB)
- pb = 1 - pB
- Plug and chug: substitute these values into the r² formula!
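The counting steps above can be sketched end to end in Python. The input format (each individual as a pair of already-phased haplotype strings such as ("AB", "ab")), the function name, and the five example individuals are hypothetical choices for illustration.

```python
from collections import Counter

HAPLOTYPES = ("AB", "Ab", "aB", "ab")

def ld_from_individuals(individuals):
    """Return haplotype frequencies, pA, pB, and r^2 for two biallelic loci.
    `individuals` is a list of diploid individuals, each a pair of phased
    haplotypes written as two-letter strings, e.g. ("AB", "ab")."""
    # Step 1: count haplotypes, not individuals (two per diploid individual).
    counts = Counter(hap for pair in individuals for hap in pair)
    n_haplotypes = 2 * len(individuals)

    # Step 2: convert counts to haplotype frequencies (these sum to 1).
    freq = {hap: counts[hap] / n_haplotypes for hap in HAPLOTYPES}

    # Step 3: convert haplotype frequencies to allele frequencies.
    p_A = freq["AB"] + freq["Ab"]   # pa = 1 - pA
    p_B = freq["AB"] + freq["aB"]   # pb = 1 - pB

    # Step 4: plug into the r^2 formula.
    d = freq["AB"] - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return freq, p_A, p_B, d ** 2 / denom

people = [("AB", "AB"), ("AB", "ab"), ("AB", "Ab"), ("ab", "ab"), ("aB", "ab")]
freq, p_A, p_B, r2 = ld_from_individuals(people)
print(freq)
print(p_A, p_B, round(r2, 3))
```

Here the 5 individuals contribute 10 haplotypes (4 AB, 1 Ab, 1 aB, 4 ab), so pAB = 0.4, pA = pB = 0.5, and r² = (0.4 - 0.25)² / 0.0625 = 0.36.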
Thought question: would you choose to do association mapping in African populations (less LD) or Asian populations (more LD)?