Calculating linkage disequilibrium with GPAT - zeeev/vcflib GitHub Wiki

Linkage disequilibrium (LD) is the non-random co-occurrence of alleles on the same haplotype. The LD program calculates the "D" statistic, which quantifies the difference between the observed haplotype frequency and the haplotype frequency expected if the two sites were independent. Consider two positions in a genome (snpA and snpB) whose reference alleles have frequencies pA and pB in the population. The reference haplotype is "00" and has a frequency of p00. D is calculated as D = p00 - (pA*pB).

Worked example:

snpA - pA: allele frequency of the reference base at snpA = 0.6

snpB - pB: allele frequency of the reference base at snpB = 0.5

00 - p00: frequency of seeing "00", i.e. two reference bases on the same haplotype = 0.4

D = 0.4 - (0.6*0.5) = 0.1
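
The same calculation can be sketched from phased haplotypes directly. The snippet below is a minimal illustration and not the LD program's implementation; the function name `d_statistic` and the toy haplotypes are made up for this example. Each haplotype is a two-character string giving the allele at snpA followed by the allele at snpB, with "0" meaning reference.

```python
def d_statistic(haplotypes):
    """Return (pA, pB, p00, D) for a list of two-locus haplotype strings.

    Each string holds the allele at snpA followed by the allele at snpB,
    "0" for the reference allele and "1" for the alternate allele.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(h[0] == "0" for h in haplotypes) / n   # reference frequency at snpA
    p_b = sum(h[1] == "0" for h in haplotypes) / n   # reference frequency at snpB
    p_00 = sum(h == "00" for h in haplotypes) / n    # observed "00" frequency
    return p_a, p_b, p_00, p_00 - p_a * p_b


# Toy data (hypothetical): 10 haplotypes from 5 phased diploid samples.
haps = ["00"] * 4 + ["01"] * 2 + ["10"] * 1 + ["11"] * 3
p_a, p_b, p_00, d = d_statistic(haps)
print(f"pA={p_a:.2f} pB={p_b:.2f} p00={p_00:.2f} D={d:.2f}")
```

Deriving all three frequencies from the same haplotype counts keeps them mutually consistent; in particular, p00 can never exceed pA or pB.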

Usage statement for LD:

INFO: help
INFO: description:
INFO: LD --target 0,1,2,3,4,5,6,7 --background 11,12,13,16,17,19,22 --file my.vcf -e -d -r

INFO: required: t,target     -- argument: a zero base comma seperated list of target individuals corrisponding to VCF columns
INFO: required: b,background -- argument: a zero base comma seperated list of background individuals corrisponding to VCF columns
INFO: required: f,file       -- argument: a properly formatted phased VCF file
INFO: required: y,type       -- argument: type of genotype likelihood: PL, GL or GP
INFO: optional: w,window     -- argument: window size to average LD; default is 1000
INFO: optional: e,external   -- switch: population to calculate LD expectation; default is target
INFO: optional: d,derived    -- switch: which haplotype to count "00" vs "11"; default "00",

INFO: version 1.1.0 ; date: April 2014 ; author: Zev Kronenberg

Additional information about options:

  1. e,external: Use the frequencies of snpA and snpB in another population (the background) to estimate the expectation. This setting can be used to find differences in linkage between two populations.
  2. d,derived: By default LD counts the reference haplotype "00". In many cases the reference base is not the right allele to count. The derived option lets the user count the "11" haplotype instead. With this switch p11 takes the place of p00, (1 - pA) the place of pA, and (1 - pB) the place of pB.
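
One way to read the two switches is sketched below. This is an illustration of the description on this page, not the LD source code; `freqs`, `ld_d`, and the `external` and `derived` parameters are hypothetical names that mirror the -e and -d switches, and haplotypes are two-character strings such as "00".

```python
def freqs(haplotypes, allele):
    """Frequencies of `allele` at snpA, at snpB, and at both sites together."""
    n = len(haplotypes)
    p_a = sum(h[0] == allele for h in haplotypes) / n
    p_b = sum(h[1] == allele for h in haplotypes) / n
    p_hap = sum(h == allele * 2 for h in haplotypes) / n
    return p_a, p_b, p_hap


def ld_d(target, background, external=False, derived=False):
    """D for the target haplotypes under the assumed -e / -d semantics."""
    allele = "1" if derived else "0"             # -d: count "11" instead of "00"
    _, _, p_hap = freqs(target, allele)          # observed frequency, always from target
    source = background if external else target  # -e: expectation from background
    p_a, p_b, _ = freqs(source, allele)
    return p_hap - p_a * p_b


# Hypothetical toy populations.
target = ["00"] * 4 + ["01"] * 2 + ["10"] * 1 + ["11"] * 3
background = ["00"] * 2 + ["01"] * 3 + ["10"] * 3 + ["11"] * 2

print(ld_d(target, background))                  # expectation from target
print(ld_d(target, background, external=True))   # expectation from background
print(ld_d(target, background, derived=True))    # counts "11" in target
```

Under this reading, counting "11" instead of "00" within a single population gives the same D, because p11 - (1 - pA)(1 - pB) = p00 - pA*pB. The two counts differ only when the expectation comes from a different population.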

### Running the provided example:

bin/LD --target 1,20,25,29,30,38,43,46 --background 2,3,4,5,6,7,21,22,22,23,24,26,26,28,31,32,33,34,35,36,37,39,40,41,42,44,45 --type GP --file samples/scaffold612.phased.vcf   -e -d -w 20 > t.ld.tx
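
Typing index lists by hand is error-prone, so it can help to build them from sample names. The sketch below assumes that index 0 refers to the first sample column of the VCF, i.e. the first column after FORMAT on the `#CHROM` header line. That reading of "zero base ... VCF columns" is an assumption and is worth checking against your own file. The function name `sample_indices` and the sample names are hypothetical.

```python
def sample_indices(header_line, wanted):
    """Map sample names to a comma-separated list of zero-based indices.

    `header_line` is the "#CHROM ..." line of a VCF. Assumes index 0 is the
    first sample column, i.e. the first column after FORMAT.
    """
    fields = header_line.rstrip("\n").split("\t")
    samples = fields[9:]                      # 9 fixed columns precede the samples
    position = {name: i for i, name in enumerate(samples)}
    missing = [name for name in wanted if name not in position]
    if missing:
        raise KeyError(f"samples not in header: {missing}")
    # Sort and de-duplicate so each individual is listed once.
    return ",".join(str(i) for i in sorted({position[name] for name in wanted}))


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4"
print(sample_indices(header, ["S4", "S2"]))   # -> 1,3
```

The returned string can be pasted after `--target` or `--background`.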