Nucleotide diversity with GPAT - zeeev/vcflib GitHub Wiki
This method calculates pi and eHH for bi-allelic SNPs. The result depends on window size: in larger windows fewer haplotypes are shared, which results in higher diversity. Choose the window size carefully.
Usage statement for sequenceDiversity:
INFO: help
INFO: description:
The sequenceDiversity program calculates two popular metrics of haplotype diversity: pi and
extended haplotype homozygosity (eHH). Pi is calculated using the Nei and Li 1979 formulation.
eHH is a convenient way to think about haplotype diversity. When eHH = 0 all haplotypes in the window
are unique and when eHH = 1 all haplotypes in the window are identical. The window size is 20 SNPs.
Output : 5 columns:
1. seqid
2. start of window
3. end of window
4. pi
5. eHH
INFO: usage: sequenceDiversity --target 0,1,2,3,4,5,6,7 --file my.vcf
INFO: required: t,target -- argument: a zero-based, comma-separated list of target individuals corresponding to VCF columns
INFO: required: f,file -- argument: a properly formatted phased VCF file
INFO: required: y,type -- argument: type of genotype likelihood: PL, GL or GP
INFO: optional: r,region -- argument: a tabix-compliant region : "seqid:0-100" or "seqid"
INFO: version 1.1.0 ; date: April 2014 ; author: Zev Kronenberg; email : [email protected]
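The two metrics described in the usage statement can be illustrated with a small Python sketch. This is not the vcflib implementation. It assumes phased haplotypes are given as equal-length strings of 0/1 alleles, takes pi as the mean number of pairwise differences per site (a common sample form of the Nei and Li estimator; the per-site normalization is an assumption), and takes eHH as the fraction of haplotype pairs that are identical across the window, which matches the stated endpoints of 0 (all unique) and 1 (all identical). The function names `pairwise_pi` and `haplotype_homozygosity` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairwise_pi(haplotypes):
    """Mean number of pairwise differences per site over all haplotype pairs."""
    n_sites = len(haplotypes[0])
    pairs = list(combinations(haplotypes, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs
    )
    return total_diffs / (len(pairs) * n_sites)


def haplotype_homozygosity(haplotypes):
    """Fraction of haplotype pairs that are identical across the whole window."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    identical_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return identical_pairs / (n * (n - 1) // 2)


# Four made-up haplotypes over a window of four bi-allelic SNPs.
window = ["0101", "0101", "0111", "1100"]

print(pairwise_pi(window))
print(haplotype_homozygosity(window))

# The same haplotypes restricted to the first two SNPs (a smaller window).
narrow = [h[:2] for h in window]
print(haplotype_homozygosity(narrow))
```

In this toy data, three of the four haplotypes are identical over the first two SNPs but only two remain identical over all four, so the homozygosity value drops as the window widens. This is the window-size effect noted at the top of the page.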
Running provided example:
bin/sequenceDiversity --file samples/scaffold612.phased.vcf.gz --type GP --target 1,20,25,29,30,38,43,46 > pi-ehh.scaffold612
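The resulting file can be loaded for plotting or filtering. Below is a minimal Python sketch that reads the five output columns (seqid, start of window, end of window, pi, eHH). It assumes the columns are whitespace-delimited and that there is no header line; neither is confirmed by the usage statement, so lines that do not parse are skipped. The function name `read_diversity` and the sample rows are made up for illustration and are not real program output.

```python
import io


def read_diversity(handle):
    """Parse five-column sequenceDiversity-style output into typed tuples."""
    rows = []
    for line in handle:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or unexpected lines
        seqid, start, end, pi, ehh = fields
        try:
            rows.append((seqid, int(start), int(end), float(pi), float(ehh)))
        except ValueError:
            continue  # skip lines whose numeric columns do not parse
    return rows


# Made-up rows standing in for the contents of pi-ehh.scaffold612.
example = io.StringIO(
    "scaffold612\t100\t2000\t0.25\t0.5\n"
    "scaffold612\t2001\t4100\t0.125\t0.75\n"
)

for seqid, start, end, pi, ehh in read_diversity(example):
    print(seqid, start, end, pi, ehh)
```

To read the real file, pass an open file handle instead, for example `read_diversity(open("pi-ehh.scaffold612"))`.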