Using tools plink programs - pierrefaux/tools-plink GitHub Wiki

This page describes the command-line usage of the programs listed on the tools page, along with other useful details.

  1. cutHits
  2. partialR2-plink
  3. archaicMapping

cutHits

Use

To merge summary statistics files from several GWASes into one.

Rationale

Defines hits (blocks of SNPs below the significance threshold for at least one GWAS), assigns each hit to the most significant GWAS in that region and returns:

  1. the list of blocks in block_details
  2. a file with summary stats for the most significant GWAS in each block, and for the first GWAS in the list elsewhere, in pvalue_accross_measurements
  3. the false-discovery rate (FDR) based on the total number of tests (using Benjamini-Hochberg's procedure)

The hit definition proceeds by chromosome, as follows:

  • Find the most significant hit
  • Explore the flanking region at a user-defined pace (increment, in bp, e.g. 50000) until no genome-wide suggestive hit is found
  • Assign that hit as a block of SNPs significantly associated in the i-th GWAS (i.e. the GWAS where the hit is the most significant)
  • Mask values in that hit and search for the next most significant hit (until no genome-wide suggestive hit is found)
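The hit-definition steps can be sketched in Python as follows. This is only an illustration of the procedure as described on this page, not the actual cutHits code: the function name `define_blocks`, the tuple layout, the choice to extend a block by whole increments, and the choice to mask SNPs from every GWAS inside a block are all assumptions.

```python
SUGGESTIVE = 1e-5  # genome-wide suggestive threshold quoted on this page


def define_blocks(snps, increment, threshold=SUGGESTIVE):
    """Greedy block definition for ONE chromosome (hypothetical helper).

    snps: list of (gwas_number, position_bp, pvalue) tuples.
    Returns a list of (start_bp, end_bp, gwas_number, top_pvalue) tuples.
    """
    masked = [False] * len(snps)

    def window_has_hit(lo, hi):
        # True if any unmasked SNP in [lo, hi) is below the threshold.
        return any(
            not masked[i] and lo <= snps[i][1] < hi and snps[i][2] < threshold
            for i in range(len(snps))
        )

    blocks = []
    while True:
        unmasked = [i for i in range(len(snps)) if not masked[i]]
        if not unmasked:
            break
        # Most significant remaining SNP, across all GWASes.
        top = min(unmasked, key=lambda i: snps[i][2])
        gwas, pos, pvalue = snps[top]
        if pvalue >= threshold:
            break  # no suggestive hit left on this chromosome
        start = end = pos
        # Walk outwards one increment at a time while hits keep appearing
        # (assumption: boundaries move by whole increments).
        while window_has_hit(start - increment, start):
            start -= increment
        while window_has_hit(end + 1, end + 1 + increment):
            end += increment
        blocks.append((start, end, gwas, pvalue))
        # Assumption: mask every SNP inside the block, whatever its GWAS.
        for i in range(len(snps)):
            if start <= snps[i][1] <= end:
                masked[i] = True
    return blocks


example = [(1, 100000, 1e-8), (2, 130000, 1e-6), (1, 400000, 0.5), (2, 900000, 1e-7)]
blocks = define_blocks(example, increment=50000)
```

Each block is attributed to the GWAS that holds its most significant SNP, which mirrors the "i-th GWAS" assignment in the third step.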

Usage details

  1. Set up a 4-column file all-sum-stats listing the summary statistics from the various GWASes:

    • $1 = GWAS number (integer)
    • $2 = chromosome number (integer)
    • $3 = physical position (integer)
    • $4 = p-value (real)
  2. Run the following command (#GWASes = the number of GWASes):

    cutHits [all-sum-stats] [increment] [#GWASes]
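As an illustration, the 4-column input could be assembled from in-memory results with a short Python helper. The helper names, the space delimiter, the absence of a header line and the numbering of GWASes from 1 are assumptions; only the column order and the argument order of the command come from this page.

```python
def write_all_sum_stats(gwas_tables, out_path):
    """Write the 4-column file: GWAS number, chromosome, position, p-value.

    gwas_tables: one list per GWAS of (chromosome, position_bp, pvalue) tuples.
    Assumptions: space-delimited, no header, GWASes numbered from 1.
    Returns the number of rows written.
    """
    n_rows = 0
    with open(out_path, "w") as out:
        for gwas_number, rows in enumerate(gwas_tables, start=1):
            for chrom, pos, pvalue in rows:
                # repr() keeps the full precision of the p-value.
                out.write(f"{gwas_number} {int(chrom)} {int(pos)} {float(pvalue)!r}\n")
                n_rows += 1
    return n_rows


def cut_hits_command(all_sum_stats, increment, n_gwas):
    # Argument list in the order documented above; it is not executed here.
    return ["cutHits", all_sum_stats, str(increment), str(n_gwas)]
```

The list returned by `cut_hits_command` can be passed to `subprocess.run` once the cutHits executable is on the PATH.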

Outputs

  • blocks_details: list of hits (one per row) and details of the blocks (more info coming soon)
  • pvalue_accross_measurements: merge of all GWASes
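For readers unfamiliar with the Benjamini-Hochberg procedure named in the Rationale, here is a minimal Python sketch of the standard adjusted p-value computation. How cutHits reports the FDR in its output files is not documented here, so the function name and return format are assumptions.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)  # total number of tests
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Go from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test is then declared significant at FDR level q when its adjusted p-value is at most q.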

Current Limitations

  • Physical positions must be on the same map for all GWASes
  • Cannot handle more than 23 chromosomes
  • The threshold currently in use is 1E-5 (genome-wide suggestive)

partialR2-plink

Use

To compute variance explained by each SNP in a Plink file

Rationale

Given the phenotype, covariates and PLINK (binary) files used as input to a GWAS (run with Plink's --linear function), partialR2-plink computes the phenotype variance explained by each SNP in the Plink files, accounting for covariates.
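One common way to obtain such a quantity is to regress both the phenotype and the SNP genotype on the covariates and take the squared correlation of the two sets of residuals. The pure-Python sketch below illustrates that definition only; it is not taken from the partialR2-plink source, and the function names and the collinearity tolerance are assumptions. Sex would be passed as one of the covariate columns.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residualize(y, columns):
    """Residuals of y after least-squares regression on an intercept plus columns."""
    n = len(y)
    basis = []  # orthogonal basis of the space spanned by intercept + covariates
    for col in [[1.0] * n] + [list(c) for c in columns]:
        v = [float(x) for x in col]
        for b in basis:
            coef = dot(v, b) / dot(b, b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        if dot(v, v) > 1e-12:  # skip (near-)collinear columns; tolerance is arbitrary
            basis.append(v)
    r = [float(x) for x in y]
    for b in basis:
        coef = dot(r, b) / dot(b, b)
        r = [ri - coef * bi for ri, bi in zip(r, b)]
    return r


def partial_r2(phenotype, genotype, covariates):
    """Squared partial correlation between phenotype and genotype given covariates."""
    ry = residualize(phenotype, covariates)
    rg = residualize(genotype, covariates)
    syy, sgg = dot(ry, ry), dot(rg, rg)
    if syy < 1e-12 or sgg < 1e-12:
        return float("nan")  # nothing left to explain, or monomorphic SNP
    return dot(ry, rg) ** 2 / (syy * sgg)
```

This value equals the relative drop in residual sum of squares when the SNP is added to a linear model that already contains the covariates.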

Usage details

Run the following command:

    partialR2-plink [Plink-prefix] [phenotype-file] [covariates-file]

Outputs

  • partialR2.txt with columns:
    • $1 = chromosome
    • $2 = physical position
    • $3 = SNP id
    • $4 = #samples
    • $5 = partial R2 for that SNP

Current Limitations

  • The model is linear
  • Sex must not be included in the covariates file; it is retrieved from the fam file (the program will likely crash or return NAs if sex is included)
  • The model always includes sex as a covariate
  • For a given SNP, the phenotype and genotype are corrected only for the covariates passed through the covariates file, not for other SNPs (to correct for other SNPs, include their genotypes as extra covariates)
  • The phenotype, covariates and Plink FAM files must contain the same number of samples, in the same order
  • The phenotype and covariates files must be cleaned of missing values beforehand
  • The phenotype, covariates and fam files should not have headers
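Several of these constraints can be checked before running the program. The Python sketch below assumes that the phenotype and covariates files are whitespace-delimited and start with the same two ID columns (FID, IID) as the fam file, which this page does not state; the function names and the list of missing-value tokens are also assumptions.

```python
def read_rows(path):
    # Whitespace-delimited rows, blank lines skipped.
    with open(path) as handle:
        return [line.split() for line in handle if line.strip()]


def check_inputs(fam_path, pheno_path, covar_path, missing=("NA", "NaN", "nan")):
    """Return a list of human-readable problems (empty if none were found)."""
    fam = read_rows(fam_path)
    problems = []
    for label, path in (("phenotype", pheno_path), ("covariates", covar_path)):
        rows = read_rows(path)
        if len(rows) != len(fam):
            problems.append(f"{label}: {len(rows)} rows, fam has {len(fam)}")
            continue
        # Assumption: columns 1-2 are FID and IID, as in the fam file.
        for n, (row, fam_row) in enumerate(zip(rows, fam), start=1):
            if row[:2] != fam_row[:2]:
                problems.append(f"{label}: sample order differs from fam at row {n}")
                break
        if any(token in missing for row in rows for token in row[2:]):
            problems.append(f"{label}: missing value found")
    return problems
```

If missing values are coded differently in your files (for example -9), pass the relevant tokens through the `missing` argument.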

archaicMapping

Use

To test association between a given trait and the modern/archaic origin of an allele

(Still in development!)

Rationale

Takes as input a list of introgression tracts defined per haplotype, recodes each SNP in the Plink files with the number of archaic alleles it carries (0, 1 or 2) and tests a phenotype for association with the so-recoded genotypes.
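The recoding step can be pictured with the following Python sketch for one sample on one chromosome. Since the program is still in development and its input format is not documented here, the tract representation (inclusive start and end positions per haplotype), the function name and the rule "an allele counts as archaic when its haplotype lies inside a tract at that position" are all assumptions.

```python
def archaic_dosage(snp_positions, tracts_hap1, tracts_hap2):
    """Number of archaic alleles (0, 1 or 2) at each SNP for one sample.

    snp_positions: list of positions in bp on one chromosome.
    tracts_hap1, tracts_hap2: lists of (start_bp, end_bp) introgression
    tracts, one list per haplotype, with inclusive bounds (assumption).
    """
    def covered(pos, tracts):
        return any(start <= pos <= end for start, end in tracts)

    return [
        int(covered(pos, tracts_hap1)) + int(covered(pos, tracts_hap2))
        for pos in snp_positions
    ]


dosages = archaic_dosage([100, 250, 400], [(200, 300)], [(240, 450)])
```

The resulting 0/1/2 codes play the role that allele counts play in a standard association test.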

Usage details

Run the following command:

    archaicMapping [hmm-input-file] [hmm-output-file] [chr-number]

Outputs

Current Limitations

(Still in development!)