Rscripts - WheelerLab/gwasqc_pipeline GitHub Wiki

In addition to its three main shell scripts, this pipeline includes six R scripts used for plotting and data analysis. The main scripts call each of these R scripts with the appropriate flags, so there is no need to run any R script independently. If you do wish to run one on its own, each script and its flags are outlined here.

AffyToRsid.R

Usage

This script translates Affymetrix variant IDs to the rsID format used by the HapMap files. It requires a summary statistics file containing the rsID of each SNP along with the SNP's chromosome and locus. It creates the file 00rsID_format.bim in the output directory.

Flags

      --bim
          Full path to the bim file containing the Affymetrix variant IDs to be translated
      --stats
          Full path to the stats file containing the rsIDs and associated chromosomal and locus information.
      --output or -o
          Directory to which the output file is written
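The core of the translation can be sketched in Python as a lookup keyed on chromosome and base-pair position. This is a minimal illustration under stated assumptions, not the script's actual R code. It assumes the stats file has already been reduced to (rsID, chromosome, position) rows. The function names and example IDs are hypothetical.

```python
def build_lookup(stats_rows):
    """Map (chromosome, bp position) -> rsID from (rsID, chromosome, position) rows."""
    return {(chrom, pos): rsid for rsid, chrom, pos in stats_rows}


def translate_bim(bim_lines, rsid_lookup):
    """Replace the variant ID (column 2) of each .bim line with its rsID.

    Variants whose (chromosome, position) is not in rsid_lookup keep their
    original ID (an assumption of this sketch).
    """
    translated = []
    for line in bim_lines:
        # .bim columns: chromosome, variant ID, genetic distance (cM),
        # base-pair position, allele 1, allele 2
        fields = line.split()
        key = (fields[0], fields[3])
        fields[1] = rsid_lookup.get(key, fields[1])
        translated.append("\t".join(fields))
    return translated


if __name__ == "__main__":
    # Made-up IDs for illustration only
    lookup = build_lookup([("rs123", "1", "752566")])
    bim = ["1\tAX-111\t0\t752566\tG\tA", "1\tAX-222\t0\t900000\tC\tT"]
    for line in translate_bim(bim, lookup):
        print(line)
```

In this sketch, variants with no positional match keep their Affymetrix ID. How AffyToRsid.R itself handles unmatched variants is not described on this page.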

CallRateDistribution.R

Usage

This script generates call rate distribution histograms to validate missingness filtering. It produces a single PDF containing histograms that compare call rates before and after missingness filtering. Because it relies on functionality built into the main shell script 01Missingness filtering, it has minimal utility outside of the pipeline.

Flags

      -t or --threshold
          The call rate threshold used for missingness filtering
      --QCdir
          The directory containing QC files
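The quantity being plotted can be sketched as follows. This assumes missingness comes from PLINK's F_MISS column (fraction of missing calls), so that call rate = 1 - F_MISS, and that passing the filter means a call rate at or above the threshold. The function names and the 10-bin layout are hypothetical; the R script itself draws its histograms into a PDF.

```python
def call_rates(f_miss_values):
    """Convert PLINK F_MISS values (fraction of missing calls) to call rates."""
    return [1.0 - f for f in f_miss_values]


def apply_threshold(rates, threshold):
    """Keep only call rates at or above the threshold (those passing the filter)."""
    return [r for r in rates if r >= threshold]


def histogram_counts(values, n_bins=10, lo=0.0, hi=1.0):
    """Count values in n_bins equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # A value equal to hi is placed in the last bin rather than overflowing
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    # Made-up F_MISS values for illustration
    before = call_rates([0.0, 0.01, 0.05, 0.25])
    after = apply_threshold(before, 0.98)
    print("before:", histogram_counts(before))
    print("after: ", histogram_counts(after))
```

Comparing the two count lists mirrors the before/after comparison in the PDF: samples or variants below the threshold should disappear from the lower bins.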

heterozygosity.R

Usage

This script plots the distribution of the samples' heterozygosity estimates. It also generates a list of individuals designated as outliers, defined as those more than 3 standard deviations (sd) above or below the mean heterozygosity.

Flags

      --het
          Full path to the .het file generated by PLINK
      --tag or -t
          Naming tag for the output files
      --outputdir or -o
          Directory to which output is written
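The ±3 sd outlier rule can be sketched in Python. This assumes heterozygosity is measured per sample as the observed heterozygosity rate (N(NM) - O(HOM)) / N(NM), computed from the PLINK .het columns. Whether heterozygosity.R uses this rate or the F coefficient is an assumption here, and the function names are hypothetical.

```python
import statistics


def het_rate(o_hom, n_nm):
    """Fraction of non-missing genotypes that are heterozygous."""
    return (n_nm - o_hom) / n_nm


def het_outliers(samples, n_sd=3.0):
    """Return IIDs whose heterozygosity rate lies more than n_sd SDs from the mean.

    samples is a list of (IID, O_HOM, N_NM) tuples taken from a PLINK .het file.
    """
    rates = {iid: het_rate(o_hom, n_nm) for iid, o_hom, n_nm in samples}
    mean = statistics.mean(rates.values())
    sd = statistics.stdev(rates.values())
    return [iid for iid, r in rates.items() if abs(r - mean) > n_sd * sd]


if __name__ == "__main__":
    # 19 samples with rate 0.30 and one with rate 0.60 (made-up counts)
    samples = [(f"S{i}", 700, 1000) for i in range(1, 20)] + [("S20", 400, 1000)]
    print(het_outliers(samples))
```

Because the mean and sd are computed with the outlier included, a single extreme sample inflates the sd. With very small sample sizes, no sample may be able to exceed 3 sd at all.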

hwe.R

Usage

This script generates Hardy-Weinberg equilibrium (HWE) statistics, including both a plot and a statistics file.

Flags

      --hwe
          Full path to the plink .hwe file to be analyzed
      -t or --tag
          Label for this particular analysis
      -o or --outputdir
          The directory containing QC files
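A summary like the one hwe.R writes can be sketched by parsing the .hwe table by header name and counting variants whose HWE P-value falls below a cutoff. The 1e-6 default, the restriction to one TEST type (ALL), and the function names are hypothetical choices for illustration, not the script's documented behavior.

```python
def parse_hwe(text):
    """Parse PLINK .hwe text into (SNP, TEST, P) tuples, locating columns by header name."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    snp_i, test_i, p_i = header.index("SNP"), header.index("TEST"), header.index("P")
    rows = []
    for line in lines[1:]:
        fields = line.split()
        if fields[p_i] == "NA":  # skip variants without a P-value
            continue
        rows.append((fields[snp_i], fields[test_i], float(fields[p_i])))
    return rows


def hwe_summary(rows, p_threshold=1e-6, test="ALL"):
    """Count variants tested and those with HWE P below p_threshold for one TEST type."""
    tested = [(snp, p) for snp, t, p in rows if t == test]
    failing = [snp for snp, p in tested if p < p_threshold]
    return {"n_tested": len(tested), "n_failing": len(failing), "failing": failing}


if __name__ == "__main__":
    # Made-up rows for illustration
    example = """ CHR SNP TEST A1 A2 GENO O(HET) E(HET) P
      1 rs1 ALL A G 10/50/40 0.5 0.455 0.37
      1 rs2 ALL C T 30/0/70 0 0.42 1e-20"""
    print(hwe_summary(parse_hwe(example)))
```

Looking up columns by header name rather than by position keeps the sketch working if extra columns are present.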

ibd.R

Usage

This script plots the identity by descent (IBD) estimates of the sample population.

Flags

      --genome or -g
          The .genome file produced by PLINK containing the IBD calculations
      --outputdir or -o
          The directory to which output is written
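The data behind a typical IBD plot can be sketched as follows: read Z0, Z1 and PI_HAT from the .genome file by column name, then flag pairs with high PI_HAT. The 0.25 cutoff and the function names are hypothetical. This page does not document which plots or thresholds ibd.R actually uses.

```python
def parse_genome(text):
    """Parse a PLINK .genome table, keeping the columns used for IBD plots."""
    lines = text.strip().splitlines()
    col = {name: i for i, name in enumerate(lines[0].split())}
    pairs = []
    for line in lines[1:]:
        f = line.split()
        pairs.append({
            "IID1": f[col["IID1"]],
            "IID2": f[col["IID2"]],
            "Z0": float(f[col["Z0"]]),  # P(IBD = 0)
            "Z1": float(f[col["Z1"]]),  # P(IBD = 1)
            "PI_HAT": float(f[col["PI_HAT"]]),  # proportion IBD
        })
    return pairs


def related_pairs(pairs, pi_hat_threshold=0.25):
    """Return (IID1, IID2) for pairs whose PI_HAT exceeds the threshold."""
    return [(p["IID1"], p["IID2"]) for p in pairs if p["PI_HAT"] > pi_hat_threshold]


if __name__ == "__main__":
    # Made-up pairs for illustration
    example = """FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT PHE DST PPC RATIO
    F1 A F2 B UN NA 1.0000 0.0000 0.0000 0.0000 -1 0.700000 0.4000 2.0000
    F1 A F3 C UN NA 0.0000 1.0000 0.0000 0.5000 -1 0.850000 1.0000 10.0000"""
    pairs = parse_genome(example)
    print([(p["Z0"], p["Z1"]) for p in pairs])  # points for a Z0 vs Z1 scatter
    print(related_pairs(pairs))
```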

PCA.R

Usage

This script uses the eigenvector and eigenvalue files produced by PLINK's PCA to create scree and PCA plots for the data.

Flags

      --fam
          Full path to the .fam file you would like to use
      --hapmapdir or -h
          Directory containing the HapMap files
      --outputdir or -o
          Directory to which the plots are written
      --val
          Full path to the eigenvalue file
      --vec
          Full path to the eigenvector file
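The scree plot values and PC coordinates can be sketched as follows. This assumes the eigenvalue file lists one eigenvalue per line and that the eigenvector file has FID and IID followed by one column per PC, with an optional header row. The proportions are relative to the components that were computed, not to total genetic variance. The function names are hypothetical.

```python
def scree_proportions(eigenval_text):
    """Fraction of the summed eigenvalues contributed by each computed PC."""
    values = [float(x) for x in eigenval_text.split()]
    total = sum(values)
    return [v / total for v in values]


def pc_coordinates(eigenvec_text, pcs=(1, 2)):
    """Map IID -> coordinates on the requested PCs (1-based) from eigenvector text."""
    coords = {}
    for line in eigenvec_text.strip().splitlines():
        fields = line.split()
        # Skip a header row if present (e.g. "FID IID PC1 ..." or "#FID IID PC1 ...")
        if fields[0].lstrip("#") == "FID":
            continue
        # Columns: FID, IID, then PC1, PC2, ...; PCn is at index n + 1
        coords[fields[1]] = tuple(float(fields[1 + pc]) for pc in pcs)
    return coords


if __name__ == "__main__":
    # Made-up values for illustration
    print(scree_proportions("4.0\n3.0\n2.0\n1.0"))
    print(pc_coordinates("F1 A 0.1 -0.2 0.3\nF1 B -0.1 0.2 0.0"))
```

The IIDs returned by pc_coordinates are the natural key for joining against the .fam file or HapMap population labels when colouring the PCA plot.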