Rscripts - WheelerLab/gwasqc_pipeline GitHub Wiki
In addition to the three main shell scripts of this pipeline, there are six additional R scripts used for plotting and data analysis. The main scripts call each of these R scripts and expose the appropriate flags for them, so there is no need to run any R script independently. However, if you wish to do so, the scripts are outlined here.
AffyToRsid.R
Usage
This script translates variant IDs to the rsID format used by the HapMap files. The process requires a summary statistics file containing the rsID of each SNP as well as its chromosome and locus. It creates the file 00rsID_format.bim in the output directory.
Flags
--bim
Full path to the bim file containing the Affymetrix variant IDs to be translated
--stats
Full path to the stats file containing the rsIDs and associated chromosomal and locus information.
--output or -o
Directory where the output file should be written
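The translation described above can be pictured as a lookup keyed on chromosome and position. The following is a minimal Python sketch of that idea, not the R script's actual code. It assumes the standard six-column PLINK .bim layout (chromosome, variant ID, genetic distance, base-pair position, allele 1, allele 2). The lookup table, the variant IDs, and the choice to keep unmatched IDs unchanged are all assumptions made for illustration.

```python
def translate_bim_ids(bim_lines, rsid_by_locus):
    """Swap the variant ID (column 2) of each .bim record for an rsID.

    rsid_by_locus maps (chromosome, position) -> rsID. Records with no
    match keep their original ID (an assumption; the real script may differ).
    """
    translated = []
    for line in bim_lines:
        chrom, variant_id, cm, pos, a1, a2 = line.split()
        new_id = rsid_by_locus.get((chrom, int(pos)), variant_id)
        translated.append("\t".join([chrom, new_id, cm, pos, a1, a2]))
    return translated


# Made-up lookup, as if built from the summary statistics file.
rsid_by_locus = {("1", 1000): "rs_example_1"}

# Made-up .bim records with placeholder Affymetrix-style IDs.
bim_lines = [
    "1\tAFFX-EXAMPLE_1\t0\t1000\tA\tG",
    "1\tAFFX-EXAMPLE_2\t0\t2000\tC\tT",
]

result = translate_bim_ids(bim_lines, rsid_by_locus)
for record in result:
    print(record)
```

Matching on chromosome and position rather than on the ID itself is what makes the stats file's locus columns necessary.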
CallRateDistribution.R
Usage
The purpose of this script is to generate call rate distribution histograms for validating missingness filtering. The script produces a single PDF containing histograms that compare the call rate before and after missingness filtering. It relies on built-ins of the main shell script 01Missingness filtering, so it has minimal utility outside of the pipeline.
Flags
-t or --threshold
The call rate threshold that you have filtered by
--QCdir
The directory containing QC files
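As a rough illustration of what the histograms summarize, call rate is the complement of the missingness fraction that PLINK reports in the F_MISS column of its missingness output. The Python sketch below is not the R script's code. It assumes the threshold is expressed as a call rate (for example 0.99) rather than as a missingness fraction, and the input values are made up.

```python
def call_rates(f_miss_values):
    """Convert missingness fractions (PLINK F_MISS) to call rates."""
    return [1.0 - f for f in f_miss_values]


def count_below_threshold(rates, threshold):
    """Count entries whose call rate falls below the filtering threshold."""
    return sum(1 for rate in rates if rate < threshold)


# Made-up missingness fractions for four variants.
f_miss = [0.0, 0.005, 0.02, 0.10]
rates = call_rates(f_miss)
n_failing = count_below_threshold(rates, threshold=0.99)
print(n_failing)
```

After filtering, a histogram of the remaining call rates should have no mass below the threshold, which is the check the before/after comparison supports.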
heterozygosity.R
Usage
The purpose of this script is to plot the distribution of heterozygosity estimates of the samples. It also generates a list of individuals designated as outliers, meaning those more than 3 standard deviations above or below the mean heterozygosity.
Flags
--het
path to the .het file generated by PLINK
--tag or -t
naming tag for the outputs
--outputdir or -o
directory where you'd like to output
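The outlier rule described above can be sketched in a few lines of Python. This illustrates the mean +/- 3 sd cutoff only and is not the R script's code. Which column of the .het file serves as the heterozygosity estimate, and the use of the sample standard deviation, are assumptions here. The sample IDs and values are made up.

```python
from statistics import mean, stdev


def heterozygosity_outliers(estimates, n_sd=3.0):
    """Return sample IDs whose estimate lies more than n_sd sample
    standard deviations above or below the mean.

    estimates maps sample ID -> heterozygosity estimate.
    """
    values = list(estimates.values())
    center = mean(values)
    spread = stdev(values)
    lower = center - n_sd * spread
    upper = center + n_sd * spread
    return [sid for sid, value in estimates.items()
            if value < lower or value > upper]


# Made-up data: 20 samples near zero and one extreme sample.
estimates = {f"S{i}": (0.01 if i % 2 else -0.01) for i in range(1, 21)}
estimates["S21"] = 0.5

outliers = heterozygosity_outliers(estimates)
print(outliers)
```

Note that with very few samples no point can lie 3 sample standard deviations from the mean, so the rule is only meaningful for reasonably large cohorts.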
hwe.R
Usage
The purpose of this script is to generate Hardy-Weinberg equilibrium statistics. This includes a visual representation as well as a stats file.
Flags
--hwe
Full path to the plink .hwe file to be analyzed
-t or --tag
label for this particular analysis
-o or --outputdir
The directory containing QC files
ibd.R
Usage
The purpose of this script is to plot the identity by descent of the sample population.
Flags
--genome or -g
The .genome file produced by PLINK containing the IBD calculations
--outputdir or -o
The directory where you would like to write your output to
PCA.R
Usage
Uses the eigenvector and eigenvalue files produced by PLINK PCA to create scree and PCA plots for the data.
Flags
--fam
full path to the fam file you'd like to use
--hapmapdir or -h
directory where all the HapMap files are written
--outputdir or -o
directory where you would like to output your plots
--val
full path to eigenvalue file
--vec
full path to eigenvector file
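A scree plot typically shows how much of the variance each principal component accounts for. The Python sketch below illustrates that calculation from a list of eigenvalues. It is not the R script's code, and the eigenvalues are made up. Because the eigenvalue file lists only the components PLINK was asked to compute, the proportions here are relative to the listed components, not to the total variance in the data.

```python
def variance_explained(eigenvalues):
    """Return each eigenvalue as a fraction of the sum of those provided."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


# Made-up eigenvalues, one per principal component.
eigenvalues = [4.0, 2.0, 1.0, 1.0]
proportions = variance_explained(eigenvalues)
print(proportions)
```

Plotting these proportions against the component index gives the scree plot, and the point where the curve flattens suggests how many components are worth inspecting in the PCA plots.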