Tutorial Overview - jvtalwar/GRIEVOUS GitHub Wiki

Welcome to the GRIEVOUS tutorial! Here we aim to guide you through how to use GRIEVOUS to consistently index, orient, and recover all biallelic SNPs common across all of your genomic datasets. Let's begin!

GRIEVOUS: At A Glance

There is a trilogy of key commands needed to identify, REF/ALT orient, format, and extract all biallelic SNPs that exist across all datasets of interest:

  • grievous realign
  • grievous merge
  • grievous intersect

Both grievous realign and grievous merge must be called once per dataset. grievous intersect is called only once, after grievous realign (with the same GRIEVOUS database specified) and grievous merge have completed for every dataset. Don't worry if this sounds confusing or unclear! We will walk through each of these steps in detail shortly.
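As a rough illustration of the call pattern described above, the sketch below lists which command runs for which dataset. It is not part of GRIEVOUS: `plan_grievous_calls` and the dataset names are hypothetical, command arguments are deliberately left out because they are not covered in this overview, and running each dataset's merge directly after its realign is an assumption made only for readability.

```python
def plan_grievous_calls(datasets):
    """Return (command, dataset) pairs in one possible calling order.

    Hypothetical helper: it only models how many times each command is
    called, not the real command-line arguments.
    """
    plan = []
    for dataset in datasets:
        # One realign call and one merge call per dataset.
        plan.append(("grievous realign", dataset))
        plan.append(("grievous merge", dataset))
    # A single intersect call once every dataset has been processed.
    plan.append(("grievous intersect", None))
    return plan


if __name__ == "__main__":
    # Dataset names are placeholders.
    for command, dataset in plan_grievous_calls(["dataset_a", "dataset_b"]):
        print(command, "" if dataset is None else f"({dataset})")
```

For N datasets this yields 2N + 1 calls in total: N realign calls, N merge calls, and one intersect call.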

Data Inputs: GRIEVOUS File Inputs

GRIEVOUS operates on either summary statistics files or genotype files in the PLINK2 binary format (more specifically, on the genotype files' variant index, the pvar). Throughout this tutorial, we use the shorthand ssf for summary statistics files (ssf = summary statistic file) and pvar for genotype files.

File inputs are expected at chromosome-level resolution. That is, regardless of whether your inputs are ssfs or pvars, they must be partitioned by chromosome (i.e., there must be one unique file per chromosome). GRIEVOUS operates at the chromosome level so that runs can be parallelized and incorporated into your workflow of choice. Each ssf or pvar must index its chromosome as a single unique element of the set {1-22, X, Y, MT}. More simply, ensure that each chromosome-level file contains only one chromosome and that the chromosome is labeled with a member of this set (e.g., an X chromosome must be stored as X and not as 23).
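If you want to check a file against these requirements before running GRIEVOUS, a minimal sketch along the following lines may help. It is not part of GRIEVOUS. It assumes a simple tab-delimited file with a single header row; the column name `CHR` and the function names are hypothetical, and files that carry extra leading metadata lines would need those lines skipped first.

```python
import csv
import io

# Labels the tutorial lists as valid: 1-22, X, Y, MT.
VALID_CHROMOSOMES = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


def chromosomes_in_file(handle, chrom_column="CHR", delimiter="\t"):
    """Return the set of chromosome labels found in one delimited file.

    `chrom_column` is a hypothetical header name; use your file's own.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {row[chrom_column].strip() for row in reader}


def check_chromosome_file(handle, chrom_column="CHR", delimiter="\t"):
    """Return the file's single chromosome label, or raise ValueError."""
    found = chromosomes_in_file(handle, chrom_column, delimiter)
    if len(found) != 1:
        raise ValueError(f"expected exactly one chromosome, found {sorted(found)}")
    (label,) = found
    if label not in VALID_CHROMOSOMES:
        raise ValueError(f"chromosome label {label!r} is not in 1-22, X, Y, MT")
    return label


if __name__ == "__main__":
    # In-memory stand-in for a per-chromosome file; the rows are made up.
    example = io.StringIO("CHR\tPOS\nX\t100\nX\t200\n")
    print(check_chromosome_file(example))
```

A file whose chromosome column holds 23 instead of X, or one that mixes several chromosomes, raises a ValueError rather than passing silently.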