4. Running a Conditional GWAS - GenomicNetworkAnalysis/GNA GitHub Wiki

This page contains a guide on how to run a conditional GWAS; that is, incorporating individual SNPs into a genomic network in order to estimate their association with each trait conditional on the other traits in the network. The example below is taken from the GNA manuscript, in which we conducted a conditional GWAS for Type 2 diabetes and 5 related cardio-metabolic traits in individuals of East Asian ancestry.

To estimate a genomic network model with individual genetic variants, you will need:

  1. The genetic covariance structure for your set of traits (such as that obtained from multivariable LDSC; see Estimating a genetic covariance structure).

  2. The GWAS results for the included traits, formatted in the way the package expects. See preparing data for a conditional GWAS for details.

The genetic covariance structure and formatted GWAS results for a subset of 100 SNPs used in the example below are included in the GNA package and can be extracted using the following code. Note that, in practice, a typical analysis will include more than 1 million SNPs.

# Load the GNA package
library(GNA)

# Extract the example data (will create directory 'example_data' in your current working directory)
refData("example")

# Load the genetic covariance structure into R
LDSC_MET <- readRDS("example_data/LDSC_MET.RDS")

# Load the GWAS results for a subset of SNPs into R
SNPDATA_MET <- readRDS("example_data/SNPDATA_MET.RDS")

Optional Initial Step: Estimate the genomic network

As an optional first step, the user may want to use the traitNET function in GNA to estimate only the trait-trait portion of the genomic network (i.e., not including individual genetic variants [SNPs]). This provides a trait-trait network that is pruned to include only significant edges (partial genetic correlations), which can then be supplied as part of the input to the gwasNET function. The advantages of applying this pruned network include: (i) the associations between SNPs and the traits in the network are conditioned only on well-powered estimates of genetic overlap across your traits, and (ii) if a pruned network is reported at the genome-wide level, the same pruned network is carried forward to this level of analysis, thereby facilitating comparison across levels of biological analysis. Note, however, that when applying a pruned network from this optional step, at the extreme, gwasNET will simply reproduce the univariate GWAS associations for any trait that has no remaining edges with other traits (i.e., these SNP associations would not be conditional on anything).
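Before supplying a pruned network, it can help to check whether any trait has been left without edges. The following is a minimal sketch in base R using a made-up edge-weight matrix; the trait names and values are hypothetical, and it assumes that pruned edges are stored as zeros in a symmetric matrix with row and column names.

```r
# Toy edge-weight matrix (omega) for four hypothetical traits.
# Zeros mark pruned edges; the values are invented for illustration only.
traits <- c("trait_A", "trait_B", "trait_C", "trait_D")
omega <- matrix(0, nrow = 4, ncol = 4, dimnames = list(traits, traits))
omega["trait_A", "trait_B"] <- omega["trait_B", "trait_A"] <- 0.35
omega["trait_A", "trait_C"] <- omega["trait_C", "trait_A"] <- -0.20

# Ignore the diagonal, then count the non-zero edges in each row
off_diag <- omega
diag(off_diag) <- 0
n_edges <- rowSums(off_diag != 0)

# Traits with no remaining edges: their SNP associations would not be
# conditional on any other trait
isolated <- names(n_edges)[n_edges == 0]

print(n_edges)
print(isolated)
```

The same check could be applied to the matrix obtained from traitNET, provided it follows the layout assumed here.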

Example code for estimating the recursively pruned trait network for type 2 diabetes and metabolic traits is provided on the prior help page of the wiki. To run the gwasNET code below, which supplies the significance-pruned trait-trait network as input, you will first need to run the traitNET code on that prior page.

Run the conditional GWAS

The gwasNET function takes two primary pieces of input, the LDSC-estimated genetic covariance matrix and the formatted SNP-level results from the univariate GWAS, to estimate conditional associations between individual genetic variants (SNPs) and each trait. These are the only two required arguments for this function; the optional arguments are also detailed below:

  1. covstruc: The genetic covariance structure obtained from multivariable LDSC.

  2. SNPs: The formatted output from univariate GWAS. Example code and details about formatting these results are provided here

  3. fix_omega: Optional matrix that specifies which elements of the edge weight matrix (omega) among the traits are to be estimated. This would typically reflect output from estimating the trait-trait genomic network described above. Here we use the sparse omega matrix obtained from traitNET, which reflects the recursively pruned set of edges among T2D and the metabolic traits. We highlight again that running the code below requires first estimating the trait-trait network on the prior wiki page.

  4. parallel: Optional argument (TRUE/FALSE) denoting whether to run the analyses in parallel across multiple computing cores. This is not used here for this small subset of SNPs, but is highly recommended in practice to decrease run times for these SNP-level applications. Note that the estimates for each SNP are independent of one another, such that the runs can be split up across multiple jobs if working in a high-performance computing cluster environment. For most applications, gwasNET would not be practical to run on a personal computer due to the length of run times. To minimize run times, the ideal scenario is to split the SNP-level input across multiple jobs submitted on a computing cluster, with each job itself running in parallel.

  5. cores: Optional numerical argument denoting how many cores to use if running in parallel. If running in parallel, and nothing is provided for this argument, then the function will automatically use one less than the total number of detected cores in the computing environment.

  6. toler: Optional argument specifying the tolerance level to use for matrix inversions. This only needs attention if warning or error messages are produced to the effect of "matrix could not be inverted".
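Because the estimates for each SNP are independent, the SNP-level input can be divided into chunks, one per cluster job. The following is a minimal sketch in base R, assuming the formatted SNP results are a data frame with one row per SNP; the toy data frame, column names, file names, and number of jobs are all hypothetical. The last lines mirror the default described for the cores argument (one less than the detected cores) and are not taken from the GNA source code.

```r
# Toy stand-in for the formatted SNP-level results (hypothetical columns)
set.seed(1)
snps <- data.frame(SNP = paste0("rs", 1:1000), beta = rnorm(1000))

# Assign consecutive rows to n_jobs chunks of (nearly) equal size
n_jobs <- 8
chunk_size <- ceiling(nrow(snps) / n_jobs)
chunk_id <- ceiling(seq_len(nrow(snps)) / chunk_size)
chunks <- split(snps, chunk_id)

# Write one RDS file per chunk so that each cluster job can load its own piece
out_dir <- file.path(tempdir(), "snp_chunks")
dir.create(out_dir, showWarnings = FALSE)
for (i in seq_along(chunks)) {
  saveRDS(chunks[[i]], file.path(out_dir, sprintf("chunk_%03d.RDS", i)))
}

# One less than the number of detected cores, with a floor of one core
# (detectCores() can return NA on some platforms)
detected <- parallel::detectCores()
n_cores <- if (is.na(detected)) 1L else max(1L, detected - 1L)
```

Each job would then read its own chunk with readRDS and pass it as the SNPs argument, keeping covstruc and fix_omega the same across jobs.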

Example code:

# Output from LDSC
covstruc <- LDSC_MET

# Output from sumstats function
SNPs <- SNPDATA_MET

# Fixed omega matrix from traitNET
fix_omega <- as.matrix(METnetwork$model_results$sparse$omega)

# Whether to run the analysis in parallel
parallel <- FALSE

# Number of cores to use if parallel is TRUE; NULL uses the default
cores <- NULL

# Optional argument specifying the tolerance to use for matrix inversion; default is NULL
toler <- NULL

GWASNET_MET <- gwasNET(covstruc = covstruc, SNPs = SNPs, fix_omega = fix_omega,
                       parallel = parallel, cores = cores, toler = toler)
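Timing the 100-SNP example gives a rough basis for planning a genome-wide run. The sketch below is a back-of-the-envelope extrapolation in base R; the helper estimate_hours is hypothetical (not part of GNA), the timing value is invented, and the calculation assumes that run time scales linearly with the number of SNPs and divides evenly across jobs and cores, which real runs will only approximate.

```r
# Hypothetical helper: extrapolate wall-clock hours from a timed test run,
# assuming linear scaling in SNP count and ideal speed-up across jobs and cores
estimate_hours <- function(elapsed_sec, n_timed, n_total, n_jobs = 1, n_cores = 1) {
  sec_per_snp <- elapsed_sec / n_timed
  sec_per_snp * n_total / (n_jobs * n_cores) / 3600
}

# Invented timing: suppose the 100 example SNPs took 60 seconds on one core
serial_hours <- estimate_hours(elapsed_sec = 60, n_timed = 100, n_total = 1e6)

# Same workload split across 50 jobs, each using 4 cores
split_hours <- estimate_hours(elapsed_sec = 60, n_timed = 100, n_total = 1e6,
                              n_jobs = 50, n_cores = 4)

print(round(c(serial = serial_hours, split = split_hours), 2))
```

In practice, elapsed_sec could be taken from wrapping the gwasNET call above in system.time() and reading the "elapsed" entry.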