5. Running a Conditional TWAS - GenomicNetworkAnalysis/GNA GitHub Wiki

This page contains a guide on how to run a conditional TWAS; that is, incorporating individual genes into a genomic network in order to estimate the conditional association between imputed gene expression and each trait (conditional on the other traits in the network). The example below is the taken from the GNA manuscript, in which we conducted a conditional TWAS of 7 psychiatric disorders in European ancestry.

To estimate a genomic network model with gene expression, you will need:

  1. The genetic covariance structure for your set of traits (such as that obtained from multivariable LDSC; see Estimating a genetic covariance structure

  2. The imputed gene expression for each of your traits that is formatted in the way the package is expected (such as that obtained from the FUSION TWAS package; see Preparing data for a conditional TWAS

The genetic covariance structure and formatted, imputed gene expression for a subset of 438 imputed genes from chromosome 17 estimated using the cerebellum tissue GTEx v8 TWAS weights for the psychiatric example below is included in the GNA package. These 438 imputed genes mirror what is obtained from the preparing data for a conditional TWAS example highlighted above in point 2. These example data can be extracted by:

# Load the GNA package
require(GNA)

# Extract the example data (will create directory 'example_data' in your current working directory)
refData("example")

# Load the genetic covariance structure into R
LDSC_PSYC <- readRDS("example_data/LDSC_PSYC.RDS")

# Load the imputed gene expression into R
GENES_PSYC <- readRDS("example_data/GENEDATA_PSYC.RDS")

Optional Initial Step: Estimate the genomic network

As an optional first step, the user may want to use the traitNET function in GNA to estimate only the trait-trait portion of the genomic network (i.e., not including imputed gene expression). This serves to then provide the trait-trait network that is pruned to only include significance edges (partial genetic correlations) as part of the input to the twasNET function. The advantage of applying this pruned network include: (i) only conditioning the associations between gene expression and the traits in the network on well-powered estimates of genetic overlap across your traits, and (ii) if a pruned network is reported at the genome-wide level then this ensures that this same pruned network is carried forward for this level of analysis, thereby facilitating comparison across levels of biological analysis. Note however then applying a pruned network from this optional step that, at the extreme, if there are traits with no remaining edges with other traits that twasNET would simply reproduce the univariate TWAS associations (i.e., these gene expression results would not be conditional on anything).

Example code for estimating the recursively pruned trait network for seven psychiatric disorders is provided below.

Example code:

#Output from LDSC 
covstruc<-LDSC_PSYC

#Specify the model to use for the genetic covariance matrix
fix_omega<-"full"

#Whether to prune the network for non-significant edges
prune<-TRUE

#Multiple testing correction to adjust p-values
p.adjust<-"fdr"

# Significance level for edge inclusion
alpha<-0.05

# Re-estimate edges after pruning
reestimate<-TRUE

#Whether to Use recursive estimation
recursive<-TRUE

PSYCnetwork <- traitNET(covstruc=covstruc,fix_omega=fix_omega,
                      prune=prune,p.adjust=p.adjust,alpha=alpha,reestimate=reestimate,
                      recursive=recursive)

Run the conditional TWAS

The twasNET function takes two primary pieces of input, the LDSC estimated genetic covariance matrix and the formatted gene expression results from univariate TWAS, to estimate conditional associations between gene expression and each trait. These are the only two necessary arguments for this function, but additional arguments are also detailed below:

  1. covstruc: The genetic covariance structure obtained from multivariable LDSC.

  2. genes: The formatted output from univariate TWAS. Example code and details on estimating univariate TWAS and formatting these results are provided here)

  3. fix_omega: Provided matrix that specifies which elements of the edge weight matrix (omega) among the traits are to be estimated. This is an optional argument that would reflect output from estimating the trait-trait genomic network described above. Here we use the sparse, omega matrix obtained from traitNET that reflects the recursively pruned set of edges among the seven psychiatric disorder.

  4. parallel: Optional argument (TRUE/FALSE) denoting whether to run the analyses in parallel across multiple computing cores. This is not utilized here for this subset sample of genes, but is recommended to use in practice to decrease run times. Note that the estimates for each gene are independent of one another, such that the runs can be split up across multiple jobs if working in a high-performance computing cluster environment. However, for most applications, twasNET could be run in parallel on a personal computer in a few hours.

  5. cores: Optional numerical argument denoting how many cores to use if running in parallel. If running in parallel, and nothing is provided for this argument, then the function will automatically use one less than the total number of detected cores in the computing environment.

Example code:

#Output from LDSC 
covstruc<-LDSC_PSYC

# Output from read_fusion
genes <- GENES_PSYC

# Fixed omega matrix from traitNET
fix_omega <- as.matrix(PSYCnetwork$model_results$sparse$omega)  

# Whether to run the analysis in parallel
parallel <- FALSE  

# Number of cores to use if parallel is TRUE, NULL uses the default
cores <- NULL  

TWASNET_PSYC <- twasNET(covstruc = covstruc, genes = genes, fix_omega = fix_omega, toler = toler, parallel = parallel, cores = cores)