POST_GWAS GTEx - QuantGen/HPCC GitHub Wiki
Enrichment Analysis using GTEx
This pipeline performs enrichment analysis of GWA results using cis-eQTL results from GTEx.
The pipeline uses a hypergeometric test. A good reference for this test is Falcon and Gentleman.
Briefly, the test considers two urns (the study and GTEx); each urn contains significant (red balls) and non-significant (blue balls) results. The p-value for enrichment is obtained using
pval = phyper(q = q, m = m, n = n, k = k, lower.tail = FALSE)
Where:
- m: number of SNPs that are eQTL in GTEx (red balls in GTEx)
- n: number of SNPs that are not eQTL in GTEx (blue balls in GTEx)
- k: number of SNPs significant in the study (red balls in the study)
- q: number of SNPs significant in both GTEx and the study.
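As a small worked illustration of the call above, the sketch below uses made-up counts (they are not GTEx or study results). One detail worth knowing: with lower.tail = FALSE, phyper returns P(X > q). If the quantity of interest is P(X >= q), the usual adjustment is to pass q - 1. Which convention the pipeline scripts follow is not stated on this page, so both are shown.

```r
# Hypothetical counts, for illustration only
m <- 200    # SNPs that are eQTL in GTEx (red balls in GTEx)
n <- 9800   # SNPs that are not eQTL in GTEx (blue balls in GTEx)
k <- 100    # SNPs significant in the study (number of balls drawn)
q <- 8      # SNPs significant in both GTEx and the study

# P(X > q): strictly more overlap than observed
p_strict <- phyper(q = q, m = m, n = n, k = k, lower.tail = FALSE)

# P(X >= q): overlap at least as large as observed
p_inclusive <- phyper(q = q - 1, m = m, n = n, k = k, lower.tail = FALSE)

# Expected overlap if study hits were drawn at random from all SNPs
expected <- k * m / (m + n)

print(c(p_strict = p_strict, p_inclusive = p_inclusive, expected = expected))
```

The two p-values differ by exactly the probability of observing q itself, which matters most when the overlap is small.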
Location
/mnt/research/quantgen/projects/POST_GWAS/GTEX_EQTL/
Structure
The pipeline has four folders:
- code
- jobs
- parameters
- logs
The code folder contains two R scripts: one runs the jobs (across tissues) and one collects results.
The parameters folder contains files specifying the parameters for a particular analysis; you should generate your own parameter file.
The jobs folder contains scripts for submission; you should generate your own submission file.
The logs folder contains logs produced by each job (this will be cleaned periodically).
Note: you should save your outputs in another folder, within your project. You will specify the location of outputs in the parameter file.
Steps in running your enrichment analysis:
- Create your parameter file (copy an existing one and modify the parameter values as needed).
- Create your own submission file (copy an existing one and modify the parameter values as needed).
- Submit your job using sbatch; this will dispatch 48 jobs, one per tissue.
- Monitor your jobs using qstat -u [username] (or squeue -u [username] under SLURM) and by looking at the log files.
- Once the jobs are finished, run collectResults.R (this can be done on a development node); you need to update the parameter file within this script.
Note: Please do not modify the R scripts. If you find problems, please contact me ([email protected]).
Input data
This pipeline uses three main inputs:
- GTEx files (both association results and a MAP)
- A study-file containing the results of the study for which enrichment will be performed
- A parameter file specifying the location of the above data plus other parameters
GTEx files are located at:
/mnt/research/quantgen/datasets/GTEx/source/cis_eQTL_all_associations/
/mnt/research/quantgen/datasets/GTEx/source/reference/GTEx_Analysis_2017-06-05_v8_WholeGenomeSeq_838Indiv_Analysis_Freeze.lookup_table.txt.gz
The above files may change as new versions of GTEx results are released.
Also, check the existence of these files before running the pipeline as some folders may have been renamed or updated.
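A quick way to do this check from R is sketched below. The paths are the ones listed above; check_inputs() is a hypothetical helper written for this example and is not part of the pipeline.

```r
# Paths as listed above; adjust them if the folders have been renamed or updated
gtex_assoc_dir <- "/mnt/research/quantgen/datasets/GTEx/source/cis_eQTL_all_associations/"
gtex_map_file  <- "/mnt/research/quantgen/datasets/GTEx/source/reference/GTEx_Analysis_2017-06-05_v8_WholeGenomeSeq_838Indiv_Analysis_Freeze.lookup_table.txt.gz"

# Hypothetical helper: reports whether the folder and the file exist
check_inputs <- function(assoc_dir, map_file) {
  c(assoc_dir = dir.exists(assoc_dir), map_file = file.exists(map_file))
}

status <- check_inputs(gtex_assoc_dir, gtex_map_file)
print(status)

# Name whatever is missing so it can be fixed before submitting jobs
if (!all(status)) {
  message("Missing GTEx input(s): ", paste(names(status)[!status], collapse = ", "))
}
```

Running this on a development node before submitting avoids dispatching one job per tissue only to have each of them fail on a missing input.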
Study files:
The study file must be an ASCII file with SNPs in rows and at least two columns: one with the rs-IDs and one with the statistic used to determine "significance" (e.g., pvalue).
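To make the expected layout concrete, here is a minimal sketch that writes a tiny study file, reads it back, and checks that the two required columns are present. The column names (rsid, pvalue), the rs-IDs, and the threshold are all made up for illustration; use the names and cutoff that apply to your own study.

```r
# Write a tiny example study file so the sketch is self-contained
study_file <- tempfile(fileext = ".txt")
writeLines(c("rsid pvalue",
             "rs0001 1e-9",
             "rs0002 0.20",
             "rs0003 3e-8",
             "rs0004 0.75"), study_file)

# Hypothetical column names and significance threshold
id_col    <- "rsid"
stat_col  <- "pvalue"
threshold <- 5e-8

study <- read.table(study_file, header = TRUE, stringsAsFactors = FALSE)

# Confirm the two required columns are present before going further
stopifnot(all(c(id_col, stat_col) %in% colnames(study)))

# SNPs called significant in the study
significant <- study[[id_col]][study[[stat_col]] < threshold]
k <- length(significant)
print(significant)
```

Here k corresponds to the number of red balls in the study urn described in the test above.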
Parameter file:
This file defines the location of the input files, the location for the output files (do not save your results in the pipeline folder!) and other important parameters, including the name of the column containing the rs-IDs and the name of the column with the statistic.
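One way to guard against writing results into the pipeline folder is to check the output path before submitting. The sketch below is only an illustration: the parameter names (output_dir, id_col, stat_col) and the is_inside() helper are hypothetical, and the real parameter names are the ones used in the existing files in the parameters folder.

```r
# Pipeline location, as given in the Location section
pipeline_dir <- "/mnt/research/quantgen/projects/POST_GWAS/GTEX_EQTL"

# Hypothetical helper: TRUE if 'path' is 'parent' itself or lies underneath it
is_inside <- function(path, parent) {
  path   <- sub("/+$", "", path)     # drop trailing slashes
  parent <- sub("/+$", "", parent)
  path == parent || startsWith(path, paste0(parent, "/"))
}

# Hypothetical parameter values; names and paths are placeholders
params <- list(
  output_dir = "/path/to/your/project/enrichment_output",
  id_col     = "rsid",
  stat_col   = "pvalue"
)

if (is_inside(params$output_dir, pipeline_dir)) {
  stop("Output folder is inside the pipeline folder; choose a folder in your own project.")
}
```

The comparison is purely textual, so it assumes absolute paths and does not resolve symbolic links or relative components such as "..".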