Downstreamer - molgenis/systemsgenetics GitHub Wiki
Downstreamer can be used to perform key gene prioritization using GWAS summary statistics. We do this using 57 tissue-specific co-expression networks derived from the Recount3 data.
Content
1️⃣ Getting started
2️⃣ Running PascalX to obtain gene p-values
3️⃣ Tissue enrichment
4️⃣ Key gene enrichment
5️⃣ Code availability
1. Getting started
Download the tool and reference data here: https://downloads.molgeniscloud.org/downloads/downstreamerRelease2.tar.gz
This includes the files that are needed for PascalX.
2. Running PascalX to obtain gene p-values
Downstreamer needs gene level p-values for the analysis. PascalX can be used to convert the variant level summary statistics of GWAS to gene level summary statistics.
The instructions to do so are listed here: PascalX for Downstreamer
Other sources of gene p-values
In principle, Downstreamer can also use gene p-values from another source. This is, however, not recommended, as you would then also need to create a new null distribution for the gene p-values.
The expected format of gene p-values is a tab-separated file with 4 columns:
Column name | Description |
---|---|
gene | The name of the gene |
pvalue | The gene p-value |
nsnps | The number of SNPs on which the p-value is based. Can be 1 for all genes if not applicable |
min_pvalue | The smallest SNP p-value. Can be zero for all if not applicable |
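The table above can be turned into a small writer and validator. This is a minimal sketch, not part of Downstreamer: the function names, the gene names `GENE_A` and `GENE_B`, and the assumption that the file starts with a header row holding the four column names are all illustrative.

```python
import csv
import io

# Column order taken from the table above.
EXPECTED_COLUMNS = ["gene", "pvalue", "nsnps", "min_pvalue"]


def write_gene_pvalues(rows, handle):
    """Write (gene, pvalue, nsnps, min_pvalue) tuples as a tab-separated table.

    Assumption: the file starts with a header row containing the column names.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(EXPECTED_COLUMNS)
    for gene, pvalue, nsnps, min_pvalue in rows:
        if not 0.0 <= pvalue <= 1.0:
            raise ValueError(f"p-value out of range for {gene}: {pvalue}")
        writer.writerow([gene, pvalue, nsnps, min_pvalue])


def read_gene_pvalues(handle):
    """Read the table back and check that the header matches the expected columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return [
        {
            "gene": row["gene"],
            "pvalue": float(row["pvalue"]),
            "nsnps": int(row["nsnps"]),
            "min_pvalue": float(row["min_pvalue"]),
        }
        for row in reader
    ]


# Hypothetical gene names; nsnps=1 and min_pvalue=0 are the fallbacks
# the table allows when those values are not applicable.
example_rows = [("GENE_A", 0.01, 1, 0.0), ("GENE_B", 0.5, 1, 0.0)]
buffer = io.StringIO()
write_gene_pvalues(example_rows, buffer)
buffer.seek(0)
parsed = read_gene_pvalues(buffer)
```

Writing to a file instead of an in-memory buffer only requires passing a handle opened with `open(path, "w", newline="")`.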
The gene-gene correlations of the null GWAS p-values are stored per chromosome arm, using the following naming scheme: NAME_1_q_correlations.datg
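As an illustration of this naming scheme, the sketch below builds the expected file names for a set of chromosome arms. It assumes that `NAME` is a user-chosen prefix, `1` the chromosome and `q` the arm; the function name and the prefix `myNull` are hypothetical, and which chromosome arms Downstreamer actually expects is not stated here.

```python
def correlation_file_name(prefix, chromosome, arm):
    """Build a file name following the NAME_1_q_correlations.datg pattern.

    Assumption: the pattern reads as <prefix>_<chromosome>_<arm>_correlations.datg.
    """
    if arm not in ("p", "q"):
        raise ValueError(f"arm must be 'p' or 'q', got {arm!r}")
    return f"{prefix}_{chromosome}_{arm}_correlations.datg"


# Hypothetical prefix and a small, illustrative selection of chromosome arms.
arms = [(1, "p"), (1, "q"), (2, "p"), (2, "q")]
file_names = [correlation_file_name("myNull", chrom, arm) for chrom, arm in arms]
```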
The .datg files and the corresponding .rows.txt.gz and .cols.txt.gz files can be created from a tab-separated .txt file using the CONVERT_TXT mode of Downstreamer.
Note: without updated null distributions the results might not be reliable.
3. Tissue enrichment
First we use Downstreamer to determine which tissues express the genes implicated by the GWAS, using a tissue enrichment analysis. By doing this we make sure that the key gene predictions are driven by relevant co-expression instead of tissue-specific expression.
First change the variables at the top of runDownstreamerTissueEnrichment.sh and selectSignficantTissues.R
Then run:
sh runDownstreamerTissueEnrichment.sh
Rscript selectSignficantTissues.R
This will prepare a parameter specifying which tissue specific networks Downstreamer should use in the next step.
4. Key gene enrichment
We are now ready to run the actual key gene prioritization.
Again, change the variables, this time in runDownstreamerKeygenePrediction.sh, and then run:
sh runDownstreamerKeygenePrediction.sh
The resulting key gene prioritizations per tissue can be found in: _keygene_enrichtments.xlsx
If needed, the Z-scores of the different tissues can be meta-analyzed to obtain the final key gene prioritization score.
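The page does not specify which meta-analysis method to use. One common option is Stouffer's method, sketched below under the assumption that each gene has one Z-score per selected tissue. The function names and the example values are illustrative and not part of Downstreamer. Note that Stouffer's method assumes independent Z-scores, which may not hold when all tissues are analyzed with the same GWAS, so the combined score is best treated as a ranking aid.

```python
import math
from statistics import NormalDist


def stouffer_z(z_scores, weights=None):
    """Combine per-tissue Z-scores for one gene with Stouffer's method.

    Without weights this is sum(z) / sqrt(k); with weights it is
    sum(w * z) / sqrt(sum(w ** 2)).
    """
    if not z_scores:
        raise ValueError("at least one Z-score is required")
    if weights is None:
        weights = [1.0] * len(z_scores)
    if len(weights) != len(z_scores):
        raise ValueError("weights and z_scores must have the same length")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided p-value of a Z-score under the standard normal distribution."""
    return 2.0 * NormalDist().cdf(-abs(z))


# Hypothetical Z-scores of one gene in three selected tissues.
combined = stouffer_z([2.0, 1.5, 2.5])
combined_p = two_sided_p(combined)
```

Weights could, for example, reflect the sample size of each tissue network; whether that is appropriate here is a judgment call that the page leaves open.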
5. Code availability
https://github.com/molgenis/systemsgenetics/tree/master/Downstreamer