Discovery of hidden confounders of QTLs - molgenis/systemsgenetics GitHub Wiki

This tool can be used to identify modulators of previously identified QTL effects as we have shown here: Hypothesis-free identification of modulators of genetic risk factors

The source code is available here: https://github.com/molgenis/systemsgenetics/tree/master/eQTLInteractionAnalyser

The latest build can be downloaded here

Input data

We require a folder with the following files:

  • A file with genotypes called: Genotypes.binary
  • A file with expression of the eQTLs: Expression.binary
  • A file with covariate data: Covariates.binary
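
Before starting a long run it can help to confirm that the input folder is complete. The function below is a minimal sketch that assumes the three files carry exactly the names listed above; `check_input_folder` is a hypothetical helper and is not part of the tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that an input folder holds the three required files.
# Assumes the file names are exactly those listed in this section.
check_input_folder() {
    local dir="$1" name missing=0
    for name in Genotypes.binary Expression.binary Covariates.binary; do
        if [ ! -e "${dir%/}/${name}" ]; then
            echo "missing: ${name}" >&2   # report every missing file, not only the first
            missing=1
        fi
    done
    return "$missing"                     # 0 when all three files are present
}
```

For example, `check_input_folder /inputfolder/ && echo "inputs complete"`.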

Optionally, you can also supply a file with known eQTL effects, which is used to correct the covariate data, and a list of samples to include in the analysis.

The plain-text versions of the required files are described below. The following command converts a plain-text file to the .binary format.

java -jar ../eQTLInteractionAnalyser-1.2-SNAPSHOT-jar-with-dependencies.jar \
    --convertMatrix \
    -i inputfile.txt \
    -o output.binary
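
All three matrices need the same conversion, so the command can be generated once per file. The sketch below only prints the commands so they can be reviewed first. The plain-text names `Genotypes.txt`, `Expression.txt` and `Covariates.txt` and the function `print_convert_commands` are assumptions made for illustration; only `--convertMatrix`, `-i` and `-o` come from the command shown here.

```shell
#!/usr/bin/env bash
# Path to the jar, as used in the conversion command in this section.
JAR="../eQTLInteractionAnalyser-1.2-SNAPSHOT-jar-with-dependencies.jar"

# Hypothetical helper: print one conversion command per required matrix.
# Input names <in_dir>/{Genotypes,Expression,Covariates}.txt are assumed.
print_convert_commands() {
    local in_dir="$1" out_dir="$2" name
    for name in Genotypes Expression Covariates; do
        printf 'java -jar %s --convertMatrix -i %s -o %s\n' \
            "$JAR" "${in_dir%/}/${name}.txt" "${out_dir%/}/${name}.binary"
    done
}
```

Calling `print_convert_commands txt inputfolder` lists three commands whose outputs are named as the input folder requires.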

:exclamation: Important: the rows (variants) in the genotype file must correspond to the rows (genes) in the expression file, so both files must have the same number of rows. If a variant affects two genes, it must be included twice in the genotype dosage file. The sample order must also be exactly the same in all files.
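
The row and sample requirements above can be checked on the plain-text files before conversion. This is a minimal sketch, assuming each matrix starts with a header line whose first cell labels the row-name column; `check_matrix_pair` is a hypothetical helper, not part of the tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare two tab-separated matrices.
# Succeeds only if both have the same sample header and the same number of data rows.
# Assumes line 1 is a header and its first cell is not a sample name.
check_matrix_pair() {
    local a="$1" b="$2"
    local samples_a samples_b rows_a rows_b
    samples_a=$(head -n 1 "$a" | cut -f 2-)
    samples_b=$(head -n 1 "$b" | cut -f 2-)
    if [ "$samples_a" != "$samples_b" ]; then
        echo "sample order differs between $a and $b" >&2
        return 1
    fi
    rows_a=$(awk 'END { print NR - 1 }' "$a")   # data rows, header excluded
    rows_b=$(awk 'END { print NR - 1 }' "$b")
    if [ "$rows_a" -ne "$rows_b" ]; then
        echo "row count differs: $rows_a vs $rows_b" >&2
        return 1
    fi
}
```

For example, `check_matrix_pair Genotypes.txt Expression.txt` (file names assumed). The check does not verify that row i of one file really belongs with row i of the other; it only catches mismatched shapes and sample order.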

Genotype dosage data

Tab-separated matrix with variants in rows and samples in columns.
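
Because a variant that affects several genes needs one row per gene, the genotype matrix can be derived from a dosage matrix with unique variants. The sketch below assumes a hypothetical helper file `pairs.txt` (variant, tab, gene, no header) listing the eQTLs in the same order as the rows of the expression file; `expand_dosages` is likewise an illustrative name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit the header plus one dosage row per variant-gene pair,
# in pair order, so a variant linked to two genes appears twice.
# $1: pairs file (variant<TAB>gene, no header)  $2: dosage matrix with unique variants
expand_dosages() {
    local pairs="$1" dosages="$2"
    awk -F '\t' '
        NR == FNR {                          # first file: the dosage matrix
            if (FNR == 1) { print; next }    # copy the sample header unchanged
            row[$1] = $0                     # remember each row by variant name
            next
        }
        !($1 in row) {                       # second file: the pairs
            print "unknown variant: " $1 > "/dev/stderr"
            bad = 1
            next
        }
        { print row[$1] }
        END { exit bad }                     # non-zero if any variant was missing
    ' "$dosages" "$pairs"
}
```

For example, `expand_dosages pairs.txt dosages.txt > Genotypes.txt`.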

eQTL expression data

Tab-separated matrix with genes in rows and samples in columns.

Covariate expression data

Tab-separated matrix with covariates in rows and samples in columns. The covariates to test can be proxy gene expression and other potential covariates, such as PCs. The samples must be in the same order as in the other files.

QTL file

An eQTL result file as produced by our QTL mapping pipeline.

Columns
PValue
SNPName
SNPChr
SNPChrPos
ProbeName
ProbeChr
ProbeCenterChrPos
CisTrans
SNPType
AlleleAssessed
OverallZScore
DatasetsWhereSNPProbePairIsAvailableAndPassesQC
DatasetsZScores
DatasetsNrSamples
IncludedDatasetsMeanProbeExpression
IncludedDatasetsProbeExpressionVariance
HGNCName
IncludedDatasetsCorrelationCoefficient
Meta-Beta (SE)
Beta (SE)
FoldChange
FDR

Samples to include file

One sample to include per line. No header.
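
A starting point for this file can be taken from the header of one of the plain-text matrices. This is a minimal sketch, assuming the first header cell labels the row-name column and is therefore not a sample; `list_samples` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sample identifiers from a matrix header,
# one per line and without a heading line.
# Assumes the first header cell is a row-name label, not a sample.
list_samples() {
    head -n 1 "$1" | cut -f 2- | tr '\t' '\n'
}
```

For example, `list_samples Genotypes.txt > includedSamples.txt` (input name assumed), after which unwanted samples can be deleted from the list.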

Example command


java -Xmx80g -jar ../eQTLInteractionAnalyser-1.2-SNAPSHOT-jar-with-dependencies.jar \
    -i /inputfolder/ \
    -o /outputfolder/ \
    -e eQTLs.txt \
    -c gender MEDIAN_3PRIME_BIAS MEDIAN_5PRIME_BIAS GC PCT_INTRONIC_BASES \
    -c2 Batch1 Batch2 \
    -is includedSamples.txt \
    -n 20 \
    -pc 20 \
    -nt 8

Options

| Short | Long | Description |
|-------|------|-------------|
| -dif | --chi2sumDiff | Find chi2sum differences for each covariate between 2 consecutive interaction runs |
| -nn | --noNormalization | Skip all normalization steps. n must be 1 |
| -perm | --permute | Run permutation |
| -nt | --threads | Number of threads |
| -ncn | --noCovNormalization | Skip the covariate normalization step. n must be 1 |
| -ec | --eqtlsCovariates | Path to the eQTL file used to correct covariates |
| -c | --cov | Covariates to correct for using an interaction term before running the interaction analysis |
| -cf | --covFile | File containing the covariates to correct for using an interaction term before running the interaction analysis. No header, each covariate on a separate line |
| -sw | --swap | File containing the SNPs to swap |
| -e | --eqtls | Path to the eQTL file to test for interactions |
| -ch | --cohorts | Covariates to correct for without interaction term before running the interaction analysis |
| -i | --input | Path to the folder containing expression and genotype data |
| -cm | --convertMatrix | Convert matrix |
| -is | --includedSamples | Included samples |
| -it | --interpret | Interpret the z-score matrices |
| -snps | --snpsToTest | SNPs to test |
| -n | --maxcov | Maximum number of covariates to regress out |
| -o | --output | Path to the output folder |
| -c2 | --cov2 | Covariates to correct for without interaction term before running the interaction analysis |
| -ct | --covTest | Covariates to test in the interaction analysis. Optional, all are tested if not used |
| -s | --start | Start round for the chi2sumDiff option |
| -pc | --numpc | Number of PCs to correct for |
| -thr | --threshold | Z-score difference threshold for interpretation |