Postprocessing: Gene Ranking
informME provides a utility that ranks all human genes in the Bioconductor library TxDb.Hsapiens.UCSC.hg19.knownGene using the average mutual information, based on the method described in [2]. This utility must be run within an R session.
usage (when replicate reference data is available):
source("/path/to/informME/src/R_src/jsGrank.R")
rankGenes(refVrefFiles,testVrefFiles,inFolder,outFolder,tName,rName)
where
- refVrefFiles is a vector of BIGWIG files that contain the JSD values of the available reference/reference comparisons
- testVrefFiles is a vector of BIGWIG files that contain the JSD values of the available test/reference comparisons
- inFolder is the directory that contains the JSD files
- outFolder is the directory used to write the result in a .tsv file
- tName is a string providing a name for the test phenotype
- rName is a string providing a name for the reference phenotype
In this case, the function generates the file gRank-JSD-tName-VS-rName.tsv.
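The two file vectors can be assembled by hand, but when many comparisons are available it may be easier to select them by name. The following is a minimal sketch, assuming the files follow the `JSD-<phenotype>-<id>-VS-<phenotype>-<id>.bw` naming pattern used in the README examples; the listing `bwFiles` and the two regular expressions are illustrative assumptions, not part of informME.

```r
# Hypothetical file listing; in practice it could come from
# list.files(inFolder, pattern = "\\.bw$").
bwFiles <- c("JSD-lungnormal-1-VS-lungnormal-2.bw",
             "JSD-lungnormal-3-VS-lungnormal-1.bw",
             "JSD-lungnormal-3-VS-lungnormal-2.bw",
             "JSD-lungcancer-1-VS-lungnormal-1.bw",
             "JSD-lungcancer-2-VS-lungnormal-2.bw",
             "JSD-lungcancer-3-VS-lungnormal-3.bw")

tName <- "lungcancer"
rName <- "lungnormal"

# Assumed naming pattern: JSD-<phenotype>-<id>-VS-<phenotype>-<id>.bw
# (phenotype names are assumed to contain no regex metacharacters).
refPattern  <- paste0("^JSD-", rName, "-[0-9]+-VS-", rName, "-[0-9]+\\.bw$")
testPattern <- paste0("^JSD-", tName, "-[0-9]+-VS-", rName, "-[0-9]+\\.bw$")

# Reference/reference comparisons and test/reference comparisons.
refVrefFiles  <- grep(refPattern,  bwFiles, value = TRUE)
testVrefFiles <- grep(testPattern, bwFiles, value = TRUE)

# With jsGrank.R sourced and inFolder/outFolder defined, the call would be:
# rankGenes(refVrefFiles, testVrefFiles, inFolder, outFolder, tName, rName)
```

The call itself is left commented out so that the sketch runs without informME or any BIGWIG files present.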
usage (when no replicate reference data is available and rankings will be done by average promoter JSD):
setwd("path/to/informME/src/R_src/")
source("jsGrank.R")
rankGenes(c(),testVrefFiles,inFolder,outFolder,tName,rName)
where
- testVrefFiles is a vector of BIGWIG files that contain the JSD values of the available test/reference comparisons
- inFolder is the directory that contains the JSD files
- outFolder is the directory used to write the result in a .tsv file
- tName is a string providing a name for the test phenotype
- rName is a string providing a name for the reference phenotype
In this case, the function generates the file gRankProms-JSD-tName-VS-rName.tsv.
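Because the output file name depends on whether reference/reference comparisons were supplied, a downstream script may want to compute the expected path. The sketch below encodes only the two naming schemes stated on this page; `expectedRankFile` is a hypothetical helper, not an informME function, and the folder path is a placeholder.

```r
# Hypothetical helper (not part of informME): builds the expected output
# path from the naming schemes described on this page.
expectedRankFile <- function(refVrefFiles, outFolder, tName, rName) {
  # "gRank" when reference/reference files are given, "gRankProms" otherwise.
  prefix <- if (length(refVrefFiles) > 0) "gRank" else "gRankProms"
  file.path(outFolder, paste0(prefix, "-JSD-", tName, "-VS-", rName, ".tsv"))
}

outFolder <- "/path/to/out-folder"  # placeholder path

withRef <- expectedRankFile(c("JSD-lungnormal-1-VS-lungnormal-2.bw"),
                            outFolder, "lungcancer", "lungnormal")
withoutRef <- expectedRankFile(c(), outFolder, "lungcancer", "lungnormal")

print(basename(withRef))     # "gRank-JSD-lungcancer-VS-lungnormal.tsv"
print(basename(withoutRef))  # "gRankProms-JSD-lungcancer-VS-lungnormal.tsv"
```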
NOTE 1: For this utility, the following R packages must be installed: GenomicFeatures, GenomicRanges, rtracklayer, TxDb.Hsapiens.UCSC.hg19.knownGene, gamlss, Homo.sapiens
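Before sourcing jsGrank.R, it can help to confirm that these packages are available. The following is a minimal sketch using base R only; the variable names are illustrative, and the commented installation hint assumes the BiocManager package is used, which this page does not prescribe.

```r
# Packages listed in NOTE 1.
required <- c("GenomicFeatures", "GenomicRanges", "rtracklayer",
              "TxDb.Hsapiens.UCSC.hg19.knownGene", "gamlss", "Homo.sapiens")

# requireNamespace() returns TRUE when the package can be loaded.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missingPkgs <- required[!installed]

if (length(missingPkgs) > 0) {
  message("Missing packages: ", paste(missingPkgs, collapse = ", "))
  # One possible way to install them (assumes BiocManager is installed):
  # BiocManager::install(missingPkgs)
} else {
  message("All required packages are available.")
}
```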
NOTE 2: More information about this utility can be found in informME/src/R_src/postprocess/README.txt, with a relevant excerpt reproduced below for convenience:
jsGrank.R rankGenes function
----------------------------
This is an R function that ranks all Human genes in the
Bioconductor library TxDb.Hsapiens.UCSC.hg19.knownGene using
the average mutual information based on the method described
in [2]. It should be run within an R session.
default usage (replicate reference data is available):
source("jsGrank.R")
rankGenes(refVrefFiles,testVrefFiles,inFolder,outFolder,
tName,rName)
# refVrefFiles is a vector of BIGWIG files that contain the
# JSD values of available reference/reference comparisons.
# For example: if
#
# JSD-lungnormal-1-VS-lungnormal-2.bw
# JSD-lungnormal-3-VS-lungnormal-1.bw
# JSD-lungnormal-3-VS-lungnormal-2.bw
#
# are available, then set
#
# refVrefFiles <- c("JSD-lungnormal-1-VS-lungnormal-2.bw",
# "JSD-lungnormal-3-VS-lungnormal-1.bw",
# "JSD-lungnormal-3-VS-lungnormal-2.bw")
#
# testVrefFiles is a vector of BIGWIG files that contain the
# JSD values of available test/reference comparisons.
# For example: if
#
# JSD-lungcancer-1-VS-lungnormal-1.bw
# JSD-lungcancer-2-VS-lungnormal-2.bw
# JSD-lungcancer-3-VS-lungnormal-3.bw
#
# are available, then set
#
# testVrefFiles <- c("JSD-lungcancer-1-VS-lungnormal-1.bw",
# "JSD-lungcancer-2-VS-lungnormal-2.bw",
# "JSD-lungcancer-3-VS-lungnormal-3.bw")
#
# inFolder is the directory that contains the JSD files
# outFolder is the directory used to write the result
# (a .tsv file).
#
# For example:
#
# inFolder <- "/path/to/in-folder/"
# outFolder <- "/path/to/out-folder/"
#
# tName and rName are strings providing names for the
# test and reference phenotypes.
#
# For example:
#
# tName <- "lungcancer"
# rName <- "lungnormal"
default usage (no replicate reference data is available):
source("jsGrank.R")
rankGenes(c(),testVrefFiles,inFolder,outFolder,
tName,rName)
# testVrefFiles is a vector of BIGWIG files that contain the
# JSD values of available test/reference comparisons.
# For example: if
#
# JSD-lungcancer-1-VS-lungnormal-1.bw
# JSD-lungcancer-2-VS-lungnormal-2.bw
# JSD-lungcancer-3-VS-lungnormal-3.bw
#
# are available, then set
#
# testVrefFiles <- c("JSD-lungcancer-1-VS-lungnormal-1.bw",
# "JSD-lungcancer-2-VS-lungnormal-2.bw",
# "JSD-lungcancer-3-VS-lungnormal-3.bw")
#
# inFolder is the directory that contains the JSD files
# outFolder is the directory used to write the result
# (a .tsv file).
#
# For example:
#
# inFolder <- "/path/to/in-folder/"
# outFolder <- "/path/to/out-folder/"
#
# tName and rName are strings providing names for the
# test and reference phenotypes.
#
# For example:
#
# tName <- "lungcancer"
# rName <- "lungnormal"
requirements:
The following R libraries must be installed:
- GenomicFeatures
- GenomicRanges
- Homo.sapiens
- rtracklayer
- TxDb.Hsapiens.UCSC.hg19.knownGene
jsGrank.R rankRegions function
----------------------------
This is an R function that ranks all regions in a BED file using
the average mutual information based on the method described
in [2]. It should be run within an R session.
default usage (replicate reference data is available):
source("jsGrank.R")
rankRegions(refVrefFiles,testVrefFiles,regionsFile,regionsName,
inFolder,outFolder,tName,rName)
# refVrefFiles is a vector of BIGWIG files that contain the
# JSD values of available reference/reference comparisons.
# For example: if
#
# JSD-lungnormal-1-VS-lungnormal-2.bw
# JSD-lungnormal-3-VS-lungnormal-1.bw
# JSD-lungnormal-3-VS-lungnormal-2.bw
#
# are available, then set
#
# refVrefFiles <- c("JSD-lungnormal-1-VS-lungnormal-2.bw",
# "JSD-lungnormal-3-VS-lungnormal-1.bw",
# "JSD-lungnormal-3-VS-lungnormal-2.bw")
#
# testVrefFiles is a vector of BIGWIG files that contain the
# JSD values of available test/reference comparisons.
# For example: if
#
# JSD-lungcancer-1-VS-lungnormal-1.bw
# JSD-lungcancer-2-VS-lungnormal-2.bw
# JSD-lungcancer-3-VS-lungnormal-3.bw
#
# are available, then set
#
# testVrefFiles <- c("JSD-lungcancer-1-VS-lungnormal-1.bw",
# "JSD-lungcancer-2-VS-lungnormal-2.bw",
# "JSD-lungcancer-3-VS-lungnormal-3.bw")
#
# regionsFile is a string containing the name of the bed file
# with regions that you want ranked.
# For example: if you have the following file with regions
#
# bivalentDomains.bed
#
# then set
#
# regionsFile <- "bivalentDomains.bed"
#
# regionsName is a short string used to nickname the regions.
# For example with the above file you might set:
#
# regionsName <- "bivDom"
#
# inFolder is the directory that contains the JSD files
# and the regionsFile
#
# outFolder is the directory used to write the result
# (a .tsv file).
#
# For example:
#
# inFolder <- "/path/to/in-folder/"
# outFolder <- "/path/to/out-folder/"
#
# tName and rName are strings providing names for the
# test and reference phenotypes.
#
# For example:
#
# tName <- "lungcancer"
# rName <- "lungnormal"
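Since rankRegions expects both the JSD files and the regions file to sit in inFolder, a quick existence check before the call may save a failed run. The following is a minimal sketch; `missingInputs` is a hypothetical helper rather than part of jsGrank.R, and a temporary folder stands in for a real inFolder.

```r
# Hypothetical helper (not part of informME): returns the names of the
# input files that are absent from inFolder.
missingInputs <- function(inFolder, files) {
  files[!file.exists(file.path(inFolder, files))]
}

# A temporary folder stands in for inFolder in this demonstration.
inFolder <- tempfile("in-folder-")
dir.create(inFolder)

# Create only the regions file, so that the BIGWIG file is reported missing.
regionsFile <- "bivalentDomains.bed"
invisible(file.create(file.path(inFolder, regionsFile)))

testVrefFiles <- c("JSD-lungcancer-1-VS-lungnormal-1.bw")

absent <- missingInputs(inFolder, c(testVrefFiles, regionsFile))
if (length(absent) > 0) {
  message("Not found in inFolder: ", paste(absent, collapse = ", "))
}
```

Here only the BIGWIG file is reported, because the demonstration creates the BED file but not the JSD file.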
REFERENCES
----------
[1] Jenkinson, G., Abante, J., Feinberg, A.P., and
Goutsias, J. (2018). An information-theoretic approach
to the modeling and analysis of whole-genome bisulfite
sequencing data, BMC Bioinformatics, 19:87,
https://doi.org/10.1186/s12859-018-2086-5.
[2] Jenkinson, G., Abante, J., Koldobskiy, M., Feinberg, A.P.,
and Goutsias, J. (2019). Ranking genomic features using an
information-theoretic measure of epigenetic discordance, BMC
Bioinformatics, 20:175, https://doi.org/10.1186/s12859-019-2777-6.