Postprocessing: Gene Ranking - GarrettJenkinson/informME GitHub Wiki
A provided utility ranks all human genes in the Bioconductor annotation package TxDb.Hsapiens.UCSC.hg19.knownGene using the average mutual information, based on the method described in [3]. This utility must be run within an R session.
usage (when replicate reference data is available):
refVrefFiles is a vector of BIGWIG files that contain the JSD values of available reference/reference comparisons (comparisons between replicate reference samples)
testVrefFiles is a vector of BIGWIG files that contain the JSD values of available test/reference comparisons
inFolder is the directory that contains the JSD files
outFolder is the directory used to write the result in a .tsv file
tName is a string providing a name for the test phenotype
rName is a string providing a name for the reference phenotype
In this case, the function writes the file gRank-JSD-tName-VS-rName.tsv to outFolder, with tName and rName replaced by the supplied names.
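Before starting a long ranking run, it can help to confirm that every BIGWIG file named in refVrefFiles and testVrefFiles is actually present in inFolder, and to know where the result will be written. The base R sketch below does both. The helper name checkRankInputs and the file names in the usage lines are hypothetical; the sketch does not call rankGenes itself, whose exact argument order should be taken from jsGrank.R.

```r
# Hypothetical helper, not part of informME: check that the JSD BIGWIG
# files exist in inFolder and return the path of the expected output file.
checkRankInputs <- function(refVrefFiles, testVrefFiles, inFolder,
                            outFolder, tName, rName) {
  jsdFiles <- c(refVrefFiles, testVrefFiles)
  found <- file.exists(file.path(inFolder, jsdFiles))
  if (!all(found)) {
    stop("JSD files not found in inFolder: ",
         paste(jsdFiles[!found], collapse = ", "))
  }
  # Output name follows the documented pattern gRank-JSD-tName-VS-rName.tsv
  file.path(outFolder, paste0("gRank-JSD-", tName, "-VS-", rName, ".tsv"))
}

# Usage with hypothetical file names created in a temporary folder
inFolder <- tempfile("jsd-")
dir.create(inFolder)
refFiles <- c("ref1-VS-ref2.bw")
testFiles <- c("test1-VS-ref1.bw", "test2-VS-ref1.bw")
invisible(file.create(file.path(inFolder, c(refFiles, testFiles))))
outFile <- checkRankInputs(refFiles, testFiles, inFolder,
                           "/path/to/out-folder", "lungcancer", "lungnormal")
print(outFile)
# [1] "/path/to/out-folder/gRank-JSD-lungcancer-VS-lungnormal.tsv"
```

If any listed file is missing, the helper stops with the names of the missing files instead of letting the ranking fail partway through.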
usage (when no replicate reference data is available and rankings will be done by average promoter JSD):
testVrefFiles is a vector of BIGWIG files that contain the JSD values of available test/reference comparisons
inFolder is the directory that contains the JSD files
outFolder is the directory used to write the result in a .tsv file
tName is a string providing a name for the test phenotype
rName is a string providing a name for the reference phenotype
In this case, the function writes the file gRankProms-JSD-tName-VS-rName.tsv to outFolder, with tName and rName replaced by the supplied names.
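The idea of ranking by average promoter JSD can be illustrated in base R alone. The sketch below assumes the promoter-averaged JSD values have already been extracted into a matrix with one row per gene and one column per test/reference comparison; extracting them from the BIGWIG files is the part the informME utility handles with the packages listed in NOTE 1, and is not reproduced here. It averages each gene across comparisons and orders genes from highest to lowest average JSD. This is a simplified illustration rather than the informME implementation; the function name and the example values are hypothetical.

```r
# Illustrative sketch only: rank genes by their average promoter JSD.
# promJSD: numeric matrix, rows = genes, columns = test/reference
# comparisons, entries = JSD averaged over each gene's promoter.
rankByAvgPromoterJSD <- function(promJSD) {
  avgJSD <- rowMeans(promJSD, na.rm = TRUE)  # average across comparisons
  ord <- order(avgJSD, decreasing = TRUE)    # highest average JSD first
  data.frame(gene = rownames(promJSD)[ord],
             avgJSD = unname(avgJSD[ord]),
             rank = seq_along(ord),
             stringsAsFactors = FALSE)
}

# Hypothetical values for three genes and two comparisons
promJSD <- matrix(c(0.10, 0.30,
                    0.60, 0.40,
                    0.05, 0.15),
                  ncol = 2, byrow = TRUE,
                  dimnames = list(c("GENE_A", "GENE_B", "GENE_C"),
                                  c("test1-VS-ref", "test2-VS-ref")))
ranking <- rankByAvgPromoterJSD(promJSD)
print(ranking)
# ranking$gene is "GENE_B" "GENE_A" "GENE_C" (averages 0.5, 0.2, 0.1)
```

Averaging across comparisons lets several test samples contribute to a single score per gene, which is the role the multiple entries of testVrefFiles play in this usage.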
NOTE 1: This utility requires the following R packages to be installed: GenomicFeatures, GenomicRanges, rtracklayer, TxDb.Hsapiens.UCSC.hg19.knownGene, gamlss, Homo.sapiens
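One way to check for, and optionally install, the packages listed in NOTE 1 is sketched below. It assumes BiocManager is used as the installer (BiocManager::install() handles both Bioconductor packages and CRAN packages such as gamlss). The helper name missingPackages is hypothetical, and nothing is installed unless install = TRUE is passed.

```r
# Packages required by the gene ranking utility (NOTE 1)
rankingPkgs <- c("GenomicFeatures", "GenomicRanges", "rtracklayer",
                 "TxDb.Hsapiens.UCSC.hg19.knownGene", "gamlss", "Homo.sapiens")

# Hypothetical helper: return the packages that cannot be loaded and,
# only when install = TRUE, install them with BiocManager.
missingPackages <- function(pkgs, install = FALSE) {
  have <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!have]
  if (install && length(missing) > 0) {
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(missing)
  }
  missing
}

missingPackages(rankingPkgs)                    # report only
# missingPackages(rankingPkgs, install = TRUE)  # install whatever is missing
```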
NOTE 2: More information about this utility can be found in informME/src/R_src/postprocess/README.txt, with a relevant excerpt reproduced below for convenience:
jsGrank.R rankGenes function
This is an R function that ranks all human genes in the
Bioconductor library TxDb.Hsapiens.UCSC.hg19.knownGene using
the average mutual information based on the method described
in [2]. It should be run within an R session.
default usage (replicate reference data is available):
# refVrefFiles is a vector of BIGWIG files that contain the
# JSD values of available reference/reference comparisons.
# For example: if
# are available, then set
# refVrefFiles <- c("",
# "",
# "")
# testVrefFiles is a vector of BIGWIG files that contain the
# JSD values of available test/reference comparisons.
# For example: if
# are available, then set
# testVrefFiles <- c("",
# "",
# "")
# inFolder is the directory that contains the JSD files
# outFolder is the directory used to write the result
# (a .tsv file).
# For example:
# inFolder <- "/path/to/in-folder/"
# outFolder <- "/path/to/out-folder/"
# tName and rName are strings providing names for the
# test and reference phenotypes.
# For example:
# tName <- "lungcancer"
# rName <- "lungnormal"
default usage (no replicate reference data is available):
# testVrefFiles is a vector of BIGWIG files that contain the
# JSD values of available test/reference comparisons.
# For example: if
# are available, then set
# testVrefFiles <- c("",
# "",
# "")
# inFolder is the directory that contains the JSD files
# outFolder is the directory used to write the result
# (a .tsv file).
# For example:
# inFolder <- "/path/to/in-folder/"
# outFolder <- "/path/to/out-folder/"
# tName and rName are strings providing names for the
# test and reference phenotypes.
# For example:
# tName <- "lungcancer"
# rName <- "lungnormal"
The following R libraries must be installed:
- GenomicFeatures
- GenomicRanges
- Homo.sapiens
- rtracklayer
- TxDb.Hsapiens.UCSC.hg19.knownGene
jsGrank.R rankRegions function
This is an R function that ranks all regions in a BED file using
the average mutual information based on the method described
in [2]. It should be run within an R session.
default usage (replicate reference data is available):
# refVrefFiles is a vector of BIGWIG files that contain the
# JSD values of available reference/reference comparisons.
# For example: if
# are available, then set
# refVrefFiles <- c("",
# "",
# "")
# testVrefFiles is a vector of BIGWIG files that contain the
# JSD values of available test/reference comparisons.
# For example: if
# are available, then set
# testVrefFiles <- c("",
# "",
# "")
# regionsFile is a string containing the name of the BED file
# with regions that you want ranked.
# For example: if you have the following file with regions
# bivalentDomains.bed
# then set
# regionsFile <- "bivalentDomains.bed"
# regionsName is a short string used to nickname the regions.
# For example with the above file you might set:
# regionsName <- "bivDom"
# inFolder is the directory that contains the JSD files
# and the regionsFile
# outFolder is the directory used to write the result
# (a .tsv file).
# For example:
# inFolder <- "/path/to/in-folder/"
# outFolder <- "/path/to/out-folder/"
# tName and rName are strings providing names for the
# test and reference phenotypes.
# For example:
# tName <- "lungcancer"
# rName <- "lungnormal"
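The regionsFile passed to rankRegions is a BED file. The base R sketch below reads its first three columns (chromosome, start, end) and runs basic sanity checks that may be worth doing before a ranking job. It assumes a tab-delimited file with no header or track lines; the helper name readRegions and the example regions are hypothetical, and informME's own code may read the file differently. BED start coordinates are 0-based and end coordinates are exclusive, so the width of an interval is simply end - start.

```r
# Hypothetical helper: read chrom/start/end from a tab-delimited BED file
# (no header or "track" lines assumed) and run basic sanity checks.
readRegions <- function(regionsFile) {
  bed <- read.table(regionsFile, sep = "\t", header = FALSE, quote = "",
                    comment.char = "#", colClasses = "character")
  regions <- data.frame(chrom = bed[[1]],
                        start = as.numeric(bed[[2]]),
                        end = as.numeric(bed[[3]]),
                        stringsAsFactors = FALSE)
  if (anyNA(regions$start) || anyNA(regions$end)) {
    stop("non-numeric start/end coordinates in ", regionsFile)
  }
  if (any(regions$start < 0 | regions$end <= regions$start)) {
    stop("invalid interval(s) in ", regionsFile)
  }
  regions
}

# Usage with a small hypothetical BED file written to a temporary location
regionsFile <- tempfile(fileext = ".bed")
writeLines(c("chr1\t1000\t2000\tregionA",
             "chr2\t500\t800\tregionB"), regionsFile)
regions <- readRegions(regionsFile)
# BED intervals are 0-based and half-open, so width is end - start
regions$width <- regions$end - regions$start
print(regions)
```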
[1] Jenkinson, G., Abante, J., Feinberg, A.P., and
Goutsias, J. (2018). An information-theoretic approach
to the modeling and analysis of whole-genome bisulfite
sequencing data, BMC Bioinformatics, 19:87.
[2] Jenkinson, G., Abante, J., Koldobskiy, M., Feinberg, A.P.,
and Goutsias, J. (2019). Ranking genomic features using an
information-theoretic measure of epigenetic discordance, BMC
Bioinformatics, 20:175.