g:Profiler - bcb420-2025/Clare_Gillis GitHub Wiki

g:Profiler

1 g:Profiler in general

g:Profiler reads a list of genes from a file and performs a gene-set enrichment (over-representation) analysis using a hypergeometric test.
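As a rough illustration of the hypergeometric test mentioned above, the base R sketch below computes the chance of seeing at least the observed overlap between a query and a term. All counts are made up for illustration, and g:Profiler's own implementation and choice of background may differ.

```r
# Hypothetical counts, chosen only for illustration
domain_size       <- 20000  # annotated genes in the background
term_size         <- 200    # genes annotated to the term
query_size        <- 100    # genes in the query list
intersection_size <- 10     # query genes that are annotated to the term

# P(X >= intersection_size) when drawing query_size genes without replacement.
# phyper(q, ...) with lower.tail = FALSE gives P(X > q), hence the "- 1".
p_value <- phyper(intersection_size - 1, term_size,
                  domain_size - term_size, query_size,
                  lower.tail = FALSE)

# Overlap expected by chance, for comparison with the observed overlap
expected_overlap <- query_size * term_size / domain_size
```

With these numbers the expected overlap is 1 gene, so an observed overlap of 10 gives a very small p-value.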

Note: you can limit the number of terms returned from GO, which helps surface the most important terms and reduces computation. This option is called highlighting driver terms.

You can select which databases to search against (e.g. specific sections of GO, Reactome, etc.).

Results are given with Term Name, Term ID, Padj, -log10(Padj), and colour-coded columns noting whether each gene in the query is present in the term. There is a separate table for each database.

Save results in the GEM (generic enrichment map) format. This contains:

Name of each gene-set
Description of each gene-set
Significance of the overlap (pvalue)
Significance of the overlap (adjusted pvalue/qvalue)
Phenotype
Genes included in each gene-set
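The GEM fields listed above can be sketched in R as a tab-delimited table. This is a minimal sketch on invented data: `toy_results` is a hypothetical stand-in for the `result` data frame of a gprofiler2 query run with `evcodes = TRUE`, and the column names (`GO.ID`, `Description`, `p.Val`, `FDR`, `Phenotype`, `Genes`) are the ones I understand EnrichmentMap to expect, so check them against the EnrichmentMap documentation.

```r
# Toy stand-in for gost()$result (all values are invented)
toy_results <- data.frame(
  term_id      = c("TERM:0001", "TERM:0002"),
  term_name    = c("example term A", "example term B"),
  p_value      = c(0.001, 0.04),
  intersection = c("GENE1,GENE2,GENE3", "GENE2,GENE4"),
  stringsAsFactors = FALSE
)

# Assumption: p_value is already corrected, so it fills both p-value columns
gem <- data.frame(
  GO.ID       = toy_results$term_id,
  Description = toy_results$term_name,
  p.Val       = toy_results$p_value,
  FDR         = toy_results$p_value,
  Phenotype   = "+1",
  Genes       = toy_results$intersection,
  stringsAsFactors = FALSE
)

# Write a tab-delimited file (hypothetical file name, in a temp directory)
gem_file <- file.path(tempdir(), "toy_gprofiler_gem.txt")
write.table(gem, file = gem_file, sep = "\t", quote = FALSE, row.names = FALSE)
```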

You can also download results with original datasources included

Make sure to note which version of g:Profiler you're using, because the underlying datasets and the Ensembl release change over time.

2 g:Profiler in R - steps

Use packages gprofiler2 and GSA

Set parameters:

working_dir
data_dir
genelist_file
max_gs_size (max size of the genesets, e.g. 250)
min_gs_size (min size of the genesets)
min_intersection (min intersection between genelist and geneset)
organism (ex. hsapiens)

Make a directory to store the data

Read the gene list using read.table
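The setup steps above might look roughly like this. It is a sketch under assumptions: the directory location, the file name, the placeholder genes, and the values of `min_gs_size` and `min_intersection` are all hypothetical, and the gene list is assumed to be a one-column text file.

```r
# Parameters (values other than 250 and "hsapiens" are assumed for illustration)
working_dir      <- file.path(tempdir(), "gprofiler_example")  # hypothetical location
data_dir         <- file.path(working_dir, "data")
genelist_file    <- "example_genelist.txt"                     # hypothetical file name
max_gs_size      <- 250
min_gs_size      <- 3
min_intersection <- 3
organism         <- "hsapiens"

# Make a directory to store the data
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

# Write a tiny placeholder gene list so the sketch runs end to end
writeLines(c("TP53", "BRCA1", "EGFR"), file.path(data_dir, genelist_file))

# Read the gene list: one gene symbol per line, no header
current_genelist <- read.table(file.path(data_dir, genelist_file),
                               header = FALSE, sep = "\t", quote = "",
                               stringsAsFactors = FALSE)
query_set <- current_genelist$V1
```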

Run the g:Profiler query -- parameters:

query (set of genes of interest)
significant (whether to only show results g:Profiler deems significant; set to FALSE so I can pick my own threshold)
ordered_query (whether the genes in the query are ranked)
correction_method (use fdr)
organism (hsapiens = Homo sapiens)
sources (geneset source databases to use)
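Put together, the query parameters above could be passed to `gost()` from gprofiler2 along these lines. The gene symbols and the choice of sources are placeholders rather than the values used in my analysis. The call needs the package and a network connection, so the sketch guards it and leaves `gprofiler_results` as `NULL` if either is missing.

```r
# Arguments for the query (genes and sources are placeholder choices)
gost_args <- list(
  query             = c("TP53", "BRCA1", "EGFR"),
  significant       = FALSE,     # return all terms; threshold chosen later
  ordered_query     = FALSE,     # the gene list is not ranked
  correction_method = "fdr",
  organism          = "hsapiens",
  sources           = c("GO:BP", "REAC", "WP")
)

# Run the query only if gprofiler2 is installed; a failed request yields NULL
gprofiler_results <- NULL
if (requireNamespace("gprofiler2", quietly = TRUE)) {
  gprofiler_results <- tryCatch(
    do.call(gprofiler2::gost, gost_args),
    error = function(e) NULL
  )
}
```

When the query succeeds, the enrichment table is in `gprofiler_results$result`.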

Results give you:

query
significant (T/F)
p_value
term_size
query_size
intersection_size
precision
recall
term_id
source
term_name
effective_domain_size
source_order
parents
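Because `significant` is set to FALSE, the result table still has to be filtered by hand. The sketch below applies a p-value cutoff together with the geneset size and intersection limits to a small invented table that reuses the column names listed above. The threshold values are assumptions for illustration.

```r
# Toy results table using the column names from the list above (values invented)
toy_results <- data.frame(
  term_id           = c("TERM:0001", "TERM:0002", "TERM:0003", "TERM:0004"),
  p_value           = c(0.001, 0.2, 0.01, 0.03),
  term_size         = c(120, 80, 900, 40),
  intersection_size = c(8, 5, 30, 2),
  stringsAsFactors  = FALSE
)

# Assumed thresholds
p_threshold      <- 0.05
max_gs_size      <- 250
min_gs_size      <- 3
min_intersection <- 3

# Keep terms that pass the p-value cutoff and all three size limits
keep <- toy_results$p_value <= p_threshold &
  toy_results$term_size >= min_gs_size &
  toy_results$term_size <= max_gs_size &
  toy_results$intersection_size >= min_intersection

filtered_results <- toy_results[keep, ]
```

Here only TERM:0001 survives: TERM:0002 fails the p-value cutoff, TERM:0003 is too large, and TERM:0004 has too small an intersection.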

Use this URL to download the GMT file "https://biit.cs.ut.ee/gprofiler/static/gprofiler_full_hsapiens.name.gmt"

Get the g:Profiler version:

gprofiler_version <- get_version_info(organism=organism)

Download the GMT file

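# URL of the full human GMT file (same URL as given above)
gprofiler_gmt_url <- "https://biit.cs.ut.ee/gprofiler/static/gprofiler_full_hsapiens.name.gmt"

# Versioned file name so the g:Profiler release is recorded;
# paste0() appends the extension without a stray "_" before ".name.gmt"
gprofiler_gmt_filename <- file.path(working_dir,
                                  paste0(paste("gprofiler_full", organism,
                                    gprofiler_version$gprofiler_version, sep="_"),
                                    ".name.gmt"))

# Download only if the file is not already present
if(!file.exists(gprofiler_gmt_filename)){
  download.file(url = gprofiler_gmt_url, 
              destfile = gprofiler_gmt_filename)
}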