g:Profiler - bcb420-2025/Clare_Gillis GitHub Wiki

g:Profiler

1 g:Profiler in general

g:Profiler reads in a list of genes from a file and performs a gene-set enrichment (over-representation) analysis using a hypergeometric test.
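To make the hypergeometric test concrete, here is a minimal base-R sketch of an over-representation p-value. All counts are made up for illustration, and g:Profiler's own background definition and implementation may differ from this.

```r
# Hypothetical counts, chosen only for illustration
domain_size       <- 20000  # genes in the background (universe)
term_size         <- 150    # background genes annotated to the term
query_size        <- 300    # genes in the query list
intersection_size <- 12     # query genes annotated to the term

# P(X >= intersection_size) when drawing query_size genes without replacement
p_value <- phyper(intersection_size - 1,
                  m = term_size,
                  n = domain_size - term_size,
                  k = query_size,
                  lower.tail = FALSE)

# Overlap expected by chance alone, for comparison with the observed overlap
expected_overlap <- query_size * term_size / domain_size
```

The `intersection_size - 1` is needed because `phyper(..., lower.tail = FALSE)` gives P(X > q), while the enrichment question is P(X >= observed overlap).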

Note: you can limit the number of terms returned from GO, which helps surface the most important terms and reduces computation. This option is called highlighting driver terms.

You can select which databases to search against (ex. specific sections of GO, Reactome, etc)

Results are given with Term Name, Term ID, Padj, -log10(Padj), and colour-coded columns noting whether each gene in the query is present in the term. There is a separate table for each database.

Save results in the GEM (generic enrichment map) format. This contains:

  • Name of each gene-set
  • Description of each gene-set
  • Significance of the overlap (pvalue)
  • Significance of the overlap (adjusted pvalue/qvalue)
  • Phenotype
  • Genes included in each gene-set
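As a rough sketch of what such a file looks like, the block below writes a small tab-delimited table with the six fields listed above. The term IDs, gene names, values, and column names are hypothetical, and the header names expected by a given tool may differ.

```r
# Hypothetical enrichment results; IDs, names and values are illustrative only
gem <- data.frame(
  name        = c("TERM:0000001", "TERM:0000002"),
  description = c("example term one", "example term two"),
  p.Val       = c(0.0004, 0.01),
  FDR         = c(0.002, 0.03),
  phenotype   = c(1, 1),
  genes       = c("GENE1,GENE2,GENE3", "GENE2,GENE4"),  # comma-separated members
  stringsAsFactors = FALSE
)

# Write one tab-separated row per gene-set, without quotes or row names
gem_file <- file.path(tempdir(), "example_gem.txt")
write.table(gem, file = gem_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to confirm the layout survived the round trip
gem_check <- read.table(gem_file, header = TRUE, sep = "\t",
                        stringsAsFactors = FALSE)
```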

You can also download results with the original data sources included.

Make sure to note which version of g:Profiler you're using, because the datasets and Ensembl change over time.

2 g:Profiler in R - steps

Use packages gprofiler2 and GSA

Set parameters:

  • working_dir
  • data_dir
  • genelist_file
  • max_gs_size (max size of the genesets ex. 250)
  • min_gs_size (min size of the genesets)
  • min_intersection (min intersection between genelist and geneset)
  • organism (ex. hsapiens)

Make a directory to store the data

Read the gene list using read.table
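The setup steps above can be sketched as follows. The directory location, the file name, the gene symbols, and the values of `min_gs_size` and `min_intersection` are assumptions for illustration; only `250` and `hsapiens` come from these notes. The sketch writes a tiny stand-in gene list so it runs on its own.

```r
# Parameters (values other than 250 and "hsapiens" are assumptions)
working_dir      <- file.path(tempdir(), "gprofiler_example")
data_dir         <- file.path(working_dir, "data")
genelist_file    <- "example_genelist.txt"
max_gs_size      <- 250
min_gs_size      <- 3
min_intersection <- 3
organism         <- "hsapiens"

# Make a directory to store the data
if (!dir.exists(data_dir)) {
  dir.create(data_dir, recursive = TRUE)
}

# Stand-in gene list so the sketch is self-contained
writeLines(c("TP53", "BRCA1", "EGFR", "MYC"),
           file.path(data_dir, genelist_file))

# Read the gene list: one gene symbol per line, no header
genelist <- read.table(file.path(data_dir, genelist_file),
                       header = FALSE, sep = "\t",
                       quote = "", stringsAsFactors = FALSE)
query_set <- genelist$V1
```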

Run the g:Profiler query -- parameters:

  • query (set of genes of interest)
  • significant (whether to only show results g:Profiler deems significant. Set to FALSE so I can pick the threshold myself)
  • ordered_query (whether the genes in the query are ranked)
  • correction_method (use fdr)
  • organism (hsapiens = Homo sapiens)
  • sources (geneset source databases to use)
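A minimal sketch of the query with these parameters, wrapped in a function so that nothing is sent to the server until it is called. The wrapper name `run_gprofiler_query` and the default source list are assumptions for illustration; calling it needs the gprofiler2 package and internet access.

```r
# Hypothetical wrapper around gprofiler2::gost() using the parameters above
run_gprofiler_query <- function(query_set,
                                organism = "hsapiens",
                                sources = c("GO:BP", "REAC", "WP")) {
  gprofiler2::gost(query = query_set,
                   significant = FALSE,      # return all terms; threshold later
                   ordered_query = FALSE,    # assumes the gene list is not ranked
                   correction_method = "fdr",
                   organism = organism,
                   sources = sources)
}
```

A call such as `gprofiler_results <- run_gprofiler_query(query_set)` should return a list whose `result` element holds the enrichment table.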

Results give you:

  • query #
  • significant (T/F)

  • p_value
  • term_size
  • query_size
  • intersection_size
  • precision
  • recall
  • term_id
  • source
  • term_name
  • effective_domain_size
  • source_order
  • parents
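Since `significant` is set to FALSE, the thresholds have to be applied afterwards. The sketch below shows one way the size and intersection parameters might be used to filter the result table. It runs on a mock table with made-up values, and the cutoffs other than 250 are assumptions.

```r
# Mock result table with a few of the columns listed above (values are made up)
enrichment_results <- data.frame(
  term_id           = c("TERM:1", "TERM:2", "TERM:3", "TERM:4"),
  term_name         = c("small term", "good term", "huge term", "low overlap"),
  p_value           = c(0.01, 0.001, 0.0001, 0.02),
  term_size         = c(2, 120, 900, 60),
  intersection_size = c(2, 10, 40, 1),
  stringsAsFactors = FALSE
)

max_gs_size      <- 250
min_gs_size      <- 3     # assumed value
min_intersection <- 3     # assumed value
p_threshold      <- 0.05  # assumed significance cutoff

# Keep terms within the size limits, with enough overlap, below the cutoff
keep <- enrichment_results$term_size >= min_gs_size &
  enrichment_results$term_size <= max_gs_size &
  enrichment_results$intersection_size >= min_intersection &
  enrichment_results$p_value <= p_threshold
filtered_results <- enrichment_results[keep, ]
```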

Use this URL to download the GMT file

gprofiler_gmt_url <- "https://biit.cs.ut.ee/gprofiler/static/gprofiler_full_hsapiens.name.gmt"

Get the g:Profiler version

gprofiler_version <- get_version_info(organism=organism)

Download the GMT file

# Join the name parts with "_", then append the extension without a separator
gprofiler_gmt_filename <- file.path(working_dir,
                                  paste0(paste("gprofiler_full", organism,
                                               gprofiler_version$gprofiler_version,
                                               sep="_"),
                                         ".name.gmt"))

if(!file.exists(gprofiler_gmt_filename)){
  download.file(url = gprofiler_gmt_url, 
              destfile = gprofiler_gmt_filename)
}
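Once the GMT file is on disk, it can be read into a list of gene-sets. The GSA package provides `GSA.read.gmt` for this; the base-R sketch below avoids the dependency and runs on a two-line stand-in file whose contents are made up.

```r
# Write a two-line stand-in GMT file (contents are made up)
gmt_file <- file.path(tempdir(), "example.name.gmt")
writeLines(c("TERM:1\texample term one\tGENE1\tGENE2\tGENE3",
             "TERM:2\texample term two\tGENE2\tGENE4"),
           gmt_file)

# Each GMT line: gene-set ID, description, then member genes (tab-separated)
gmt_fields <- strsplit(readLines(gmt_file), "\t", fixed = TRUE)

# Drop the first two fields to keep only the genes, named by gene-set ID
genesets <- lapply(gmt_fields, function(fields) fields[-(1:2)])
names(genesets) <- vapply(gmt_fields, function(fields) fields[1], character(1))

# Gene-set sizes, e.g. for comparison with min_gs_size and max_gs_size
geneset_sizes <- lengths(genesets)
```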