g:Profiler - bcb420-2025/Clare_Gillis GitHub Wiki
g:Profiler
1 g:Profiler in general
g:Profiler reads a list of genes from a file and performs a gene-set enrichment analysis using a hypergeometric test.
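As a rough illustration of the hypergeometric test mentioned above, the sketch below computes an over-representation p-value in base R. All counts are made-up numbers chosen for illustration, not g:Profiler output, and the variable names are only borrowed from the result columns listed later on this page.

```r
# Hypothetical counts (not from a real g:Profiler run)
domain_size       <- 20000  # annotated genes in the background
term_size         <- 200    # genes annotated to the term
query_size        <- 100    # genes in the query list
intersection_size <- 10     # query genes that are also in the term

# P(X >= intersection_size) when drawing query_size genes without replacement
p_value <- phyper(intersection_size - 1,
                  term_size,
                  domain_size - term_size,
                  query_size,
                  lower.tail = FALSE)

# Expected overlap under the null hypothesis, for comparison
expected_overlap <- query_size * term_size / domain_size
print(p_value)
print(expected_overlap)
```

The `intersection_size - 1` is needed because `phyper(..., lower.tail = FALSE)` returns P(X > q), and the test asks for "at least this many" overlapping genes.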
Note: you can limit the number of terms returned from GO, which helps to keep only the most important terms and reduces computation. This option is called highlighting driver terms.
You can select which databases to search against (e.g. specific sections of GO, Reactome, etc.).
Results are given with Term Name, Term ID, Padj, -log10(Padj), and colour-coded columns noting whether each gene in the query list is present in the term. There is a separate table for each database.
Save results in the GEM (generic enrichment map) format. This contains:
- Name of each gene-set
- Description of each gene-set
- Significance of the overlap (pvalue)
- Significance of the overlap (adjusted pvalue/qvalue)
- Phenotype
- Genes included in each gene-set
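The six GEM fields listed above could be written out as a tab-separated file along the following lines. This is a minimal sketch on mock data: the column names, the comma-separated gene lists, the phenotype value of 1, and the file name are assumptions for illustration, and the term IDs and genes are invented.

```r
# Mock enrichment results; all values are invented for illustration
gem <- data.frame(
  name        = c("SET:0001", "SET:0002"),
  description = c("mock set one", "mock set two"),
  pvalue      = c(1e-6, 2e-4),
  qvalue      = c(5e-5, 3e-3),
  phenotype   = 1,
  genes       = c("GENE1,GENE2,GENE3", "GENE2,GENE4"),
  stringsAsFactors = FALSE
)

# Hypothetical output location
gem_file <- file.path(tempdir(), "gprofiler_results_GEM.txt")
write.table(gem, file = gem_file, sep = "\t",
            row.names = FALSE, col.names = TRUE, quote = FALSE)

# Read the file back to confirm the six-column layout
check <- read.delim(gem_file, stringsAsFactors = FALSE)
print(colnames(check))
```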
You can also download results with the original data sources included.
Make sure to note which version of g:Profiler you're using, because the datasets and Ensembl change over time.
2 g:Profiler in R - steps
Use packages gprofiler2 and GSA
Set parameters:
- working_dir
- data_dir
- genelist_file
- max_gs_size (max size of the genesets, e.g. 250)
- min_gs_size (min size of the genesets)
- min_intersection (min intersection between genelist and geneset)
- organism (ex. hsapiens)
Make a directory to store the data
Read the gene list using read.table
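The setup steps above might look roughly like this. Only the 250 for max_gs_size and "hsapiens" come from these notes; the other values, the file name, and the three-gene list are placeholders so that the sketch runs on its own in a temporary directory.

```r
# Assumed values; adjust to your own project layout
working_dir      <- file.path(tempdir(), "gprofiler_analysis")
data_dir         <- file.path(working_dir, "data")
genelist_file    <- "genelist.txt"   # hypothetical file name
max_gs_size      <- 250
min_gs_size      <- 3                # assumed
min_intersection <- 3                # assumed
organism         <- "hsapiens"

# Make a directory to store the data
if (!dir.exists(data_dir)) {
  dir.create(data_dir, recursive = TRUE)
}

# Write a tiny mock gene list so the sketch is self-contained
writeLines(c("TP53", "BRCA1", "EGFR"), file.path(data_dir, genelist_file))

# Read the gene list: one gene symbol per line, no header
current_genelist <- read.table(file.path(data_dir, genelist_file),
                               header = FALSE, sep = "\t", quote = "",
                               stringsAsFactors = FALSE)
query_set <- current_genelist$V1
print(query_set)
```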
Run the g:Profiler query -- parameters:
- query (set of genes of interest)
- significant (whether to only show results g:Profiler deems significant. Set to FALSE so I can pick the threshold)
- ordered_query (whether the genes in the query are ranked)
- correction_method (use fdr)
- organism (hsapiens = Homo sapiens)
- sources (geneset source databases to use)
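A query with the parameters listed above could be wrapped as follows, using `gost()` from gprofiler2. The wrapper name `run_gprofiler` and the default source list are assumptions made for this sketch. Defining the function needs no network access, but calling it requires the gprofiler2 package and a connection to the g:Profiler server.

```r
# Hypothetical wrapper around gprofiler2::gost()
run_gprofiler <- function(genes, organism = "hsapiens",
                          sources = c("GO:BP", "REAC", "WP")) {
  if (!requireNamespace("gprofiler2", quietly = TRUE)) {
    stop("Package 'gprofiler2' is required for this sketch")
  }
  gprofiler2::gost(query = genes,
                   significant = FALSE,       # keep all terms; threshold later
                   ordered_query = FALSE,     # gene list is not ranked
                   correction_method = "fdr",
                   organism = organism,
                   sources = sources)
}

# Usage (not run here because it queries the g:Profiler server):
# gprofiler_results  <- run_gprofiler(c("TP53", "BRCA1", "EGFR"))
# enrichment_results <- gprofiler_results$result
```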
Results give you:
- query #
- significant (T/F)
- p_value
- term_size
- query_size
- intersection_size
- precision
- recall
- term_id
- source
- term_name
- effective_domain_size
- source_order
- parents
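Because significant is set to FALSE, the threshold has to be applied afterwards. The sketch below filters a mock table that mimics a few of the result columns, using the size and intersection parameters from the setup step. The rows, the 0.05 cutoff, and the minimum values are invented for illustration. Precision and recall are recomputed here as intersection_size / query_size and intersection_size / term_size.

```r
# Mock rows shaped like a few columns of the results table (invented values)
enrichment_results <- data.frame(
  term_id           = c("SET:0001", "SET:0002", "SET:0003", "SET:0004"),
  p_value           = c(0.001, 0.20, 0.01, 0.03),
  term_size         = c(120, 80, 900, 40),
  query_size        = c(50, 50, 50, 50),
  intersection_size = c(10, 6, 30, 2),
  stringsAsFactors  = FALSE
)

max_gs_size      <- 250
min_gs_size      <- 3     # assumed
min_intersection <- 3     # assumed
p_threshold      <- 0.05  # assumed

# Keep terms that pass the p-value, gene-set size, and overlap thresholds
keep <- enrichment_results$p_value <= p_threshold &
  enrichment_results$term_size >= min_gs_size &
  enrichment_results$term_size <= max_gs_size &
  enrichment_results$intersection_size >= min_intersection
filtered <- enrichment_results[keep, ]

# Precision and recall as proportions of the query and of the term
filtered$precision <- filtered$intersection_size / filtered$query_size
filtered$recall    <- filtered$intersection_size / filtered$term_size
print(filtered)
```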
Use this URL to download the GMT file "https://biit.cs.ut.ee/gprofiler/static/gprofiler_full_hsapiens.name.gmt"
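Get the g:Profiler version:
gprofiler_version <- get_version_info(organism = organism)
Download the GMT file (skipped if it already exists locally):
# URL given above; the local file name records the g:Profiler version
gprofiler_gmt_url <- "https://biit.cs.ut.ee/gprofiler/static/gprofiler_full_hsapiens.name.gmt"
gprofiler_gmt_filename <- file.path(working_dir,
    paste0(paste("gprofiler_full", organism,
                 gprofiler_version$gprofiler_version, sep = "_"),
           ".name.gmt"))
if (!file.exists(gprofiler_gmt_filename)) {
    download.file(url = gprofiler_gmt_url,
                  destfile = gprofiler_gmt_filename)
}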
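One possible use of the downloaded GMT file is to look up which query genes belong to each gene set, for example to fill the genes column of a GEM file. That use is an assumption, since these notes do not say what the file is for. The GSA package listed earlier provides `GSA.read.gmt()` for reading GMT files; the sketch below uses base R on a small mock file instead so that it runs without any downloads.

```r
# Write a two-line mock GMT file: name, description, then gene members
gmt_file <- file.path(tempdir(), "mock.name.gmt")
writeLines(c("SET:0001\tmock set one\tGENE1\tGENE2\tGENE3",
             "SET:0002\tmock set two\tGENE2\tGENE4"),
           gmt_file)

# Parse each line into a named list entry of gene symbols
gmt_lines <- strsplit(readLines(gmt_file), "\t", fixed = TRUE)
genesets  <- lapply(gmt_lines, function(fields) fields[-(1:2)])
names(genesets) <- vapply(gmt_lines, function(fields) fields[1], character(1))

# Genes from a hypothetical query that fall in each gene set
query_set <- c("GENE2", "GENE3")
overlap <- lapply(genesets, intersect, y = query_set)
print(overlap)
```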