Journal 4: G:Profiler Tutorial - bcb420-2025/Keren_Zhang GitHub Wiki

g:Profiler Tutorial

Exercise 1 - Run g:Profiler

Perform gene-set enrichment analysis using g:Profiler with specific pathway databases and explore the results.

Setup

Required File: Pancancer_genelist.txt
- Save this file in the module directory of your CBW work directory.
- Ensure all files are saved in a personal project data folder.

Instructions

Step 1: Launch g:Profiler

Open the g:Profiler website in a web browser.

Step 2: Input Query

Paste the gene list from Pancancer_genelist.txt into the Query field.
Ensure that the organism for analysis is set to Homo sapiens.
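Before pasting, it can help to clean the gene list. The sketch below assumes Pancancer_genelist.txt holds one gene symbol per line (or whitespace-separated symbols); the function name parse_gene_list is hypothetical and not part of g:Profiler.

```python
def parse_gene_list(text: str) -> list[str]:
    """Return gene symbols from pasted text, dropping blanks and duplicates.

    The original order is preserved, which matters if the list is later
    submitted as an ordered query.
    """
    seen = set()
    genes = []
    for symbol in text.split():
        if symbol not in seen:
            seen.add(symbol)
            genes.append(symbol)
    return genes


# Illustrative input only; the real list comes from Pancancer_genelist.txt.
example = "TP53\nPIK3CA\n\nKRAS\nTP53\n"
print(parse_gene_list(example))  # ['TP53', 'PIK3CA', 'KRAS']
```

To use it on the real file, read the file contents with open(...).read() and pass the text to the function.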

Step 3: Adjust Parameters

Significance Threshold: Set to Benjamini-Hochberg FDR with a user threshold of 0.05.
- Tip: If the query returns no results, raise the threshold to 0.1 and then to 1. Results at a threshold of 1 confirm that the query ran successfully and that nothing was significant at the stricter thresholds.
Data Sources:
- Unselect all, then select:
  - GO Biological Process (with no electronic GO annotations)
  - Reactome
  - WikiPathways
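The same parameter choices can be written down as a request payload, which is handy for keeping a record of the settings or for scripting the query later. This is a minimal sketch: the field names and source identifiers (GO:BP, REAC, WP) are assumptions based on the public g:Profiler API and should be checked against its documentation before use. The function build_gost_payload is hypothetical and only builds the dictionary; it does not contact the server.

```python
import json


def build_gost_payload(genes, threshold=0.05):
    """Build a dictionary mirroring the web form settings from Step 3."""
    return {
        "organism": "hsapiens",                  # Homo sapiens
        "query": list(genes),
        "sources": ["GO:BP", "REAC", "WP"],      # GO Biological Process, Reactome, WikiPathways
        "user_threshold": threshold,
        "significance_threshold_method": "fdr",  # Benjamini-Hochberg FDR
        "no_iea": True,                          # exclude electronic GO annotations
    }


payload = build_gost_payload(["TP53", "KRAS"])
print(json.dumps(payload, indent=2))
```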

Step 4: Run Query

Click on the Run query button and address any ambiguous gene mappings if prompted.

Step 5: Explore the Results

View results under "Detailed Results" for GO:BP, Reactome, and WikiPathways.
Restrict the term size to filter out very large, general terms and very small ones:
- Change the max term size from 10000 to 250
- Change the min term size from 1 to 3
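The term size filter amounts to keeping only the terms whose gene count falls inside a range. A minimal sketch, assuming each result is a dictionary with a term_size key (a hypothetical layout chosen for illustration, with made-up rows):

```python
def filter_by_term_size(results, min_size=3, max_size=250):
    """Keep results whose term size lies within [min_size, max_size]."""
    return [r for r in results if min_size <= r["term_size"] <= max_size]


# Made-up rows for illustration; these are not real g:Profiler output.
rows = [
    {"name": "very broad term", "term_size": 5000},
    {"name": "mid-sized term", "term_size": 120},
    {"name": "tiny term", "term_size": 2},
]
print([r["name"] for r in filter_by_term_size(rows)])  # ['mid-sized term']
```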

Step 6: Save the Results

Save results in GEM format for different term sizes:
- File 1: gProfiler_hsapiens_lab2_results_GEM_termmin3_max10000.gem.txt
- File 2: gProfiler_hsapiens_lab2_results_GEM_termmin3_max1000.gem.txt
- File 3: gProfiler_hsapiens_lab2_results_GEM_termmin3_max250.gem.txt
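Once saved, a GEM file can be inspected outside the browser. The sketch below assumes the file is tab-separated with a header row containing the columns GO.ID, Description, p.Val, FDR, Phenotype and Genes, with the genes comma-separated; check the header of your own download before relying on it. The function read_gem and the sample row are hypothetical.

```python
import csv
import io


def read_gem(handle):
    """Parse GEM-formatted text into a list of dictionaries."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "id": row["GO.ID"],
            "description": row["Description"],
            "p_value": float(row["p.Val"]),
            "fdr": float(row["FDR"]),
            "genes": row["Genes"].split(",") if row["Genes"] else [],
        })
    return rows


# Made-up example content; a real file would be opened with open(path, newline="").
sample = (
    "GO.ID\tDescription\tp.Val\tFDR\tPhenotype\tGenes\n"
    "TERM:0001\tmade-up term\t0.001\t0.01\t1\tTP53,KRAS\n"
)
for term in read_gem(io.StringIO(sample)):
    print(term["id"], term["fdr"], len(term["genes"]))
```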

Step 7: Download Pathway Database Files

Optional: Combine multiple data sources into a single GMT file for use in Cytoscape.
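Combining GMT files is mostly concatenation, since each line of a GMT file is one gene set: a name, a description, then the genes, all tab-separated. A minimal sketch that also skips repeated set names (keeping the first occurrence is a design choice made here, not something the tutorial prescribes; merge_gmt is a hypothetical helper):

```python
def merge_gmt(*gmt_texts):
    """Merge the contents of several GMT files into one GMT string."""
    merged = {}
    for text in gmt_texts:
        for line in text.splitlines():
            if not line.strip():
                continue  # ignore blank lines
            fields = line.split("\t")
            if len(fields) < 3:
                raise ValueError(f"not a valid GMT line: {line!r}")
            name = fields[0]
            if name not in merged:  # first occurrence of a set name wins
                merged[name] = fields
    return "".join("\t".join(fields) + "\n" for fields in merged.values())


# Made-up gene sets for illustration.
first = "SET_A\tfirst source\tTP53\tKRAS\n"
second = "SET_B\tsecond source\tEGFR\n"
print(merge_gmt(first, second), end="")
```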

Step 8: Version Control

Record the version of g:Profiler used for analysis to ensure reproducibility.
- Version Example: g:Profiler_e111_eg58_p18_b51d8f08
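One way to keep the version with the results is to write a small run record next to the output files. This is a sketch only: the record layout and the function make_run_record are hypothetical, and the version string is the example given above.

```python
import json
from datetime import date


def make_run_record(version, threshold, sources, run_date=None):
    """Collect the settings of one g:Profiler run in a JSON-friendly dict."""
    return {
        "tool": "g:Profiler",
        "version": version,
        "date": (run_date or date.today()).isoformat(),
        "significance_method": "Benjamini-Hochberg FDR",
        "user_threshold": threshold,
        "sources": sorted(sources),
    }


record = make_run_record(
    "g:Profiler_e111_eg58_p18_b51d8f08", 0.05, ["GO:BP", "REAC", "WP"]
)
print(json.dumps(record, indent=2))
```

Saving this output as a .json file in the results folder keeps the parameters and version together with the GEM files they describe.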

Tips

Always check and record the version of tools used to facilitate future reproducibility.
If g:Profiler returns no results, relax the FDR threshold to confirm that the query itself ran correctly.
Save and organize all output files systematically in a results data folder for easy access and analysis continuity.

Exercise 2 - g:Profiler with custom GMT

Perform a pathway enrichment analysis using g:Profiler with a custom GMT file. A custom GMT file makes it possible to test against pathway data sources that g:Profiler does not offer by default.

Setup

Required files: Download Pancancer_genelist.txt and Baderlab_genesets.gmt (June 2024 update) and place both in your project data folder.

Instructions

Step 1: Open g:Profiler

Access the g:Profiler web tool.

Step 2: Query Setup

2a: Copy and paste the gene list from Pancancer_genelist.txt into the Query field.
2b: Expand the Advanced options tab and set the Significance threshold to "Benjamini-Hochberg FDR".

Step 3: Data Source Selection

Expand the Data sources tab.
Click the “clear all” button to unselect all preselected choices.

Step 4: Upload Custom GMT

Navigate to the Custom GMT tab and upload the Baderlab_genesets.gmt file.
Verify the file name appears in the “File name used” box.
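If the upload fails or returns nothing, a quick structural check of the GMT file can rule out formatting problems. The sketch below assumes the standard GMT layout (set name, description, then one or more genes, tab-separated); check_gmt is a hypothetical helper and the sample text is made up.

```python
def check_gmt(text):
    """Return (number of usable gene sets, list of problems found)."""
    problems = []
    names = set()
    n_sets = 0
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are ignored
        fields = line.split("\t")
        if len(fields) < 3:
            problems.append(
                f"line {number}: expected name, description and at least one gene"
            )
            continue
        if fields[0] in names:
            problems.append(f"line {number}: duplicate set name {fields[0]!r}")
        names.add(fields[0])
        n_sets += 1
    return n_sets, problems


sample = "SET_A\tdesc\tTP53\nSET_B\tdesc\nSET_A\tdesc\tKRAS\n"
count, issues = check_gmt(sample)
print(count)
for issue in issues:
    print(issue)
```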

Step 5: Run the Query

Click on "Run query" and wait for the results.

Step 6: Results Exploration

Explore the detailed results provided by g:Profiler.

Step 7: Saving Results

Save the results file as gProfiler_hsapiens_Baderlab_max250.gem.txt.

Optional Steps

If time permits, further explore the capabilities of g:Profiler with these optional steps:

Optional 1: Additional Data Sources

Add TRANSFAC and miRTarBase databases to the query.
Rerun the query and observe the changes in the results.

Optional 2: Ordered Query Analysis

Check the ordered query option so that g:Profiler treats the gene list as ranked, here with genes prioritized by the number of mutations.
Compare the outcomes of ordered vs non-ordered queries.
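To see why order can change the outcome, consider testing growing prefixes of a ranked list against one term and keeping the most significant prefix. The sketch below illustrates that idea with a plain hypergeometric test; it is not g:Profiler's actual implementation, it applies no multiple-testing correction, and the genes, term and background size are made up.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K belong to the term."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def best_prefix(ranked_genes, term_genes, background_size):
    """Return (smallest p-value, prefix length) over all prefixes of the list."""
    term = set(term_genes)
    best = (1.0, 0)
    hits = 0
    for n, gene in enumerate(ranked_genes, start=1):
        if gene in term:
            hits += 1
        p = hypergeom_sf(hits, background_size, len(term), n)
        if p < best[0]:
            best = (p, n)
    return best


ranked = ["G1", "G2", "G3", "G4"]  # hypothetical genes, most mutated first
p_value, prefix_length = best_prefix(ranked, {"G1", "G2"}, background_size=100)
print(prefix_length)  # 2: the term is most enriched in the top two genes
```

In an unordered query only the full list would be tested, so a term concentrated at the top of the ranking can look weaker than it does here.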

Optional 3: Alternate Hypothesis Testing Methods

Rerun the query using different methods such as g:SCS or Bonferroni.
Assess the impact on the significance of the results.
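The difference between the correction methods can be seen on a handful of p-values. The sketch below implements the textbook Bonferroni and Benjamini-Hochberg adjustments; g:SCS is specific to g:Profiler and is not reproduced here. The p-values are made up for illustration.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.001, 0.01, 0.02, 0.04]  # made-up p-values
print(sum(p <= 0.05 for p in bonferroni(raw)))          # 2 terms pass
print(sum(p <= 0.05 for p in benjamini_hochberg(raw)))  # 4 terms pass
```

Bonferroni controls the chance of any false positive and is therefore stricter, while Benjamini-Hochberg controls the expected fraction of false positives among the reported terms.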

Tips for Efficient Analysis

Documentation: Keep detailed notes of each step and parameter used. This will be helpful for reproducing the results or making modifications in future analyses.
Results Interpretation: Spend adequate time understanding the output from each step. The detailed view in g:Profiler can provide insights into the biological relevance of your findings.
File Management: Maintain a structured directory for all files used and generated in this exercise to avoid any confusion or data loss.

References

g:Profiler Documentation.