Journal 4: g:Profiler Tutorial (bcb420-2025, Keren_Zhang)

g:Profiler Tutorial

Exercise 1 - Run g:Profiler

Perform gene-set enrichment analysis using g:Profiler with specific pathway databases and explore the results.

Setup

  • Required File: Pancancer_genelist.txt
    • Save this file in the module directory of your CBW work directory.
    • Ensure all files are saved in a personal project data folder.

Instructions

Step 1: Launch g:Profiler

  • Open the g:Profiler web tool.

Step 2: Input Query

  • Paste the gene list from Pancancer_genelist.txt into the Query field.
  • Ensure that the organism for analysis is set to Homo sapiens.
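Before pasting, it can help to clean the gene list so that blank lines and repeated symbols do not end up in the Query field. The sketch below assumes Pancancer_genelist.txt has one gene symbol per line (if a line carries extra columns, only the first is kept); the function name `load_gene_list` is hypothetical. It preserves the original order, which matters if you later try the ordered query option.

```python
from pathlib import Path


def load_gene_list(text: str) -> list[str]:
    """Return unique gene symbols in their original order.

    Assumes one symbol per line; blank lines are skipped and any extra
    whitespace-separated columns after the first are ignored.
    """
    seen: set[str] = set()
    genes: list[str] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        symbol = fields[0]
        if symbol not in seen:
            seen.add(symbol)
            genes.append(symbol)
    return genes


if __name__ == "__main__":
    path = Path("Pancancer_genelist.txt")  # assumed to be in the working directory
    if path.exists():
        genes = load_gene_list(path.read_text())
        print(f"{len(genes)} unique genes")
        # One symbol per line, ready to paste into the Query field.
        print("\n".join(genes))
```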

Step 3: Adjust Parameters

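  • Significance Threshold: Under Advanced options, set the threshold method to Benjamini-Hochberg FDR with a user threshold of 0.05.
    • Tip: If no terms are returned, raise the threshold to 0.1 and then to 1. Terms appearing at a threshold of 1 confirm that the query ran successfully and simply produced no significant results.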
  • Data Sources:
    • Unselect all, then select:
      • GO Biological Process (with no electronic GO annotations)
      • Reactome
      • WikiPathways
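To see what the Benjamini-Hochberg FDR threshold does, here is a minimal sketch of the textbook BH adjustment: each raw p-value is scaled by m/rank and the results are made monotone from the largest p-value down. This is the standard procedure, not g:Profiler's internal code; details such as which set of tests g:Profiler corrects over are not reproduced here, and the p-values in the test are made up.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1),
    # keeping the adjusted values monotone and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def passes_threshold(pvalues: list[float], alpha: float = 0.05) -> list[bool]:
    """True where the BH-adjusted p-value is at or below alpha."""
    return [q <= alpha for q in benjamini_hochberg(pvalues)]
```

For example, raw p-values of 0.01, 0.04, 0.03 and 0.20 adjust to roughly 0.04, 0.053, 0.053 and 0.20, so only the first term passes at 0.05 even though three raw p-values are below 0.05.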

Step 4: Run Query

  • Click on the Run query button and address any ambiguous gene mappings if prompted.

Step 5: Explore the Results

  • View results under "Detailed Results" for GO:BP, Reactome, and WikiPathways.
  • Adjust the term size to focus on more specific, interpretable terms:
    • Max term size from 10000 to 250
    • Min term size from 1 to 3
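The term size slider in g:Profiler removes very broad terms (thousands of genes) and very small ones. If you want to apply the same cut-off to exported results, a minimal sketch looks like the one below. The `Term` record and its fields are hypothetical stand-ins for columns in an export, and treating both limits as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Term:
    term_id: str
    name: str
    term_size: int  # number of genes annotated to the term
    adjusted_p: float


def filter_by_term_size(terms: list[Term], min_size: int = 3, max_size: int = 250) -> list[Term]:
    """Keep terms whose size lies in [min_size, max_size] (inclusive, assumed)."""
    return [t for t in terms if min_size <= t.term_size <= max_size]
```

With the defaults, a hypothetical term of 5000 genes or of 2 genes is dropped, while terms of 3 and 250 genes are kept.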

Step 6: Save the Results

  • Save results in GEM format for different term sizes:
    • File 1: gProfiler_hsapiens_lab2_results_GEM_termmin3_max10000.gem.txt
    • File 2: gProfiler_hsapiens_lab2_results_GEM_termmin3_max1000.gem.txt
    • File 3: gProfiler_hsapiens_lab2_results_GEM_termmin3_max250.gem.txt
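To compare the three saved files (for example, how many terms survive as the maximum term size drops from 10000 to 250), a small parser for the GEM text is handy. This sketch assumes a tab-separated file with the header columns GO.ID, Description, p.Val, FDR, Phenotype and Genes, and comma-separated genes; check these against your own export before relying on it. The function name `read_gem` is hypothetical.

```python
import csv
import io


def read_gem(text: str) -> list[dict]:
    """Parse tab-separated GEM text into one dict per enriched term.

    Assumed header: GO.ID, Description, p.Val, FDR, Phenotype, Genes,
    with the Genes column comma-separated.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        rows.append(
            {
                "id": row["GO.ID"],
                "description": row["Description"],
                "p_value": float(row["p.Val"]),
                "fdr": float(row["FDR"]),
                "genes": [g for g in row["Genes"].split(",") if g],
            }
        )
    return rows
```

Calling `len(read_gem(Path(name).read_text()))` (with `from pathlib import Path`) for each of the three file names gives a quick count of terms per term-size setting.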

Step 7: Download Pathway Database Files

  • Optional: Combine multiple data sources into a single GMT file for use in Cytoscape.
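GMT is a simple text format: each line holds a gene-set name, a description, and then the member genes, all separated by tabs. Combining sources therefore amounts to concatenating lines while avoiding duplicate set names. The sketch below is one way to do that, assuming the first occurrence of a set name should win; `parse_gmt` and `merge_gmt` are hypothetical helper names.

```python
def parse_gmt(text: str) -> dict[str, tuple[str, list[str]]]:
    """Map gene-set name -> (description, genes) for one GMT text."""
    sets: dict[str, tuple[str, list[str]]] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, description, *genes = fields
        sets[name] = (description, [g for g in genes if g])
    return sets


def merge_gmt(*texts: str) -> str:
    """Combine GMT texts into one; the first occurrence of a set name wins."""
    merged: dict[str, tuple[str, list[str]]] = {}
    for text in texts:
        for name, entry in parse_gmt(text).items():
            merged.setdefault(name, entry)
    lines = ["\t".join([name, desc, *genes]) for name, (desc, genes) in merged.items()]
    return "".join(line + "\n" for line in lines)
```

Keeping the first occurrence is an arbitrary choice; if two sources reuse a name for different gene sets, prefixing names with the source (for example, a hypothetical "REAC_" or "WP_" prefix) would avoid silently dropping one of them.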

Step 8: Version Control

  • Record the version of g:Profiler used for analysis to ensure reproducibility.
    • Version Example: g:Profiler_e111_eg58_p18_b51d8f08
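Recording the version string alongside the analysis parameters makes a run easier to reproduce. The sketch below splits a version string like the example above into fields and bundles it with the settings into a JSON note. The reading of e111 as the Ensembl release and eg58 as the Ensembl Genomes release is the common interpretation; the "revision" and "build" labels for the last two fields, and all function names, are assumptions.

```python
import json
import re

# Assumed layout: g:Profiler_e<Ensembl>_eg<Ensembl Genomes>_p<revision>_<build id>
VERSION_PATTERN = re.compile(
    r"^g:Profiler_e(?P<ensembl>\d+)_eg(?P<ensembl_genomes>\d+)"
    r"_p(?P<revision>\d+)_(?P<build>[0-9a-f]+)$"
)


def parse_gprofiler_version(version: str) -> dict[str, str]:
    """Split a g:Profiler version string into its (assumed) components."""
    match = VERSION_PATTERN.match(version)
    if match is None:
        raise ValueError(f"unrecognised g:Profiler version string: {version!r}")
    return match.groupdict()


def run_record(version: str, **parameters) -> str:
    """Bundle the version string and analysis parameters into a JSON note."""
    record = {"version": version, **parse_gprofiler_version(version), "parameters": parameters}
    return json.dumps(record, indent=2, sort_keys=True)
```

For example, `run_record("g:Profiler_e111_eg58_p18_b51d8f08", threshold=0.05, min_term_size=3, max_term_size=250)` produces a note that can be saved next to the GEM files.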

Tips

  1. Always check and record the version of tools used to facilitate future reproducibility.
  2. If g:Profiler returns no results, consider relaxing the FDR threshold to test for tool functionality.
  3. Save and organize all output files systematically in a results data folder for easy access and analysis continuity.

Exercise 2 - g:Profiler with a custom GMT

Perform pathway enrichment analysis using g:Profiler with a custom GMT file. This lets us use pathway data sources that are not among g:Profiler's built-in options.

Setup

  • Required files: Download Pancancer_genelist.txt and Baderlab_genesets.gmt (June 2024 update) and place both in your project data folder.

Instructions

Step 1: Open g:Profiler

  • Access the g:Profiler web tool.

Step 2: Query Setup

  • 2a: Copy and paste the gene list from Pancancer_genelist.txt into the Query field.
  • 2b: Expand the Advanced options tab and set the Significance threshold to "Benjamini-Hochberg FDR".

Step 3: Data Source Selection

  • Expand the Data sources tab.
  • Click the “clear all” button to unselect all preselected choices.

Step 4: Upload Custom GMT

  • Navigate to the Custom GMT tab and upload the Baderlab_genesets.gmt file.
  • Verify the file name appears in the “File name used” box.
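Because Baderlab_genesets.gmt is a large file, a quick local check before uploading can catch truncated downloads or malformed lines. This sketch counts gene sets, flags lines with fewer than the three tab-separated fields a GMT line needs (name, description, at least one gene), and reports how many sets fall inside a 3 to 250 gene range. It is a rough sanity check under those assumptions; g:Profiler's own parser may be stricter or more lenient, and `summarize_gmt` is a hypothetical name.

```python
def summarize_gmt(text: str, min_size: int = 3, max_size: int = 250) -> dict[str, int]:
    """Count gene sets in GMT text and how many fall in a size range.

    Expected line layout: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    total = malformed = in_range = 0
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            malformed += 1
            continue
        total += 1
        genes = {g for g in fields[2:] if g}  # unique, non-empty gene IDs
        if min_size <= len(genes) <= max_size:
            in_range += 1
    return {"gene_sets": total, "malformed": malformed, "in_size_range": in_range}
```

Running `summarize_gmt(Path("Baderlab_genesets.gmt").read_text())` (with `from pathlib import Path`) should report zero malformed lines for an intact file.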

Step 5: Run the Query

  • Click on "Run query" and wait for the results.

Step 6: Results Exploration

  • Explore the results under "Detailed Results", as in Exercise 1.

Step 7: Saving Results

  • Set the maximum term size to 250 (as in Exercise 1, Step 5) and save the results in GEM format as gProfiler_hsapiens_Baderlab_max250.gem.txt.

Optional Steps

If time permits, further explore the capabilities of g:Profiler with these optional steps:

Optional 1: Additional Data Sources

  • Add TRANSFAC and miRTarBase databases to the query.
  • Rerun the query and observe the changes in the results.
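One way to make "observe the changes" concrete is to compare the sets of term IDs from the two runs, for example the GO.ID column of each GEM export. The sketch below assumes you already have those IDs as Python sets; `compare_runs` is a hypothetical helper.

```python
def compare_runs(before: set[str], after: set[str]) -> dict[str, list[str]]:
    """Split term IDs into those gained, lost, and shared between two runs."""
    return {
        "gained": sorted(after - before),  # only in the new run
        "lost": sorted(before - after),  # only in the original run
        "shared": sorted(before & after),  # present in both runs
    }
```

The same comparison works for the ordered vs non-ordered runs and for different correction methods.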

Optional 2: Ordered Query Analysis

  • Check the Ordered query option so that g:Profiler treats the gene list as ranked, giving priority to the genes at the top (here, the genes with the most mutations).
  • Compare the outcomes of ordered vs non-ordered queries.
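As we understand the g:Profiler documentation, an ordered query runs the enrichment test on increasingly larger top-n sublists of the ranked list and reports the best p-value for each term. The sketch below illustrates that idea for a single term with a hypergeometric test. It is a simplified illustration, not g:Profiler's implementation: it assumes every query and term gene is in the background, and it applies no multiple-testing correction. The function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def ordered_enrichment(ranked_genes: list[str], term_genes: set[str], background_size: int) -> tuple[float, int]:
    """Best (lowest) p-value over all top-n prefixes of a ranked gene list.

    Returns (p_value, prefix_length); prefix_length is 0 if no gene overlaps.
    """
    best_p, best_n = 1.0, 0
    overlap = 0
    for n, gene in enumerate(ranked_genes, start=1):
        if gene in term_genes:
            overlap += 1
        if overlap == 0:
            continue  # no overlap yet, nothing to test
        p = hypergeom_sf(overlap, background_size, len(term_genes), n)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n
```

With a toy background of 20 genes, a term {"A", "B"} and the ranked list A, B, C, D, the best result comes from the top-2 prefix (p = 1/190), which shows why genes near the top of an ordered list carry more weight.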

Optional 3: Alternate Hypothesis Testing Methods

  • Rerun the query using different correction methods, such as g:SCS or Bonferroni.
  • Assess the impact on the significance of the results.
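Bonferroni correction multiplies each p-value by the number of tests (capped at 1), which makes it more conservative than Benjamini-Hochberg FDR. g:SCS is g:Profiler's own method and is not sketched here. The toy p-values and helper names below are illustrative only.

```python
def bonferroni(pvalues: list[float]) -> list[float]:
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def count_significant(adjusted: list[float], alpha: float = 0.05) -> int:
    """Number of adjusted p-values at or below alpha."""
    return sum(1 for q in adjusted if q <= alpha)
```

For raw p-values of 0.001, 0.01, 0.02 and 0.3, Bonferroni gives about 0.004, 0.04, 0.08 and 1.0, so two of the four terms pass at 0.05.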

Tips for Efficient Analysis

  • Documentation: Keep detailed notes of each step and parameter used. This will be helpful for reproducing the results or making modifications in future analyses.
  • Results Interpretation: Spend adequate time understanding the output from each step. The detailed view in g:Profiler can provide insights into the biological relevance of your findings.
  • File Management: Maintain a structured directory for all files used and generated in this exercise to avoid any confusion or data loss.

References