Journal 4: G:Profiler Tutorial - bcb420-2025/Keren_Zhang GitHub Wiki
g:Profiler Tutorial
Exercise 1 - run g:Profiler
Perform gene-set enrichment analysis using g:Profiler with specific pathway databases and explore the results.
Setup
- Required file: Pancancer_genelist.txt
- Save this file in the module directory of your CBW work directory.
- Ensure all files are saved in a personal project data folder.
Instructions
Step 1: Launch g:Profiler
- URL: g:Profiler Website
Step 2: Input Query
- Paste the gene list from Pancancer_genelist.txt into the Query field.
- Ensure that the organism for analysis is set to Homo sapiens.
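Before pasting, it can help to check the gene list for blank lines and duplicates. The sketch below assumes Pancancer_genelist.txt holds one gene symbol per line, which the source does not state explicitly; the function name `load_gene_list` is hypothetical.

```python
from pathlib import Path


def load_gene_list(path):
    """Read gene symbols (assumed one per line), dropping blanks and duplicates.

    The original order of first occurrences is preserved.
    """
    seen = set()
    genes = []
    for line in Path(path).read_text().splitlines():
        symbol = line.strip()
        if symbol and symbol not in seen:
            seen.add(symbol)
            genes.append(symbol)
    return genes
```

Calling `load_gene_list("Pancancer_genelist.txt")` and printing the length of the result gives a quick count to compare against what g:Profiler reports after the query runs.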
Step 3: Adjust Parameters
- Significance Threshold: Set to Benjamini-Hochberg FDR with a user threshold of 0.05.
- Tip: If no results are returned, raise the threshold to 0.1 and then to 1. Results at a threshold of 1 confirm that the query ran successfully and that there are simply no significant terms.
- Data Sources:
- Unselect all, then select:
- GO Biological Process (with no electronic GO annotations)
- Reactome
- WikiPathways
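To make the Benjamini-Hochberg FDR setting more concrete, here is a minimal sketch of the standard BH adjustment applied to a list of raw p-values. It illustrates the general procedure only; it is not g:Profiler's own code, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A term passes the 0.05 user threshold when its adjusted value is at or below 0.05.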
Step 4: Run Query
- Click on the Run query button and address any ambiguous gene mappings if prompted.
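The same query can, in principle, be scripted instead of run through the web form. The sketch below uses only the Python standard library and assumes the g:Profiler REST endpoint and JSON field names shown in the comments (these come from my recollection of the public API, not from this tutorial, so check them against the current API documentation). The function names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint of the g:Profiler g:GOSt API; verify before relying on it.
GOST_URL = "https://biit.cs.ut.ee/gprofiler/api/gost/profile/"


def build_payload(genes, threshold=0.05, method="fdr"):
    """Assemble the request body mirroring the web-form settings above."""
    return {
        "organism": "hsapiens",                    # Homo sapiens
        "query": list(genes),
        "sources": ["GO:BP", "REAC", "WP"],        # assumed source identifiers
        "user_threshold": threshold,
        "significance_threshold_method": method,   # "fdr" = Benjamini-Hochberg
        "no_iea": True,                            # no electronic GO annotations
    }


def run_query(genes, **kwargs):
    """POST the query and return the decoded JSON response (needs network)."""
    request = urllib.request.Request(
        GOST_URL,
        data=json.dumps(build_payload(genes, **kwargs)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping the payload in a separate function makes it easy to save the exact parameters alongside the results.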
Step 5: Explore the Results
- View results under "Detailed Results" for GO:BP, Reactome, and WikiPathways.
- Adjust the term size for more relevant results:
- Max term size from 10000 to 250
- Min term size from 1 to 3
Step 6: Save the Results
- Save results in GEM format for different term sizes:
- File 1: gProfiler_hsapiens_lab2_results_GEM_termmin3_max10000.gem.txt
- File 2: gProfiler_hsapiens_lab2_results_GEM_termmin3_max1000.gem.txt
- File 3: gProfiler_hsapiens_lab2_results_GEM_termmin3_max250.gem.txt
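Once the GEM files are saved, a short script can list the significant terms in each. This sketch assumes the GEM file is tab-separated with the header columns GO.ID, Description, p.Val, FDR, Phenotype and Genes; confirm this against the header of your own download. The function name is hypothetical.

```python
import csv


def significant_terms(gem_path, fdr_cutoff=0.05):
    """Return (term id, description, FDR) for rows at or below the cutoff."""
    with open(gem_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [
            (row["GO.ID"], row["Description"], float(row["FDR"]))
            for row in reader
            if float(row["FDR"]) <= fdr_cutoff
        ]
```

Running it on the three files above shows how the number of reported terms changes with the maximum term size.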
Step 7: Download Pathway Database Files
- Optional: Combine multiple data sources into a single GMT file for use in Cytoscape.
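Combining GMT files amounts to concatenating them, since each line of a GMT file is one gene set (name, description, then genes, all tab-separated). The sketch below additionally skips blank lines and repeated gene-set names; the function name and the choice to keep the first occurrence of a duplicate are assumptions.

```python
def combine_gmt(input_paths, output_path):
    """Concatenate GMT files, keeping the first copy of each gene-set name.

    Returns the number of gene sets written.
    """
    seen = set()
    written = 0
    with open(output_path, "w") as out:
        for path in input_paths:
            with open(path) as handle:
                for line in handle:
                    line = line.rstrip("\n")
                    if not line.strip():
                        continue  # skip blank lines
                    set_name = line.split("\t", 1)[0]
                    if set_name in seen:
                        continue  # skip repeated gene-set names
                    seen.add(set_name)
                    out.write(line + "\n")
                    written += 1
    return written
```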
Step 8: Version Control
- Record the version of g:Profiler used for analysis to ensure reproducibility.
- Version example: g:Profiler_e111_eg58_p18_b51d8f08
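One lightweight way to record the version is to write it, together with the query parameters, to a small JSON file next to the results. This is only a sketch; the file layout, key names, and function name are hypothetical.

```python
import json
from datetime import date


def write_run_record(path, version, parameters):
    """Save the g:Profiler version string and query parameters as JSON."""
    record = {
        "gprofiler_version": version,
        "date": date.today().isoformat(),
        "parameters": parameters,
    }
    with open(path, "w") as handle:
        json.dump(record, handle, indent=2)
    return record
```

For example, `write_run_record("run_record.json", "g:Profiler_e111_eg58_p18_b51d8f08", {"threshold": 0.05, "method": "Benjamini-Hochberg FDR"})` stores the example version shown above.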
Tips
- Always check and record the version of tools used to facilitate future reproducibility.
- If g:Profiler returns no results, consider relaxing the FDR threshold to test for tool functionality.
- Save and organize all output files systematically in a results data folder for easy access and analysis continuity.
Exercise 2 - g:Profiler with custom GMT
Perform a pathway enrichment analysis using g:Profiler with a custom GMT file. This allows us to utilize alternate pathway data sources not available in g:Profiler's default settings.
Setup
- Download the necessary files: make sure Pancancer_genelist.txt and Baderlab_genesets.gmt (June 2024 update) are in your project data folder.
Instructions
Step 1: Open g:Profiler
- Access the g:Profiler web tool.
Step 2: Query Setup
- 2a: Copy and paste the gene list from Pancancer_genelist.txt into the Query field.
- 2b: Expand the Advanced options tab and set the Significance threshold to "Benjamini-Hochberg FDR".
Step 3: Data Source Selection
- Expand the Data sources tab.
- Click the “clear all” button to unselect all preselected choices.
Step 4: Upload Custom GMT
- Navigate to the Custom GMT tab and upload the Baderlab_genesets.gmt file.
- Verify the file name appears in the “File name used” box.
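A quick local check of the GMT file before uploading can catch a truncated or malformed download. The sketch below assumes the usual GMT layout (gene-set name, description, then genes, tab-separated) and uses the 3 to 250 term-size bounds from Exercise 1 as default limits; the function names are hypothetical.

```python
def read_gmt(path):
    """Parse a GMT file into {gene-set name: set of genes}."""
    gene_sets = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # need a name, a description and at least one gene
            name, _description, *genes = fields
            gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets


def filter_by_size(gene_sets, min_size=3, max_size=250):
    """Keep gene sets whose size lies within the given bounds (inclusive)."""
    return {
        name: genes
        for name, genes in gene_sets.items()
        if min_size <= len(genes) <= max_size
    }
```

Comparing `len(read_gmt(path))` with `len(filter_by_size(read_gmt(path)))` shows how many gene sets fall inside the size window.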
Step 5: Run the Query
- Click on "Run query" and wait for the results.
Step 6: Results Exploration
- Explore the detailed results provided by g:Profiler.
Step 7: Saving Results
- Save the results file as gProfiler_hsapiens_Baderlab_max250.gem.txt.
Optional Steps
If time permits, further explore the capabilities of g:Profiler with these optional steps:
Optional 1: Additional Data Sources
- Add TRANSFAC and miRTarBase databases to the query.
- Rerun the query and observe the changes in the results.
Optional 2: Ordered Query Analysis
- Check the Ordered query option so that g:Profiler treats the gene list as ranked, here by the number of mutations.
- Compare the outcomes of ordered vs non-ordered queries.
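An ordered query only makes sense if the pasted list is actually ranked. If mutation counts are available as a separate table, the ranking could be produced as in the sketch below; the input dictionary and the function name are hypothetical, and ties are broken alphabetically as an arbitrary choice.

```python
def order_by_mutations(mutation_counts):
    """Return gene symbols sorted by mutation count, highest first.

    Ties are broken alphabetically so the output is deterministic.
    """
    ranked = sorted(mutation_counts.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _count in ranked]
```

For instance, `order_by_mutations({"KRAS": 5, "TP53": 12, "PTEN": 5})` puts TP53 first, followed by KRAS and PTEN.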
Optional 3: Alternate Hypothesis Testing Methods
- Rerun the query using different methods such as g:SCS or Bonferroni.
- Assess the impact on the significance of the results.
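For intuition on why the choice of correction matters, the Bonferroni adjustment simply multiplies each p-value by the number of tests, which makes it the most conservative of the common options. The sketch below is a generic illustration on made-up p-values, not g:Profiler's implementation; g:SCS is specific to g:Profiler and is not reproduced here.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def count_significant(adjusted, threshold=0.05):
    """Count adjusted p-values at or below the threshold."""
    return sum(1 for p in adjusted if p <= threshold)
```

With the toy input `[0.01, 0.04, 0.03, 0.005]`, all four raw p-values are below 0.05, but only two remain so after the adjustment.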
Tips for Efficient Analysis
- Documentation: Keep detailed notes of each step and parameter used. This will be helpful for reproducing the results or making modifications in future analyses.
- Results Interpretation: Spend adequate time understanding the output from each step. The detailed view in g:Profiler can provide insights into the biological relevance of your findings.
- File Management: Maintain a structured directory for all files used and generated in this exercise to avoid any confusion or data loss.