Paper 9: Enrichment Map - bcb420-2025/Keren_Zhang GitHub Wiki

Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap

DOI: 10.1038/s41596-018-0103-9

This protocol provides a step-by-step guide for pathway enrichment analysis of omics data, covering three major stages:

  1. Definition of a gene list from omics data
  2. Pathway enrichment analysis using statistical methods
  3. Visualization and interpretation of results

Key Concepts

Pathway Enrichment Analysis

  • Identifies biological pathways enriched in a gene list more than expected by chance
  • Helps interpret gene lists from RNA-seq, genome-sequencing, and other omics experiments
  • Projects disease data onto known mechanisms to increase statistical and interpretative power

Major Tools Used

  • g:Profiler: For analysis of gene lists using Fisher's exact test
  • GSEA (Gene Set Enrichment Analysis): For analysis of ranked gene lists
  • Cytoscape with EnrichmentMap: For visualization of pathway enrichment results

Protocol Steps

Stage 1: Define Gene List

  • Input can be:
      • Unranked gene list (e.g., mutated genes) → use g:Profiler
      • Ranked gene list (e.g., differentially expressed genes) → use GSEA
  • Example data provided:
      • Cancer driver genes (for g:Profiler)
      • Ovarian cancer subtypes (for GSEA)
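
As an illustration of how a ranked list for GSEA might be derived from differential expression results, the sketch below scores each gene by the sign of its log fold change times -log10 of its p-value, a commonly used ranking convention. The gene names, values, and function names are hypothetical and not taken from the protocol's example data.

```python
import math

def rank_score(log_fc: float, p_value: float, floor: float = 1e-300) -> float:
    """Signed significance score: sign(logFC) * -log10(p-value)."""
    p = max(p_value, floor)  # guard against log10(0)
    sign = 1.0 if log_fc >= 0 else -1.0
    return sign * -math.log10(p)

def build_ranked_list(de_results):
    """Turn (gene, log_fc, p_value) rows into (gene, score) pairs,
    sorted from most up-regulated to most down-regulated."""
    scored = [(gene, rank_score(lfc, p)) for gene, lfc, p in de_results]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical differential expression results: (gene, log2 fold change, p-value)
de_results = [
    ("GENE_A", 2.1, 1e-6),
    ("GENE_B", -1.8, 1e-4),
    ("GENE_C", 0.2, 0.5),
    ("GENE_D", -0.1, 0.9),
]

ranked = build_ranked_list(de_results)
for gene, score in ranked:
    # Two tab-separated columns, the general shape of a rank file
    print(f"{gene}\t{score:.3f}")
```

Strongly up-regulated genes end up at the top of the list and strongly down-regulated genes at the bottom, while genes with weak evidence sit in the middle, so no significance threshold has to be chosen before the enrichment step.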

Stage 2: Pathway Enrichment Analysis

Option A: g:Profiler

  • Uses Fisher's exact test for gene lists
  • Key parameters:
      • Ordered query option for ranked lists
      • Size filters (recommended: 5-350 genes per pathway)
      • Multiple testing correction (FDR Q < 0.05)
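
To make the statistics behind this option concrete, here is a minimal sketch of an over-representation test on toy data. It is not g:Profiler's implementation: it assumes a one-sided Fisher's exact test (computed as a hypergeometric upper tail), applies the 5-350 size filter, and uses Benjamini-Hochberg as one common FDR procedure. All gene IDs, pathway names, and function names are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k): chance of seeing at least k pathway genes in a query of
    n genes drawn from N background genes, K of which are in the pathway.
    This equals the p-value of a one-sided Fisher's exact test."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

def enrich(query, pathways, background, min_size=5, max_size=350, q_cutoff=0.05):
    """Test each size-filtered pathway and keep those with q below the cutoff."""
    background = set(background)
    query = set(query) & background
    tested = []
    for name, genes in pathways.items():
        genes = set(genes) & background
        if not (min_size <= len(genes) <= max_size):
            continue  # skip pathways outside the size filter
        k = len(query & genes)
        p = hypergeom_upper_tail(k, len(genes), len(query), len(background))
        tested.append((name, k, p))
    q_values = benjamini_hochberg([p for _, _, p in tested])
    return [(name, k, p, q) for (name, k, p), q in zip(tested, q_values) if q < q_cutoff]

# Toy data: 100 background genes and three made-up pathways
background = [f"G{i}" for i in range(100)]
pathways = {
    "toy_pathway_1": [f"G{i}" for i in range(0, 10)],
    "toy_pathway_2": [f"G{i}" for i in range(50, 60)],
    "too_small": ["G90", "G91", "G92"],  # removed by the size filter
}
query = ["G0", "G1", "G2", "G3", "G4", "G5", "G50", "G70"]

results = enrich(query, pathways, background)
for name, k, p, q in results:
    print(f"{name}: overlap={k}, p={p:.2e}, q={q:.2e}")
```

In this toy run only the pathway sharing six of its ten genes with the query survives the q-value cutoff; the pathway sharing a single gene does not, and the three-gene pathway is never tested.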

Option B: GSEA

  • Analyzes whole-genome ranked lists without thresholding
  • Key parameters:
      • Number of permutations (default: 1,000)
      • Gene set size limits (recommended: max 200 genes)
      • Phenotype labels (pos/neg for up/down-regulated)
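
The idea of scoring a gene set against a whole ranked list can be sketched as a weighted running-sum statistic with a permutation-based p-value. This is a simplified illustration in the spirit of Subramanian et al. (2005), not the GSEA software itself: it omits normalized enrichment scores and FDR estimation, and it assumes random gene sets of equal size as the null model. Gene IDs and function names are hypothetical.

```python
import random

def enrichment_score(ranked, gene_set, weight=1.0):
    """Walk down the ranked list, stepping up at gene-set members (weighted by
    |score|**weight) and down otherwise; return the signed maximum deviation."""
    gene_set = set(gene_set)
    hit_weights = [abs(s) ** weight if g in gene_set else 0.0 for g, s in ranked]
    total_hit = sum(hit_weights)
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    if total_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for (gene, _), w in zip(ranked, hit_weights):
        if gene in gene_set:
            running += w / total_hit
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

def permutation_p_value(ranked, gene_set, n_perm=1000, seed=0):
    """Empirical p-value against random gene sets of the same size,
    compared only with null scores of the same sign as the observed score."""
    rng = random.Random(seed)
    genes = [g for g, _ in ranked]
    size = len(set(gene_set) & set(genes))
    observed = enrichment_score(ranked, gene_set)
    null = [enrichment_score(ranked, rng.sample(genes, size)) for _ in range(n_perm)]
    same_sign = [es for es in null if (es >= 0) == (observed >= 0)]
    extreme = sum(1 for es in same_sign if abs(es) >= abs(observed))
    return observed, (extreme + 1) / (len(same_sign) + 1)

# Toy ranked list: 20 genes with scores from 10 down to -9
ranked = [(f"G{i}", float(10 - i)) for i in range(20)]
top_set = ["G0", "G1", "G2", "G3"]         # concentrated at the top of the list
bottom_set = ["G16", "G17", "G18", "G19"]  # concentrated at the bottom

es_top, p_top = permutation_p_value(ranked, top_set)
es_bottom, p_bottom = permutation_p_value(ranked, bottom_set)
print(f"top set:    ES={es_top:.2f}, p={p_top:.4f}")
print(f"bottom set: ES={es_bottom:.2f}, p={p_bottom:.4f}")
```

A positive score means the set's genes cluster toward the top of the ranking and a negative score means they cluster toward the bottom, which corresponds to the pos/neg phenotype split listed above.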

Stage 3: Visualization with EnrichmentMap

  • Creates a network of enriched pathways where:
      • Nodes = pathways
      • Edges = shared genes between pathways
  • Features:
      • Color by enrichment score (red/blue for up/down)
      • Cluster similar pathways into biological themes
      • Heatmaps show gene expression patterns
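
The node-and-edge construction above can be sketched as a pairwise gene-set similarity calculation. The code below assumes an equal-weight blend of the Jaccard and overlap coefficients with an edge cutoff of 0.375; both the weighting and the cutoff are assumptions chosen for illustration, and the pathway contents are made up. It does not call Cytoscape or the EnrichmentMap app.

```python
from itertools import combinations

def jaccard(a, b):
    """Shared genes divided by the union of both gene sets."""
    return len(a & b) / len(a | b)

def overlap(a, b):
    """Shared genes divided by the size of the smaller gene set."""
    return len(a & b) / min(len(a), len(b))

def combined_similarity(a, b, k=0.5):
    """Blend of overlap and Jaccard coefficients (k = 0.5 is an assumed weighting)."""
    return k * overlap(a, b) + (1 - k) * jaccard(a, b)

def build_enrichment_map(pathways, cutoff=0.375):
    """Return (nodes, edges); an edge links two pathways whose similarity
    reaches the cutoff."""
    sets = {name: set(genes) for name, genes in pathways.items() if genes}
    edges = []
    for (n1, s1), (n2, s2) in combinations(sets.items(), 2):
        sim = combined_similarity(s1, s2)
        if sim >= cutoff:
            edges.append((n1, n2, round(sim, 3)))
    return list(sets), edges

# Hypothetical enriched pathways with toy gene members
pathways = {
    "immune response": ["A", "B", "C", "D"],
    "interferon signaling": ["B", "C", "D", "E"],
    "cell cycle": ["X", "Y", "Z"],
}

nodes, edges = build_enrichment_map(pathways)
print(nodes)
print(edges)
```

Pathways that share most of their genes end up connected and can then be grouped into a single biological theme, while unrelated pathways remain isolated nodes.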

Advantages

  • Improves statistical power by aggregating gene-level signals
  • Results are easier to interpret than raw gene lists
  • Facilitates integration of diverse omics data types
  • Uses freely available, frequently updated software

Limitations

  • Less effective for pathways regulated by few genes
  • Pathway boundaries can be arbitrary across databases
  • Biased toward well-annotated pathways
  • May miss "dark matter" genes without pathway annotations

Example Results

  • g:Profiler analysis of cancer driver genes identified pathways like:
      • "Positive regulation of Ras protein signal transduction"
      • "Regulation of interferon-gamma-mediated signaling"
  • GSEA analysis of ovarian cancer showed:
      • Mesenchymal subtype enriched for cell cycle pathways
      • Immunoreactive subtype enriched for immune response pathways

Implementation Notes

  • Complete protocol takes ~4.5 hours
  • Designed for biologists with no bioinformatics training
  • Includes troubleshooting guide for common issues
  • Provides supplementary protocols for advanced analyses

References

  • g:Profiler: Reimand et al. (2016) Nucleic Acids Res
  • GSEA: Subramanian et al. (2005) PNAS
  • EnrichmentMap: Merico et al. (2010) PLoS ONE