Paper 9: Enrichment Map
Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap
DOI: 10.1038/s41596-018-0103-9
This protocol provides a step-by-step guide for pathway enrichment analysis of omics data, covering three major stages:
1. Definition of a gene list from omics data
2. Pathway enrichment analysis using statistical methods
3. Visualization and interpretation of results
- Identifies biological pathways enriched in a gene list more than expected by chance
- Helps interpret gene lists from RNA-seq, genome-sequencing, and other omics experiments
- Projects disease data onto known mechanisms to increase statistical and interpretative power
- g:Profiler: For analysis of gene lists using Fisher's exact test
- GSEA (Gene Set Enrichment Analysis): For analysis of ranked gene lists
- Cytoscape with EnrichmentMap: For visualization of pathway enrichment results
- Input can be:
** Unranked gene list (e.g., mutated genes) → use g:Profiler
** Ranked gene list (e.g., differentially expressed genes) → use GSEA
- Example data provided:
** Cancer driver genes (for g:Profiler)
** Ovarian cancer subtypes (for GSEA)
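To make the "ranked gene list" input concrete, here is a minimal sketch of turning differential expression statistics into a ranking. It assumes a common convention that is not stated in these notes: each gene is scored as the sign of its log fold change times -log10 of its p-value. The function name `rank_genes`, the input layout, and the gene names are hypothetical.

```python
import math

def rank_genes(stats):
    """Return (gene, score) pairs sorted from most up- to most down-regulated.

    stats: dict mapping gene -> (log_fold_change, p_value); hypothetical layout.
    """
    ranked = []
    for gene, (log_fc, p_value) in stats.items():
        # Clamp tiny p-values so that log10(0) never occurs.
        p = max(p_value, 1e-300)
        sign = 1.0 if log_fc >= 0 else -1.0
        # Assumed scoring convention: sign(logFC) * -log10(p).
        ranked.append((gene, sign * -math.log10(p)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Toy values, for illustration only.
example = {
    "GENE_A": (2.0, 0.001),
    "GENE_B": (-1.5, 0.0001),
    "GENE_C": (0.2, 0.5),
}
ranking = rank_genes(example)
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

Strongly up-regulated genes end up at the top and strongly down-regulated genes at the bottom, so no significance threshold has to be chosen before the enrichment step.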
- Uses Fisher's exact test for gene lists
- Key parameters:
** Ordered query option for ranked lists
** Size filters (recommended: 5-350 genes per pathway)
** Multiple testing correction (FDR Q < 0.05)
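The test and parameters listed above can be sketched in plain Python. This is an illustration of the idea, not g:Profiler's actual implementation: it computes a one-sided Fisher's exact test as a hypergeometric upper tail, applies the 5-350 size filter, and adjusts p-values with the Benjamini-Hochberg procedure. All function names, the toy gene identifiers, and the background size of 100 are assumptions made for the example, and the query and pathways are assumed to be subsets of the background.

```python
from math import comb

def enrichment_p_value(query, pathway, background_size):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Probability of an overlap at least as large as the observed one when
    len(query) genes are drawn at random from background_size genes.
    """
    n, size = len(query), len(pathway)
    k = len(query & pathway)
    total = comb(background_size, n)
    tail = sum(
        comb(size, i) * comb(background_size - size, n - i)
        for i in range(k, min(n, size) + 1)
    )
    return tail / total

def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as p_values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

def enrich(query, pathways, background_size,
           min_size=5, max_size=350, q_cutoff=0.05):
    """Return {pathway name: q-value} for pathways passing the cutoff."""
    # Size filter: drop very small and very large gene sets first.
    kept = {name: genes for name, genes in pathways.items()
            if min_size <= len(genes) <= max_size}
    names = list(kept)
    p_values = [enrichment_p_value(query, kept[name], background_size)
                for name in names]
    q_values = benjamini_hochberg(p_values)
    return {name: q for name, q in zip(names, q_values) if q < q_cutoff}

# Toy data: hypothetical gene identifiers and pathway names.
query = {"g1", "g2", "g3", "g4", "g5"}
pathways = {
    "pathway A": {"g1", "g2", "g3", "g4", "g6"},      # overlaps the query
    "pathway B": {f"g{i}" for i in range(50, 60)},    # no overlap
    "pathway C": {"g1", "g2"},                        # removed by size filter
}
result = enrich(query, pathways, background_size=100)
print(result)
```

In this toy run only "pathway A" survives: "pathway C" is dropped by the size filter before testing, and "pathway B" shares no genes with the query.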
- Analyzes whole-genome ranked lists without thresholding
- Key parameters:
** Number of permutations (default: 1,000)
** Gene set size limits (recommended: max 200 genes)
** Phenotype labels (pos/neg for up/down-regulated)
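A rough sketch of how a ranked-list enrichment score works may help here. The code below follows the general running-sum idea described by Subramanian et al. (2005) but is a simplification, not the GSEA software itself: the null distribution is built from random gene sets of the same size, and significance is judged on the absolute score. The function names, the `weight` and `seed` parameters, and the toy ranking are assumptions for illustration.

```python
import random

def enrichment_score(ranked, gene_set, weight=1.0):
    """Signed maximum deviation of a GSEA-style running sum.

    ranked: list of (gene, score) pairs sorted from highest to lowest score.
    """
    hits = [abs(score) ** weight if gene in gene_set else 0.0
            for gene, score in ranked]
    hit_total = sum(hits)
    n_miss = sum(1 for gene, _ in ranked if gene not in gene_set)
    if hit_total == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for (gene, _), hit in zip(ranked, hits):
        if gene in gene_set:
            running += hit / hit_total      # step up at a gene-set member
        else:
            running -= 1.0 / n_miss         # step down otherwise
        if abs(running) > abs(best):
            best = running
    return best

def permutation_p_value(ranked, gene_set, n_perm=1000, seed=0):
    """Empirical p-value from random gene sets of the same size (simplified)."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked, gene_set)
    genes = [gene for gene, _ in ranked]
    size = sum(1 for gene in genes if gene in gene_set)
    extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(genes, size))
        if abs(enrichment_score(ranked, random_set)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (extreme + 1) / (n_perm + 1)

# Toy ranking: ten hypothetical genes with scores from +5 down to -5.
scores = [5, 4, 3, 2, 1, -1, -2, -3, -4, -5]
ranked = [(f"g{i + 1}", float(s)) for i, s in enumerate(scores)]
top_set = {"g1", "g2"}
bottom_set = {"g9", "g10"}

es_top = enrichment_score(ranked, top_set)
es_bottom = enrichment_score(ranked, bottom_set)
p_top = permutation_p_value(ranked, top_set)
print(es_top, es_bottom, p_top)
```

A gene set concentrated at the top of the ranking gets a positive score (the "pos" phenotype) and one concentrated at the bottom gets a negative score (the "neg" phenotype), which is why no up-front threshold on the gene list is needed.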
- Creates network of enriched pathways where:
** Nodes = pathways
** Edges = shared genes between pathways
- Features:
** Color by enrichment score (red/blue for up/down)
** Cluster similar pathways into biological themes
** Heatmaps show gene expression patterns
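The network structure described above (pathways as nodes, shared genes as edges, clusters as themes) can be illustrated with a small sketch. It assumes a Jaccard coefficient as the gene-overlap measure and treats connected components as themes. Both are stand-ins chosen for simplicity, since these notes do not state which similarity measure or clustering method EnrichmentMap uses, and the cutoff of 0.25 and the gene names are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Shared genes divided by all genes in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_edges(pathways, cutoff=0.25):
    """Connect pathway pairs whose gene overlap reaches the cutoff."""
    edges = []
    for name_a, name_b in combinations(sorted(pathways), 2):
        similarity = jaccard(pathways[name_a], pathways[name_b])
        if similarity >= cutoff:
            edges.append((name_a, name_b, similarity))
    return edges

def themes(pathways, edges):
    """Group pathways into connected components (a stand-in for clustering)."""
    neighbours = {name: set() for name in pathways}
    for a, b, _ in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, clusters = set(), []
    for start in sorted(pathways):
        if start in seen:
            continue
        stack, cluster = [start], set()
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(neighbours[node] - cluster)
        seen |= cluster
        clusters.append(sorted(cluster))
    return clusters

# Toy pathways with hypothetical gene names.
pathways = {
    "immune response": {"A", "B", "C", "D"},
    "interferon signaling": {"B", "C", "D", "E"},
    "cell cycle": {"X", "Y", "Z"},
}
edges = build_edges(pathways)
clusters = themes(pathways, edges)
print(edges)
print(clusters)
```

Here the two immune-related pathways share three of five genes and are joined into one theme, while "cell cycle" stays a separate node. Raising the cutoff gives a sparser network with more, smaller themes.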
- Improves statistical power by aggregating gene-level signals
- Results are easier to interpret than raw gene lists
- Facilitates integration of diverse omics data types
- Uses freely available, frequently updated software
- Less effective for pathways regulated by few genes
- Pathway boundaries can be arbitrary across databases
- Biased toward well-annotated pathways
- May miss "dark matter" genes without pathway annotations
- g:Profiler analysis of cancer driver genes identified pathways like:
** "Positive regulation of Ras protein signal transduction"
** "Regulation of interferon-gamma-mediated signaling"
- GSEA analysis of ovarian cancer showed:
** Mesenchymal subtype enriched for cell cycle pathways
** Immunoreactive subtype enriched for immune response pathways
- Complete protocol takes ~4.5 hours
- Designed for biologists with no bioinformatics training
- Includes troubleshooting guide for common issues
- Provides supplementary protocols for advanced analyses
- g:Profiler: Reimand et al. (2016) Nucleic Acids Res
- GSEA: Subramanian et al. (2005) PNAS
- EnrichmentMap: Merico et al. (2010) PLoS ONE