Week 11: EM Protocol - bcb420-2025/Izumi_Ando GitHub Wiki

Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap

basically a hands-on guide for how to go from gene list to interpretable pathway results, with actual tools and visualizations

Citation

Reimand, J., Isserlin, R., Voisin, V., Kucera, M., Tannus-Lopes, C., Rostamianfar, A., ... & Bader, G. D. (2019). Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap. Nature Protocols, 14, 482–517. https://doi.org/10.1038/s41596-018-0103-9

Notes

[Image: screenshot of Figure 1 from the paper; a good, intuitive summary of the workflow]

general idea

  • most omics experiments give you huge gene lists → need pathway analysis to make sense of them
  • this protocol walks through 3 steps: define the gene list, do enrichment analysis, then visualize & interpret

enrichment tools

  • g:Profiler good for small/moderate lists (ranked or unranked)
  • GSEA handles full ranked genome-wide lists without cutoff
  • both support GO, KEGG, Reactome etc., and apply multiple testing correction
  • g:Profiler uses Fisher's exact test (also has an ordered enrichment mode); GSEA uses a running-sum statistic based on the Kolmogorov-Smirnov (KS) test
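
To make the over-representation idea concrete, here is a minimal sketch of a one-sided Fisher's exact (hypergeometric) test followed by a Benjamini-Hochberg FDR adjustment. This is not g:Profiler's actual code: the function names `hypergeom_enrichment_p` and `benjamini_hochberg` are made up for illustration, and BH is just one common correction, assumed here.

```python
from math import comb

def hypergeom_enrichment_p(query, pathway, background):
    """One-sided p-value: chance of an overlap at least as large as observed
    when drawing len(query) genes at random from the background."""
    background = set(background)
    query = set(query) & background      # ignore genes outside the universe
    pathway = set(pathway) & background
    N = len(background)                  # size of the gene universe
    K = len(pathway)                     # pathway genes in the universe
    n = len(query)                       # query list size
    k = len(query & pathway)             # observed overlap
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

if __name__ == "__main__":
    # Toy data (hypothetical gene names): 20-gene universe, 5-gene pathway.
    background = [f"g{i}" for i in range(20)]
    pathway = ["g0", "g1", "g2", "g3", "g4"]
    query = ["g0", "g1", "g2", "g10"]
    print(hypergeom_enrichment_p(query, pathway, background))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Note how the background set enters directly as `N`: changing the universe changes every p-value, which is why the background choice matters.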

visualization ftw

  • they use Cytoscape + EnrichmentMap to visualize results
  • very cool network-based view where nodes = pathways and edges = gene overlap
  • helps to collapse redundant pathways into “themes”
  • you can explore pathways interactively, cluster them, annotate automatically
  • if you load expression data too, you get heatmaps inside the nodes
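
The "edges = gene overlap" idea can be sketched in a few lines. This is only an illustration of the concept, not EnrichmentMap's implementation: the function names are hypothetical, and the combined score (average of Jaccard and overlap coefficients) and the 0.375 cutoff are assumptions chosen for the example.

```python
from itertools import combinations

def jaccard(a, b):
    """Shared genes divided by the union of both gene sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def overlap_coefficient(a, b):
    """Shared genes divided by the size of the smaller gene set."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0

def enrichment_edges(gene_sets, cutoff=0.375):
    """Return (pathway_1, pathway_2, similarity) for pairs above the cutoff.

    gene_sets: dict mapping pathway name -> iterable of gene names.
    Similarity is the mean of Jaccard and overlap (an assumed 50/50 mix).
    """
    edges = []
    for (name_a, set_a), (name_b, set_b) in combinations(gene_sets.items(), 2):
        score = 0.5 * jaccard(set_a, set_b) + 0.5 * overlap_coefficient(set_a, set_b)
        if score >= cutoff:
            edges.append((name_a, name_b, score))
    return edges

if __name__ == "__main__":
    # Hypothetical pathways; the first two are nearly redundant.
    sets = {
        "cell cycle": {"a", "b", "c", "d"},
        "mitotic cell cycle": {"a", "b", "c"},
        "lipid metabolism": {"x", "y"},
    }
    print(enrichment_edges(sets))
```

Pathways that end up connected by many high-similarity edges form the clusters that get collapsed into themes.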

practical notes

  • the authors stress the importance of using up-to-date annotation databases
  • FDR correction is built-in but still needs careful interpretation
  • choice of background gene set can impact results (esp in non-transcriptome datasets)
  • encourages using leading-edge genes from GSEA or expression overlays for follow-up
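
A rough sketch of where leading-edge genes come from: walk down the ranked list, step up at pathway genes and down otherwise, and take the genes seen before the peak of the running sum. This simplifies the GSEA statistic (no permutation-based significance, no normalization), and the function name `running_enrichment` and the `weight` parameter are illustrative assumptions.

```python
def running_enrichment(ranked, gene_set, weight=1.0):
    """Return (enrichment_score, leading_edge_genes).

    ranked: list of (gene, score) pairs sorted by decreasing score.
    Hits step up in proportion to |score| ** weight; misses step down evenly.
    """
    gene_set = set(gene_set)
    hit_weights = [abs(s) ** weight if g in gene_set else 0.0 for g, s in ranked]
    hit_total = sum(hit_weights)
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")

    running, profile = 0.0, []
    for w, (gene, _) in zip(hit_weights, ranked):
        if gene in gene_set:
            running += w / hit_total
        else:
            running -= 1.0 / n_miss
        profile.append(running)

    # Enrichment score = largest deviation of the running sum from zero.
    peak = max(range(len(profile)), key=lambda i: abs(profile[i]))
    es = profile[peak]
    if es >= 0:
        leading = [g for g, _ in ranked[: peak + 1] if g in gene_set]
    else:
        leading = [g for g, _ in ranked[peak + 1 :] if g in gene_set]
    return es, leading

if __name__ == "__main__":
    # Hypothetical ranked list (e.g. by a differential expression score).
    ranked = [("A", 5), ("B", 4), ("C", 3), ("D", 2), ("E", 1), ("F", -1)]
    es, leading = running_enrichment(ranked, {"A", "B", "E"})
    print(es, leading)
```

Because no cutoff is applied to the ranked list, every gene contributes to the running sum, and the leading-edge subset is a natural shortlist for follow-up.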

useful tidbits

  • explains when to use ranked vs unranked input
  • nice explanation of competitive vs self-contained tests
  • also covers how to interpret multiple related enriched pathways (not always independent)
  • g:Profiler has a built-in ordered-query mode for ranked lists: instead of testing the whole list at once, it tests increasingly large slices from the top of the list
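
The idea of testing slices of a ranked list can be sketched as a minimum hypergeometric scan: test every top-n prefix and keep the best one. This is a simplified illustration, not g:Profiler's implementation; `hypergeom_tail` and `best_prefix` are hypothetical names, and the ranked genes are assumed to be unique and contained in the background.

```python
from math import comb

def hypergeom_tail(N, K, n, k):
    """P(overlap >= k) when drawing n genes from N, of which K are in the pathway."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)

def best_prefix(ranked_genes, pathway, background):
    """Scan top-n slices of a ranked gene list and return (min_p, best_n).

    Caveat: the minimum over many slices is optimistic and is not a valid
    p-value on its own; it needs its own correction before interpretation.
    """
    background = set(background)
    pathway = set(pathway) & background
    N, K = len(background), len(pathway)
    best_p, best_n = 1.0, 0
    hits = 0
    for n, gene in enumerate(ranked_genes, start=1):
        if gene in pathway:
            hits += 1
        p = hypergeom_tail(N, K, n, hits)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n

if __name__ == "__main__":
    # Toy data (hypothetical): pathway genes are concentrated at the top.
    background = [f"g{i}" for i in range(20)]
    pathway = ["g0", "g1", "g2", "g3"]
    ranked = ["g0", "g1", "g2", "g10", "g11"]
    print(best_prefix(ranked, pathway, background))
```

In the toy example the best slice is the top three genes, since adding lower-ranked genes only dilutes the overlap.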