Week 11: EM Protocol - bcb420-2025/Izumi_Ando GitHub Wiki
Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap
basically a hands-on guide for how to go from gene list to interpretable pathway results, with actual tools and visualizations
Citation
Reimand, J., Isserlin, R., Voisin, V., Kucera, M., Tannus-Lopes, C., Rostamianfar, A., ... & Bader, G. D. (2019). Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap. Nature Protocols, 14, 482–517. https://doi.org/10.1038/s41596-018-0103-9
Notes
Screenshot of Figure 1, good summary & intuitive
general idea
- most omics experiments give you huge gene lists → need pathway analysis to make sense of them
- this protocol walks through 3 steps: define the gene list, do enrichment analysis, then visualize & interpret
enrichment tools
- g:Profiler good for small/moderate lists (ranked or unranked)
- GSEA handles full ranked genome-wide lists without cutoff
- both support GO, KEGG, Reactome etc., and apply multiple testing correction
- g:Profiler uses Fisher's exact test (it also has an ordered enrichment mode); GSEA uses a running-sum test based on the Kolmogorov-Smirnov (KS) statistic
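To make the over-representation idea concrete, here is a minimal sketch of a one-sided Fisher's exact (hypergeometric) test plus a Benjamini-Hochberg adjustment. It is not g:Profiler's actual code: the function names and toy gene IDs are made up for illustration, and BH is assumed here only as one common multiple testing correction.

```python
from math import comb

def hypergeom_enrichment_p(query, gene_set, background):
    """One-sided p-value: P(overlap >= observed) under the hypergeometric null."""
    background = set(background)
    query = set(query) & background        # only genes in the background count
    gene_set = set(gene_set) & background
    N, K, n = len(background), len(gene_set), len(query)
    k = len(query & gene_set)              # observed overlap
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy example (hypothetical gene IDs): 20 background genes, a 5-gene pathway,
# and a 4-gene query that hits 3 pathway genes.
background = [f"g{i}" for i in range(20)]
pathway = {"g0", "g1", "g2", "g3", "g4"}
query = {"g0", "g1", "g2", "g10"}
print(hypergeom_enrichment_p(query, pathway, background))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Changing `background` changes `N` and therefore the p-value, which is why the choice of background gene set matters.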
visualization ftw
- they use Cytoscape + EnrichmentMap to visualize results
- very cool network-based view where nodes = pathways and edges = gene overlap
- helps to collapse redundant pathways into “themes”
- you can explore pathways interactively, cluster them, annotate automatically
- if you load expression data too, you get heatmaps inside the nodes
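The "edges = gene overlap" idea can be sketched in a few lines. This is only an illustration, not EnrichmentMap's implementation: the pathway names, gene IDs, the 50/50 mix of Jaccard and overlap coefficients, and the 0.375 cutoff are all assumptions chosen for the example.

```python
from itertools import combinations

def jaccard(a, b):
    """Shared genes divided by all genes in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def overlap_coefficient(a, b):
    """Shared genes divided by the size of the smaller set."""
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0

def build_edges(pathways, threshold=0.375):
    """Connect pathway pairs whose combined similarity reaches the threshold."""
    edges = []
    for name_a, name_b in combinations(sorted(pathways), 2):
        a, b = pathways[name_a], pathways[name_b]
        score = 0.5 * jaccard(a, b) + 0.5 * overlap_coefficient(a, b)
        if score >= threshold:
            edges.append((name_a, name_b, score))
    return edges

# Hypothetical enriched pathways (nodes) and their member genes.
pathways = {
    "apoptosis": {"A", "B", "C", "D"},
    "cell death": {"B", "C", "D", "E"},
    "glycolysis": {"X", "Y", "Z"},
}
for edge in build_edges(pathways):
    print(edge)
```

Pathways that share most of their genes end up connected and cluster together, which is what lets redundant terms collapse into one theme.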
practical notes
- the authors stress the importance of using up-to-date annotation databases
- FDR correction is built-in but still needs careful interpretation
- choice of background gene set can impact results (esp in non-transcriptome datasets)
- the authors encourage using leading-edge genes from GSEA or expression overlays for follow-up
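To see where leading-edge genes come from, here is a rough sketch of a GSEA-style weighted running sum. It assumes a list of (gene, score) pairs already sorted from most up-regulated to most down-regulated; the function names and toy data are hypothetical, and the permutation step that GSEA uses to assess significance is left out.

```python
def running_sum(ranked, gene_set, weight=1.0):
    """Running-sum profile: step up at set members, step down at everything else."""
    hit_weights = [abs(score) ** weight if gene in gene_set else 0.0
                   for gene, score in ranked]
    total_hit = sum(hit_weights)
    n_miss = sum(1 for gene, _ in ranked if gene not in gene_set)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")
    running, profile = 0.0, []
    for (gene, _), w in zip(ranked, hit_weights):
        running += w / total_hit if gene in gene_set else -1.0 / n_miss
        profile.append(running)
    return profile

def enrichment_score_and_leading_edge(ranked, gene_set, weight=1.0):
    """Enrichment score = largest deviation from zero; leading edge = hits up to that peak."""
    profile = running_sum(ranked, gene_set, weight)
    peak = max(range(len(profile)), key=lambda i: abs(profile[i]))
    es = profile[peak]
    if es >= 0:
        window = ranked[:peak + 1]   # top of the list up to the peak
    else:
        window = ranked[peak:]       # from the peak to the bottom of the list
    leading_edge = [gene for gene, _ in window if gene in gene_set]
    return es, leading_edge

# Hypothetical ranked list (gene, score), sorted by descending score.
ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0), ("F", -3.0)]
es, leading = enrichment_score_and_leading_edge(ranked, {"A", "B", "E"})
print(es, leading)
```

The leading-edge genes are the set members that drive the score, so they are natural candidates for follow-up.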
useful tidbits
- explains when to use ranked vs unranked input
- nice explanation of competitive vs self-contained tests
- also covers how to interpret multiple related enriched pathways (not always independent)
- g:Profiler has a built-in ordered-query mode for ranked lists: it tests increasingly large slices from the top of the list instead of making you pick a single cutoff
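The idea of testing slices of a ranked list can be sketched as follows. This is a simplified illustration, not g:Profiler's algorithm: it runs a hypergeometric test on every top-n prefix and reports the prefix with the smallest p-value. The helper names and toy genes are made up, and the sketch assumes that every ranked gene and every pathway gene belongs to a background of `background_size` genes.

```python
from math import comb

def hypergeom_tail(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K belong to the pathway."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

def best_prefix(ranked_genes, gene_set, background_size):
    """Scan top-1, top-2, ... prefixes; return (smallest p-value, prefix length)."""
    K = len(gene_set)
    best_p, best_n = 1.0, 0
    hits = 0
    for n, gene in enumerate(ranked_genes, start=1):
        if gene in gene_set:
            hits += 1
        p = hypergeom_tail(hits, n, K, background_size)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n

# Hypothetical ranked query: pathway genes sit at the top of the list.
ranked_genes = ["g1", "g2", "g3", "g7", "g8", "g9"]
pathway = {"g1", "g2", "g3", "g4"}
print(best_prefix(ranked_genes, pathway, background_size=20))
```

Taking the minimum over many prefixes is itself a form of multiple testing, so the raw minimum should not be read as a final p-value without further correction.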