Entry 15: Lecture notes weeks 10 to 12 - bcb420-2025/Izumi_Ando GitHub Wiki

Lecture 10 Recap and GSEA

even if the density plots look the same, the box plots can look different - this is why using several visualizations is important
remember we need to make sure our data follows the assumptions of the tool we are using (ex. edgeR assumes a negative binomial distribution; we checked this with the mean-variance plot)
===========
GSEA is used for non-thresholded analysis - it was originally developed for microarray data. we need to calculate the rank ourselves as rank = -log10(p-value) * sign(logFC); do NOT let GSEA calculate the rank for you
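A minimal Python sketch of the rank formula follows. It assumes the differential expression results (ex. from edgeR) have been exported as (gene, p-value, logFC) tuples. The function names, the p-value floor `min_p` and the output path are hypothetical. The output layout is what GSEA's pre-ranked mode reads as a `.rnk` file: gene and score, tab-separated, sorted from most up- to most down-regulated.

```python
import math


def gsea_rank(p_value, log_fc, min_p=1e-300):
    """rank = -log10(p-value) * sign(logFC).

    min_p is a hypothetical floor so that a reported p-value of exactly 0
    does not make log10 fail.
    """
    sign = (log_fc > 0) - (log_fc < 0)  # +1, 0 or -1
    return -math.log10(max(p_value, min_p)) * sign


def rank_table(de_results):
    """de_results: iterable of (gene, p_value, log_fc) tuples.

    Returns (gene, rank) pairs sorted from most up-regulated to most
    down-regulated.
    """
    ranked = [(gene, gsea_rank(p, fc)) for gene, p, fc in de_results]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def write_rnk(de_results, path):
    """Write a two-column, tab-separated .rnk file (gene<TAB>rank)."""
    with open(path, "w") as handle:
        for gene, rank in rank_table(de_results):
            handle.write(f"{gene}\t{rank}\n")
```

Usage would be ex. `write_rnk(results, "my_ranks.rnk")`. Genes with a small p-value end up at the extremes of the list, and the sign of logFC decides which end.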
GSEA is cited much more than the other tools
it calculates an "enrichment score" (ES) and a p-value for each gene set (the slides contain more details about how this is done)
increasing the number of permutations stabilizes the p-values
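The Python sketch below makes the enrichment score and the permutation p-value more concrete. It pairs the weighted running-sum statistic from the original GSEA paper (with weight = 1) with a gene-set permutation null, i.e. random gene sets of the same size. That is the kind of null used for pre-ranked lists.

This is an illustration, not GSEA's own code:
- the real tool also normalizes ES to NES and reports FDR;
- the `(extreme + 1) / (n + 1)` correction is a choice made here.

It does show why more permutations help. The smallest p-value it can report is limited by the number of permutations, and the random noise in the estimate shrinks as that number grows.

```python
import random


def enrichment_score(ranked, gene_set, weight=1.0):
    """Weighted running-sum enrichment score.

    ranked: list of (gene, score) pairs sorted by score, highest first.
    gene_set: set of gene identifiers.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_total, n_hits = len(ranked), sum(hits)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("gene set must overlap part, but not all, of the list")
    # Normalizer for hits: sum of |score|^weight over genes in the set.
    hit_norm = sum(abs(s) ** weight for (_, s), hit in zip(ranked, hits) if hit)
    if hit_norm == 0:
        raise ValueError("all genes in the set have a score of zero")
    miss_step = 1.0 / (n_total - n_hits)
    running = best = 0.0
    for (_, score), hit in zip(ranked, hits):
        # Step up by the gene's weighted score on a hit, down by a constant on a miss.
        running += (abs(score) ** weight / hit_norm) if hit else -miss_step
        # ES is the largest deviation from zero, keeping its sign.
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked, gene_set, n_perm=1000, seed=0):
    """Nominal p-value of the observed ES against random gene sets of the same size."""
    rng = random.Random(seed)
    genes = [gene for gene, _ in ranked]
    set_size = sum(gene in gene_set for gene in genes)
    observed = enrichment_score(ranked, gene_set)
    same_sign = extreme = 0
    for _ in range(n_perm):
        null_es = enrichment_score(ranked, set(rng.sample(genes, set_size)))
        # Compare only against null scores with the same sign as the observed ES.
        if (null_es >= 0) == (observed >= 0):
            same_sign += 1
            if abs(null_es) >= abs(observed):
                extreme += 1
    # +1 in numerator and denominator keeps the p-value above zero.
    return observed, (extreme + 1) / (same_sign + 1)
```

A gene set concentrated at the top of the list gives an ES near +1, and one concentrated at the bottom gives an ES near -1.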
gene sets (slide 28, come back to this): use your own gene sets, ex. the Bader Lab gene sets
download the Bader Lab gene sets from R via the link (slide 30)
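Gene set files such as the Bader Lab ones use the GMT format. Each line holds one gene set, tab-separated: the name, a description, then the member genes.

Below is a minimal Python parser, plus a hypothetical size filter that restricts each set to the genes actually in your expression set. The size bounds are parameters you choose, not values from the lecture.

```python
def read_gmt(lines):
    """Parse GMT lines into {gene_set_name: set_of_genes}.

    Each line: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, genes = fields[0], fields[2:]
        gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets


def filter_gene_sets(gene_sets, universe, min_size, max_size):
    """Keep only genes present in `universe`, then keep the sets whose
    remaining size lies within [min_size, max_size]."""
    kept = {}
    for name, genes in gene_sets.items():
        present = genes & universe
        if min_size <= len(present) <= max_size:
            kept[name] = present
    return kept
```

`read_gmt` accepts any iterable of lines, so an open file handle works: `with open("path/to/genesets.gmt") as handle: gene_sets = read_gmt(handle)` (placeholder path).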
do not use the R implementation of GSEA because it is old; use the Java package instead. there is a new Docker image that contains it.

networks are powerful: they reduce complexity, are more efficient than tables, allow data integration, and give intuitive visualizations
possible to detect protein complexes from protein-protein networks
Cytoscape is not specific to bioinformatics but is mostly used for it; it has lots of apps and can be automated as well
interface overview starting around 3:30 in p2
different types of networks, different visualizations
we will use the EnrichmentMap app within Cytoscape

EnrichmentMap is a Cytoscape app that translates enrichment results from different tools into a network (nodes = gene sets, edges = genes in common, so the thicker the edge, the more genes are shared)
it takes an enrichment results file listing the gene sets (the gmt file is optional)
look at slide 6 for the math behind each metric (similarity coefficient, Jaccard coefficient, overlap coefficient)
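The overlap metrics can be sketched in Python on plain sets of genes:
- Jaccard = |A ∩ B| / |A ∪ B|
- overlap = |A ∩ B| / min(|A|, |B|)
- combined = a mix of the two with a weight k

EnrichmentMap offers such a combined option, but the 0.5 weight and the `cutoff` value used here are assumptions, so check slide 6 and the app's settings.

`enrichment_map_edges` is a hypothetical helper. It keeps an edge between two gene sets when their similarity passes the cutoff, and it records the number of shared genes, which is what the edge thickness reflects.

```python
def jaccard(a, b):
    """|A intersect B| / |A union B| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap(a, b):
    """|A intersect B| / min(|A|, |B|) (0.0 if either set is empty)."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def combined(a, b, k=0.5):
    """Weighted mix of Jaccard and overlap; k = 0.5 is an assumed default."""
    return k * jaccard(a, b) + (1 - k) * overlap(a, b)


def enrichment_map_edges(gene_sets, similarity=overlap, cutoff=0.5):
    """Return (set1, set2, similarity, n_shared_genes) for every pair of
    gene sets whose similarity is at least `cutoff` (hypothetical value)."""
    names = sorted(gene_sets)
    edges = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            a, b = gene_sets[first], gene_sets[second]
            score = similarity(a, b)
            if score >= cutoff:
                edges.append((first, second, score, len(a & b)))
    return edges
```

The overlap coefficient is 1 whenever a small set sits entirely inside a larger one. That is why it tends to connect more nodes than Jaccard does.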
you can map the significance to the size of the node; play around with what you can do
you can view the genes associated with one or more nodes by clicking on them
cmd + click + drag to select multiple nodes
take the gene set's numeric identifier and look it up in the source database (ex. Reactome, WikiPathways)
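The identifier can be pulled out of the gene set name. In the Bader Lab GMT files, the names appear to be built from %-separated fields: the name, then the source, then an identifier, sometimes followed by extra fields.

The sketch below assumes that layout and takes the first three fields, so check it against your own file. The example names in the test of this idea (such as `"APOPTOSIS%REACTOME%R-HSA-109581"`) are made up for illustration.

```python
def split_geneset_name(name):
    """Split a gene set name assumed to look like NAME%SOURCE%ID[%...].

    Returns (name, source, identifier); source and identifier are None
    when the name has fewer than three %-separated fields.
    """
    fields = name.split("%")
    if len(fields) < 3:
        return name, None, None
    # Assumption: any fields after the third (ex. a species) are ignored.
    return fields[0], fields[1], fields[2]


def identifiers_by_source(names):
    """Group the identifiers of several gene sets (ex. a selected cluster
    of nodes) by their source database, ready to look up."""
    grouped = {}
    for name in names:
        _, source, identifier = split_geneset_name(name)
        if identifier is not None:
            grouped.setdefault(source, []).append(identifier)
    return grouped
```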
there is also a Reactome Cytoscape app, but it does not integrate with the Cytoscape ecosystem
Reactome has good pictures but interaction is difficult
WikiPathways: also an app in Cytoscape (details on how to navigate around 11:00 in p2); interaction is possible and you can change colors (by differential signal etc) - I think I want to use this over Reactome
GeneMANIA / STRING both have network information and are an alternative to Reactome / WikiPathways in case you get no results there

post-analysis - adding gene sets that were not there when you built the EM (ex: drugs, regulators, disease genes etc)
adding signature sets
exploratory analysis vs known signatures: in exploratory analysis you load many signature sets (this takes time), whereas with known signatures you only look at a limited set of genes of interest
you can check if a certain drug target is related to your significant genesets
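EnrichmentMap's post-analysis scores each signature set against the enriched gene sets with a statistical test. Its documentation lists options including Mann-Whitney and hypergeometric tests; check which one your version uses.

As a hedged illustration of the hypergeometric idea only, the Python sketch below asks whether a signature set (such as a drug's targets) shares more genes with an enriched gene set than expected by chance. Restricting both sets to a gene universe (ex. the genes in your expression set) is an assumption of this sketch, and the function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, universe_size, n_set, n_signature):
    """P(X >= k) for X ~ Hypergeometric: draw n_signature genes from a
    universe of universe_size genes, n_set of which are in the gene set."""
    total = comb(universe_size, n_signature)
    top = min(n_set, n_signature)
    tail = sum(
        comb(n_set, i) * comb(universe_size - n_set, n_signature - i)
        for i in range(k, top + 1)
    )
    return tail / total


def signature_overlap(signature, gene_set, universe):
    """Overlap size and hypergeometric p-value between a signature set
    (ex. a drug's targets) and an enriched gene set, within `universe`."""
    sig = signature & universe
    members = gene_set & universe
    k = len(sig & members)
    return k, hypergeom_sf(k, len(universe), len(members), len(sig))
```

A small p-value suggests the drug's targets are concentrated in that gene set more than a random set of the same size would be.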
======== dark matter
the genes we don't see are sometimes important too!
dark matter: genes without annotation OR genes that are annotated to genesets that are not part of the significant enrichment results
files required: gene set definitions (gmt file), expression file, GSEA results files
data required: genes in expression set, genes in enrichment results, genes in significant enrichment results
visualize with venn diagram
think about why you get the numbers you do
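The gene groups behind the Venn diagram can be computed with plain set operations. This minimal Python sketch makes two assumptions:
- the gene set definitions were parsed from the gmt file into a dict of name -> set of genes;
- the names of the gene sets in the GSEA results, and in the significant subset, were read from the results files.

The function name and the output keys are hypothetical.

```python
def dark_matter(expressed, gene_sets, tested_names, significant_names):
    """Return the gene groups behind the dark matter Venn diagram.

    expressed: set of genes in the expression set.
    gene_sets: {gene set name: set of genes} from the gmt file.
    tested_names / significant_names: gene set names found in the GSEA
    results and in the significant subset of those results.
    """
    def genes_in(names):
        genes = set()
        for name in names:
            genes |= gene_sets.get(name, set())
        return genes

    annotated = genes_in(gene_sets)          # genes in any gene set definition
    in_results = genes_in(tested_names)      # genes in gene sets GSEA reported
    in_significant = genes_in(significant_names)
    expressed_annotated = expressed & annotated
    return {
        "no_annotation": expressed - annotated,
        "annotated_not_in_results": expressed_annotated - in_results,
        "annotated_not_significant": expressed_annotated - in_significant,
    }
```

Taking `len()` of each group gives the numbers to compare.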