Entry 15: Lecture notes weeks 10 to 12 - bcb420-2025/Izumi_Ando GitHub Wiki

Lecture 10 Recap and GSEA

  • even if the density plot looks the same the box plots could look different - this is why different visualizations are important
  • remember we need to make sure our data follows the assumptions of the tool we are using (ex. edgeR assumes a negative binomial distribution, we checked this with the mean-variance plot)
    ===========
  • GSEA is used for non-thresholded analysis - originally developed for microarray data. we need to calculate the rank ourselves: rank = -log10(p-value) * sign(logFC). do NOT let GSEA calculate the rank for you
  • GSEA is much more cited than the other tools
  • it calculates the "enrichment score" and a p-value (slides contain more details about how this is done)
  • if you increase the number of permutations, you can stabilize the p-values
  • something about genesets ... come back to this (slide 28) - use your own gene sets, Bader lab genesets
  • download Bader Lab genesets via link through R (slide 30)
  • do not use the R implementation of GSEA because it is old, use the Java package. there is a new Docker image that contains this.
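
The rank formula above, rank = -log10(p-value) * sign(logFC), can be sketched in a few lines. This is only an illustration in Python (the course itself works in R); the gene names, p-values and logFC values are made up, and the tab-separated "gene, score" layout is an assumption about what a rank file looks like. A p-value of exactly 0 would need special handling, which this sketch skips.

```python
import math

def gsea_rank(p_value, log_fc):
    """Compute rank = -log10(p-value) * sign(logFC) for one gene."""
    # sign is +1, -1 or 0 depending on the direction of change
    sign = (log_fc > 0) - (log_fc < 0)
    return -math.log10(p_value) * sign

# Hypothetical differential expression results: (gene, p-value, logFC)
results = [
    ("GENE_A", 0.001, 2.5),
    ("GENE_B", 0.01, -1.2),
    ("GENE_C", 0.5, 0.3),
]

# Sort from most up-regulated to most down-regulated
ranked = sorted(
    ((gene, gsea_rank(p, fc)) for gene, p, fc in results),
    key=lambda pair: pair[1],
    reverse=True,
)

# One "gene<TAB>score" line per gene (assumed rank-file layout)
rnk_lines = [f"{gene}\t{score:.4f}" for gene, score in ranked]
print("\n".join(rnk_lines))
```

Strongly up-regulated genes end up at the top of the list and strongly down-regulated genes at the bottom, which is the ordering a non-thresholded analysis works from.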

Lecture 11 - Cytoscape p1 & 2

  • networks are powerful : reduce complexity, more efficient than tables, data integration, intuitive visualization
  • possible to detect protein complexes from protein-protein networks
  • cytoscape is not specific to bioinformatics but is mostly used for it, lots of apps, possible to automate as well
  • interface overview starting around 3:30 in p2
  • different types of networks, different visualizations
  • we will use enrichmentmap app within cytoscape

Lecture 11 - Enrichment Map

  • Enrichment Map is a Cytoscape app that translates enrichment results from different tools into a network (nodes = gene sets, edges = genes in common, so the thicker the line the more genes are shared)
  • takes enrichment map file with gene sets (gmt file is optional)
  • look at slide 6 for the math behind each metric (similarity coefficient, jaccard coefficient, overlap)
  • you can map the significance to the size of the node, play around with what you can do
  • you can view the genes associated with a node(s) by clicking on it
  • cmd, click, drag to select
  • take the number identifier, look in databases (ex reactome, wiki pathways)
  • there is also a reactome cytoscape app but does not integrate with cytoscape ecosystem
  • reactome has good pictures but interaction is difficult
  • wiki pathways: also app on cytoscape, details on how to navigate around 11:00 in p2, interaction is possible, you can change colors (by differential signal etc) - I think I want to use this over reactome
  • GeneMania / String both have network information as an alternative to Reactome / wiki pathways in case you get no results there
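
The similarity metrics mentioned above (Jaccard, overlap) can be written out on plain sets of genes. This is a rough Python sketch, not the Enrichment Map code itself: the gene sets are invented, and the weighting in the `combined` function (a constant k between the two metrics) is my understanding of the combined similarity coefficient, so check slide 6 for the actual definitions.

```python
def jaccard(a, b):
    """Jaccard coefficient: shared genes / genes in either set."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def overlap(a, b):
    """Overlap coefficient: shared genes / size of the smaller set."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0

def combined(a, b, k=0.5):
    """Assumed combined coefficient: k * overlap + (1 - k) * Jaccard."""
    return k * overlap(a, b) + (1 - k) * jaccard(a, b)

# Hypothetical gene sets (two nodes in the network)
set_a = {"G1", "G2", "G3", "G4"}
set_b = {"G3", "G4", "G5"}

print(f"jaccard:  {jaccard(set_a, set_b):.3f}")
print(f"overlap:  {overlap(set_a, set_b):.3f}")
print(f"combined: {combined(set_a, set_b):.3f}")
```

The overlap coefficient is more forgiving when a small gene set sits mostly inside a large one, while Jaccard penalizes the size difference, which is why the choice of metric changes which edges appear in the map.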

Lecture 12 - Post Analysis p1 & 2

  • post analysis - adding gene sets that were not there when you did EM (ex: drugs, regulators, disease genes etc)
  • adding signature sets
  • exploratory analysis vs known signatures: in exploratory analysis you test many gene sets (will take time), whereas with known signatures you only look at a limited set of genes of interest
  • you can check if a certain drug target is related to your significant genesets
    ======== dark matter
  • the genes we don't see are sometimes important too!
  • dark matter: genes without annotation OR genes that are annotated to genesets that are not part of the significant enrichment results
  • files required: definitions of genesets (gmt file), expression files, GSEA file results
  • data required: genes in expression set, genes in enrichment results, genes in significant enrichment results
  • visualize with venn diagram
  • think about why you get the numbers you do
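
The dark matter bookkeeping above comes down to set operations on three groups of genes. Below is a small Python sketch under stated assumptions: the gmt text, gene names and gene set names are all invented, and the gmt layout (name, description, then genes, tab-separated) is the usual one but should be checked against the real file. The three resulting sets are what would go into the venn diagram.

```python
def parse_gmt(text):
    """Parse gmt-style text: name<TAB>description<TAB>gene1<TAB>gene2..."""
    gene_sets = {}
    for line in text.strip().splitlines():
        fields = line.split("\t")
        gene_sets[fields[0]] = set(fields[2:])
    return gene_sets

# Hypothetical inputs standing in for the gmt, expression and GSEA result files
gmt_text = "SET_1\tdesc\tG1\tG2\nSET_2\tdesc\tG2\tG3\nSET_3\tdesc\tG4\n"
gene_sets = parse_gmt(gmt_text)
expressed = {"G1", "G2", "G3", "G4", "G5", "G6"}  # genes in expression set
tested_sets = {"SET_1", "SET_2", "SET_3"}          # all enrichment results
significant_sets = {"SET_1"}                       # significant results only

def genes_in(names):
    """Union of the genes in the named gene sets."""
    return set().union(*(gene_sets[n] for n in names))

genes_enrichment = genes_in(tested_sets)
genes_significant = genes_in(significant_sets)

# Dark matter part 1: expressed genes with no annotation at all
no_annotation = expressed - genes_enrichment
# Dark matter part 2: annotated genes that never appear in a significant set
annotated_not_significant = (expressed & genes_enrichment) - genes_significant

dark_matter = no_annotation | annotated_not_significant
print(sorted(dark_matter))
```

Comparing the sizes of `no_annotation` and `annotated_not_significant` is one way to start thinking about why the numbers come out the way they do.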

Lecture 12 - Cytoscape Automation

  • automation panel - if you are having issues in R you can debug them here
  • CyREST : api for cytoscape automation
  • you need to have Cytoscape open to access via R, port access
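
Because CyREST is a REST API on a local port, a quick way to confirm that Cytoscape is open and reachable is to request its base URL. The sketch below uses Python's standard library instead of R; it assumes the default CyREST port 1234 and the `v1` base path, and the helper names `cyrest_url` and `cytoscape_status` are hypothetical. If Cytoscape is not running, the function returns `None` instead of raising.

```python
import json
import urllib.error
import urllib.request

def cyrest_url(port=1234, version="v1"):
    """Build the CyREST base URL (port 1234 is the assumed default)."""
    return f"http://localhost:{port}/{version}"

def cytoscape_status(port=1234, timeout=2):
    """Return the parsed status response, or None if Cytoscape is unreachable."""
    try:
        with urllib.request.urlopen(cyrest_url(port), timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply
        return None

status = cytoscape_status()
if status is None:
    print("Cytoscape not reachable; is it open?")
else:
    print("Cytoscape reachable")
```

Running a check like this before any automation script makes the "Cytoscape must be open" requirement fail early with a readable message.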