Notes on GSEA paper - bcb420-2024/Dien_Nguyen GitHub Wiki

Source

Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005 Oct 25;102(43):15545-50. doi: 10.1073/pnas.0506580102. Epub 2005 Sep 30. PMID: 16199517; PMCID: PMC1239896.

Overview

  • GSEA considers experiments with genome-wide expression profiles from samples belonging to 2 classes, labeled 1 or 2. Genes are ranked by the correlation between their expression and the class distinction, producing a ranked list L.
  • Goal of GSEA: for each gene set S, determine whether the members of S are randomly distributed throughout L or primarily found at the top or bottom of L
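
The ranking step described above can be sketched as follows. This is only an illustration: it assumes Pearson correlation with a +1/-1 class encoding as the ranking metric (one possible choice of "correlation with class distinction"), and the gene names, expression values, and the helper names `pearson` and `rank_genes` are all made up for the example.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no class signal
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)


def rank_genes(expression, labels):
    """Return [(gene, score)] sorted from highest to lowest score.

    Class 1 is encoded as +1 and class 2 as -1 (an assumption of this
    sketch), so positive scores mean "higher in class 1".
    """
    coded = [1.0 if c == 1 else -1.0 for c in labels]
    scores = {g: pearson(vals, coded) for g, vals in expression.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (invented): 4 genes, 6 samples, the first three in class 1.
labels = [1, 1, 1, 2, 2, 2]
expression = {
    "geneA": [9.0, 8.5, 9.2, 2.0, 2.5, 1.8],
    "geneB": [5.0, 5.1, 4.9, 5.0, 5.2, 4.8],
    "geneC": [1.0, 1.5, 0.8, 7.0, 7.5, 6.9],
    "geneD": [6.0, 5.0, 6.5, 4.0, 4.5, 3.5],
}
ranked = rank_genes(expression, labels)  # this plays the role of list L
for gene, score in ranked:
    print(f"{gene}\t{score:+.3f}")
```

Genes strongly up in class 1 land at the top of the list and genes strongly up in class 2 land at the bottom, which is the property the enrichment test relies on.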

How it works

  1. Calculation of enrichment score (ES)
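    • ES indicates the degree to which set S is overrepresented at the extremes (top or bottom) of the ranked list L
    • Walk down L keeping a running sum statistic: increase it when a gene in L is in set S, decrease it when the gene is not in S. The size of each increase depends on how strongly the gene correlates with the phenotype
    • ES is the maximum deviation of the running sum from zero; it corresponds to a weighted Kolmogorov-Smirnov-like (KS-like) statistic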
  2. Estimate significance level of ES
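    • Use an empirical phenotype-based permutation test: permute the class labels, recompute the ES, and compare the observed ES against the resulting null distribution
    • This is known as the nominal P value.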
  3. Adjustment for multiple hypothesis testing
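    • Normalize the ES of each gene set to account for the size of the set, resulting in a normalized enrichment score (NES)
    • Calculate the false discovery rate (FDR) for each NES: the estimated probability that a gene set with that NES represents a false positive finding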
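
A minimal sketch of steps 1 to 3 for a single gene set, under stated assumptions: genes are ranked by difference of class means (a stand-in metric chosen for brevity), the hit weight is |score|^p with p = 1, the null distribution comes from shuffling class labels, and the NES divides the ES by the mean of the same-sign null scores. The function names and toy data are hypothetical. The FDR step is left out because it needs the NES of many gene sets at once.

```python
import random
from statistics import mean


def rank_by_mean_difference(expression, labels):
    """Rank genes by mean(class 1) - mean(class 2), highest first."""
    scores = {}
    for gene, vals in expression.items():
        c1 = [v for v, c in zip(vals, labels) if c == 1]
        c2 = [v for v, c in zip(vals, labels) if c == 2]
        scores[gene] = mean(c1) - mean(c2)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def enrichment_score(ranked, gene_set, p=1.0):
    """Maximum deviation from zero of the weighted running sum."""
    n_r = sum(abs(s) ** p for g, s in ranked if g in gene_set)
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    if n_r == 0 or n_miss == 0:
        raise ValueError("gene set must have weighted hits and misses in L")
    running, es = 0.0, 0.0
    for g, s in ranked:
        # Step up (weighted) on a hit, step down (uniform) on a miss.
        running += abs(s) ** p / n_r if g in gene_set else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es


def gsea_one_set(expression, labels, gene_set, n_perm=200, seed=0):
    """Return (ES, nominal P value, NES) for one gene set."""
    rng = random.Random(seed)
    es = enrichment_score(rank_by_mean_difference(expression, labels), gene_set)
    null = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # phenotype permutation keeps class sizes
        null.append(
            enrichment_score(rank_by_mean_difference(expression, shuffled), gene_set)
        )
    # Use only the part of the null with the same sign as the observed ES.
    same_sign = [x for x in null if (x >= 0) == (es >= 0)]
    if not same_sign or mean(same_sign) == 0:
        return es, None, None
    p_nominal = sum(abs(x) >= abs(es) for x in same_sign) / len(same_sign)
    nes = es / abs(mean(same_sign))
    return es, p_nominal, nes


# Toy data (invented): g1 and g2 are higher in class 1.
labels = [1, 1, 1, 2, 2, 2]
expression = {
    "g1": [8.0, 7.5, 8.2, 3.0, 3.5, 2.8],
    "g2": [6.0, 6.4, 5.9, 3.1, 2.9, 3.3],
    "g3": [4.0, 4.2, 3.9, 4.1, 4.0, 3.8],
    "g4": [5.0, 4.8, 5.3, 5.1, 4.9, 5.2],
    "g5": [2.0, 2.2, 1.9, 6.0, 6.3, 5.8],
    "g6": [1.0, 1.3, 0.9, 4.0, 4.4, 3.9],
}
es, p_nominal, nes = gsea_one_set(expression, labels, {"g1", "g2"})
print(es, p_nominal, nes)
```

With only six samples there are few distinct label permutations, so the P value here is coarse; the toy data is meant to show the mechanics, not to give a meaningful result.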

Leading edge set

  • Not all genes in a gene set will typically participate in a biological process
  • The leading edge subset contains the genes in gene set S that appear in list L at or before the point where the running sum reaches its maximum deviation from 0; these genes account for the enrichment signal
  • Examining the leading edge subset can reveal a biologically important subset within a gene set
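
The leading edge subset can be extracted with a short sketch like the one below. It assumes the same weighted running sum as the ES (weight |score|^p, p = 1) and a toy ranked list with invented gene names and scores; `leading_edge` is a hypothetical helper name. For a negative ES it takes the set members at or after the peak, which is an assumption that goes beyond the definition in these notes.

```python
def leading_edge(ranked, gene_set, p=1.0):
    """Return (ES, leading edge genes) for one gene set.

    ranked is a list of (gene, score) pairs sorted from highest to
    lowest score, i.e. the ranked list L.
    """
    in_set = [g in gene_set for g, _ in ranked]
    weights = [abs(s) ** p for _, s in ranked]
    n_r = sum(w for w, hit in zip(weights, in_set) if hit)
    n_miss = in_set.count(False)
    if n_r == 0 or n_miss == 0:
        raise ValueError("gene set must have weighted hits and misses in L")

    # Track the running sum and where it deviates most from zero.
    running, es, peak = 0.0, 0.0, 0
    for i, (w, hit) in enumerate(zip(weights, in_set)):
        running += w / n_r if hit else -1.0 / n_miss
        if abs(running) > abs(es):
            es, peak = running, i

    genes = [g for g, _ in ranked]
    if es >= 0:
        window = range(0, peak + 1)        # members at or before the peak
    else:
        window = range(peak, len(genes))   # assumption: members after the peak
    return es, [genes[i] for i in window if in_set[i]]


# Toy ranked list (invented scores); the set has two top genes and one stray.
ranked = [("a", 4.0), ("b", 3.0), ("c", 2.0), ("d", 1.0),
          ("e", -1.0), ("f", -2.0), ("g", -3.0), ("h", -4.0)]
es, core = leading_edge(ranked, {"a", "b", "f"})
print(es, core)
```

In this example the running sum peaks after gene `b`, so `f` is a member of the gene set but not of the leading edge subset.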