Notes on GSEA paper - bcb420-2024/Dien

Source

Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005 Oct 25;102(43):15545-50. doi: 10.1073/pnas.0506580102. Epub 2005 Sep 30. PMID: 16199517; PMCID: PMC1239896.

Overview

GSEA considers experiments with genome-wide expression profiles from samples belonging to two classes, labeled 1 or 2. Genes are ranked by the correlation between their expression and the class distinction, which yields a ranked list L.
Goal of GSEA: for each gene set S, determine whether the members of S are randomly distributed throughout L or concentrated at the top or bottom.
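
To make the ranking step concrete, here is a minimal sketch of how a ranked list L could be built. The notes do not name a ranking metric, so the signal-to-noise score used here is an assumption, and the function name `rank_genes` and the toy expression values are made up for illustration.

```python
from statistics import mean, stdev

def rank_genes(expression, labels):
    """Rank genes by a signal-to-noise score, highest first.

    expression: dict mapping gene name -> list of values, one per sample
    labels: list of class labels (1 or 2), one per sample
    Assumes each class has at least two samples and non-zero spread.
    """
    scores = {}
    for gene, values in expression.items():
        class1 = [v for v, c in zip(values, labels) if c == 1]
        class2 = [v for v, c in zip(values, labels) if c == 2]
        # Signal-to-noise (assumed metric): difference of class means
        # divided by the sum of the class standard deviations.
        scores[gene] = (mean(class1) - mean(class2)) / (stdev(class1) + stdev(class2))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, scores

# Toy data: geneA is high in class 1, geneC is high in class 2.
expression = {
    "geneA": [9.0, 10.0, 11.0, 1.0, 2.0, 3.0],
    "geneB": [5.0, 6.0, 7.0, 5.5, 6.5, 7.5],
    "geneC": [1.0, 2.0, 3.0, 9.0, 10.0, 11.0],
}
labels = [1, 1, 1, 2, 2, 2]
ranked_list, scores = rank_genes(expression, labels)
```

Genes most associated with class 1 end up at the top of `ranked_list` and genes most associated with class 2 at the bottom, which is the shape of list L that the enrichment step expects.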

How it works

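Calculation of enrichment score (ES)
- ES reflects the degree to which set S is overrepresented at the top or bottom of list L
- Walk down L with a running-sum statistic: increase the sum when a gene is in S, decrease it when it is not. The size of each increase depends on the gene's correlation with the phenotype
- ES is the maximum deviation from zero of the running sum; it corresponds to a weighted Kolmogorov-Smirnov-like (KS-like) statistic
Estimate significance level of ES
- Use an empirical phenotype-based permutation test: permute the class labels, recompute ES, and compare the observed ES with the resulting null distribution. Permuting labels rather than genes preserves gene-gene correlations
- The resulting significance level is the nominal P value
Adjustment for multiple hypothesis testing
- Normalize the ES of each gene set to account for set size, giving the normalized enrichment score (NES)
- Calculate the false discovery rate (FDR) for each NES: the estimated probability that a gene set with that NES is a false positive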
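
The three steps above can be sketched for a single gene set as follows. This is an illustrative sketch, not the reference GSEA implementation: the difference-of-means ranking metric, the function names, the weighting exponent `p=1.0` as a default, and the toy data are all assumptions. FDR estimation is left out because it needs the NES values of many gene sets.

```python
import random
from statistics import mean

def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted KS-like running sum; returns (ES, index of the peak)."""
    hits = [g in gene_set for g in ranked_genes]
    n_miss = hits.count(False)
    # Normalizing constant for hits: sum of |score|^p over set members.
    n_r = sum(abs(scores[g]) ** p for g, h in zip(ranked_genes, hits) if h)
    if n_r == 0 or n_miss == 0:
        return 0.0, -1  # degenerate case: nothing to score
    running, best, best_idx = 0.0, 0.0, -1
    for i, (g, h) in enumerate(zip(ranked_genes, hits)):
        if h:
            running += abs(scores[g]) ** p / n_r  # step up on a set member
        else:
            running -= 1.0 / n_miss               # step down otherwise
        if abs(running) > abs(best):
            best, best_idx = running, i           # maximum deviation from zero
    return best, best_idx

def score_genes(expression, labels):
    """Difference of class means per gene (assumed stand-in ranking metric)."""
    scores = {}
    for g, vals in expression.items():
        c1 = [v for v, c in zip(vals, labels) if c == 1]
        c2 = [v for v, c in zip(vals, labels) if c == 2]
        scores[g] = mean(c1) - mean(c2)
    return scores

def gsea_single_set(expression, labels, gene_set, n_perm=200, seed=0):
    """Return (ES, nominal P value, NES) using phenotype-label permutation."""
    rng = random.Random(seed)

    def es_for(lbls):
        s = score_genes(expression, lbls)
        ranked = sorted(s, key=s.get, reverse=True)
        return enrichment_score(ranked, s, gene_set)[0]

    observed = es_for(labels)
    null = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # permute class labels, keep expression intact
        null.append(es_for(shuffled))
    # Compare only against null scores with the same sign as the observed ES.
    same_sign = [x for x in null if (x >= 0) == (observed >= 0)]
    if not same_sign:
        return observed, None, None
    p_value = sum(abs(x) >= abs(observed) for x in same_sign) / len(same_sign)
    mean_abs = mean(abs(x) for x in same_sign)
    nes = observed / mean_abs if mean_abs else None
    return observed, p_value, nes

# Toy data: g1 and g2 are high in class 1, g5 and g6 are high in class 2.
expression = {
    "g1": [8, 9, 10, 1, 2, 3], "g2": [7, 9, 8, 2, 1, 3],
    "g3": [5, 4, 6, 5, 6, 4],  "g4": [3, 5, 4, 4, 3, 6],
    "g5": [1, 2, 3, 8, 9, 10], "g6": [2, 1, 3, 9, 7, 8],
}
labels = [1, 1, 1, 2, 2, 2]
es, p_value, nes = gsea_single_set(expression, labels, {"g1", "g2"})
```

Dividing by the mean of the same-sign null scores is what makes NES values comparable across gene sets of different sizes, so that an FDR can then be computed over all sets together.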

View process

Leading edge set

Usually, not all genes in a gene set participate in a given biological process
The leading-edge subset contains the genes in gene set S that appear in list L at or before the point where the running sum reaches its maximum deviation from 0; these genes account for the enrichment signal
Examining the leading-edge subset can reveal a biologically important subset within a gene set
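
As a rough illustration of the definition above, the sketch below recomputes the running sum and returns the set members at or before its peak. The function name `leading_edge`, the exponent `p`, and the toy scores are assumptions. For a negative ES the sketch takes the set members at or after the trough; that mirrored rule is also an assumption and is not stated in these notes.

```python
def leading_edge(ranked_genes, scores, gene_set, p=1.0):
    """Return the members of gene_set that drive the enrichment signal.

    Assumes at least one set member has a non-zero score and that at
    least one gene in ranked_genes is outside the set.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_miss = in_set.count(False)
    n_r = sum(abs(scores[g]) ** p for g, h in zip(ranked_genes, in_set) if h)
    running, best, peak = 0.0, 0.0, -1
    for i, (g, h) in enumerate(zip(ranked_genes, in_set)):
        # Step up (weighted) on a set member, step down otherwise.
        running += abs(scores[g]) ** p / n_r if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best, peak = running, i
    if best >= 0:
        window = ranked_genes[: peak + 1]  # at or before the peak
    else:
        window = ranked_genes[peak:]       # assumed rule for a negative ES
    return [g for g in window if g in gene_set]

ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
scores = {"g1": 4, "g2": 3, "g3": 2, "g4": 1,
          "g5": -1, "g6": -2, "g7": -3, "g8": -4}
top_subset = leading_edge(ranked, scores, {"g1", "g2", "g7"})
bottom_subset = leading_edge(ranked, scores, {"g7", "g8"})
```

In the first call the set member `g7` sits far below the peak and is left out, which shows how the leading-edge subset can be smaller than the gene set itself.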
