Journal 6: GSEA - bcb420-2025/Keren_Zhang GitHub Wiki
GSEA Assignment
Task
Perform a GSEA preranked analysis using the mesenchymal vs immunoreactive rank file and the genesets from the Bader Lab geneset collection from January 1, 2025, with the following parameters:
- maximum geneset size of 200
- minimum geneset size of 15
- gene set permutation
Time
Date: March 18th, 2025
Estimated Time: 1 hour
Time Taken: 2.5 hours
GSEA
Setup
Downloaded the GSEA software from the GSEA website
Load Data
The gmt and rnk files are loaded into the software.
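To make the two inputs concrete, here is a minimal sketch of how a rnk file and a gmt file could be read outside the GSEA application. The function names `read_rnk` and `read_gmt` are hypothetical helpers, not part of GSEA. The sketch assumes the standard tab-delimited layouts (gene and score per line for rnk; name, description, then genes for gmt) and counts geneset size after restricting each set to genes present in the ranked list.

```python
import io


def read_rnk(handle):
    """Parse a .rnk file: one 'gene<TAB>score' pair per line."""
    ranks = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        gene, score = line.split("\t")[:2]
        try:
            ranks[gene] = float(score)
        except ValueError:
            continue  # skip a header row whose score column is not numeric
    return ranks


def read_gmt(handle, ranked_genes, min_size=15, max_size=200):
    """Parse a .gmt file and keep only sets within the size limits.

    Each line is: set name, description, then one gene per column.
    Size is counted on the genes that also occur in ranked_genes.
    """
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        genes = {g for g in fields[2:] if g in ranked_genes}
        if min_size <= len(genes) <= max_size:
            gene_sets[fields[0]] = genes
    return gene_sets


# Toy data (made up for illustration) with small limits so the filter is visible.
rnk_text = "A\t2.5\nB\t1.0\nC\t-0.5\nD\t-2.0\n"
gmt_text = "SET_SMALL\tdesc\tA\nSET_OK\tdesc\tA\tB\tZ\nSET_BIG\tdesc\tA\tB\tC\tD\n"
ranks = read_rnk(io.StringIO(rnk_text))
kept = read_gmt(io.StringIO(gmt_text), ranks, min_size=2, max_size=3)
print(sorted(kept))
```

With the assignment's limits, the same call would use `min_size=15` and `max_size=200`.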
Run GSEA
Run GSEA using the following parameters:
Homework Questions
1. Explain the reasons for using each of the above parameters.
- Rank File: The ranked list is crucial for GSEA because the analysis method evaluates whether sets of genes (defined by gene sets) are statistically significantly concentrated at the top or bottom of the list, indicating association with one of the conditions under study.
- Maximum Geneset Size of 200: By limiting the size to 200 genes, the analysis avoids the inclusion of overly broad gene sets that could introduce noise into the results. This size threshold helps maintain the specificity of the analysis, making it more likely that if a gene set is identified as significantly enriched, it is closely related to the biological or clinical phenotype being studied.
- Minimum Geneset Size of 15: A minimum threshold helps ensure that each gene set has enough members to provide statistical reliability. With at least 15 genes, a gene set is more likely to represent a robust biological pathway or process, and its enrichment in the data can be interpreted with greater confidence. This size helps balance between not missing potentially interesting biological signals and avoiding spurious results from overly small clusters of genes.
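- Gene Set Permutation: Builds the null distribution by repeatedly drawing random gene sets of the same size from the ranked list and recomputing the enrichment score. This tests whether the rank distribution of genes within a particular set differs from what would be expected by chance in the given ranked list. It is also the only permutation type that applies to a preranked analysis, because the rank file carries no sample phenotype labels to permute.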
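The parameters above can be illustrated with a small sketch of the enrichment score and a gene set permutation test. This is a simplified illustration, not the GSEA software's implementation. The function names are hypothetical, the weighting exponent `p=1.0` is an assumption, and the toy ranked list is made up. It also assumes the gene set neither covers the whole list nor consists only of zero-score genes.

```python
import random


def enrichment_score(ranked, gene_set, p=1.0):
    """Weighted running-sum statistic over a ranked list.

    ranked: list of (gene, score) pairs sorted by descending score.
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    hit_total = sum(abs(s) ** p for g, s in ranked if g in gene_set)
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    running, best = 0.0, 0.0
    for gene, score in ranked:
        if gene in gene_set:
            running += abs(score) ** p / hit_total  # step up on a hit
        else:
            running -= 1.0 / n_miss  # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best


def gene_set_permutation(ranked, gene_set, n_perm=1000, seed=0):
    """Estimate NES and a nominal p-value from random sets of equal size."""
    rng = random.Random(seed)
    genes = [g for g, _ in ranked]
    size = sum(1 for g in genes if g in gene_set)
    observed = enrichment_score(ranked, gene_set)
    null = [
        enrichment_score(ranked, set(rng.sample(genes, size)))
        for _ in range(n_perm)
    ]
    # Compare only against null scores with the same sign as the observed ES.
    same_sign = [x for x in null if (x >= 0) == (observed >= 0)]
    if not same_sign:
        return observed, float("nan"), float("nan")
    p_value = sum(1 for x in same_sign if abs(x) >= abs(observed)) / len(same_sign)
    nes = observed / abs(sum(same_sign) / len(same_sign))
    return observed, nes, p_value


# Toy ranked list: 20 genes with scores from 10 down to -9.
ranked = [(f"g{i}", 10.0 - i) for i in range(20)]
es, nes, p_value = gene_set_permutation(ranked, {"g0", "g1", "g2"}, n_perm=200)
print(es, nes, p_value)
```

Dividing the ES by the mean of the same-sign null scores is what makes the NES comparable across genesets of different sizes. The FDR is then computed across all tested genesets and is not shown here.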
2.1 What is the top gene set returned for the Mesenchymal subtype?
What are the pvalue, ES, NES and FDR associated with it?
- pvalue: 0.000
- ES: 0.86477774
- NES: 2.57
- FDR: 0.000
How many genes are in its leading edge?
145
What is the top gene associated with this geneset?
2.2 What is the top gene set returned for the Immunoreactive subtype?
What are the pvalue, ES, NES and FDR associated with it?
- pvalue: 0.000
- ES: -0.8557666
- NES: -2.8742568
- FDR: 0.000
How many genes are in its leading edge?
79
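For context on the leading edge counts reported above, the following sketch shows how a leading edge subset can be derived from the running sum. Members at or before the peak are kept for a positive ES, and members after the trough are kept for a negative ES. The function name `leading_edge`, the exponent `p=1.0`, and the toy data are assumptions made for illustration; the counts in this journal come from the GSEA report, not from this code.

```python
def leading_edge(ranked, gene_set, p=1.0):
    """Return the gene set members that contribute to the enrichment score.

    ranked: list of (gene, score) pairs sorted by descending score.
    """
    in_set = [g in gene_set for g, _ in ranked]
    hit_total = sum(abs(s) ** p for (_, s), hit in zip(ranked, in_set) if hit)
    miss_step = 1.0 / in_set.count(False)
    running, peak, peak_idx = 0.0, 0.0, 0
    for i, ((_, score), hit) in enumerate(zip(ranked, in_set)):
        running += abs(score) ** p / hit_total if hit else -miss_step
        if abs(running) > abs(peak):
            peak, peak_idx = running, i
    # Positive ES: members up to the peak. Negative ES: members from the trough on.
    window = ranked[: peak_idx + 1] if peak >= 0 else ranked[peak_idx:]
    return [g for g, _ in window if g in gene_set]


# Toy ranked list: 20 genes with scores from 10 down to -9.
ranked = [(f"g{i}", 10.0 - i) for i in range(20)]
print(leading_edge(ranked, {"g0", "g1", "g15"}))
print(leading_edge(ranked, {"g17", "g18", "g19"}))
```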