Journal Entry #4: GSEA - bcb420-2023/Helena_Jovic GitHub Wiki
Objective
Perform a GSEA (Gene Set Enrichment Analysis) preranked analysis using the ranked list comparing mesenchymal and immunoreactive ovarian cancer subtypes.
Time Management
Date Started: 2023-03-20
Date Completed: 2023-03-20
Estimated Time: 1.5 hours
Actual Time: 3 hours
Workflow
- Download the ranked gene list
- Download the gene set file from the Bader lab collection (gene symbols, GOBP all pathways, no IEA annotations)
- Download the GSEA software and run the preranked analysis
i. Use "Load data" feature to load ranked gene list and geneset
ii. Under tools, I clicked "Run GSEAPreranked"
iii. Selected the geneset that I just loaded
iv. Left the Number of Permutations parameter at its default value of "1000"
v. At Collapse/Remap to gene symbols, I chose "No Collapse"
vi. Set the Max size and Min size to 200 and 15 respectively, as outlined in the homework
vii. Left all other parameters at their defaults
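The Max size / Min size setting in step vi can be illustrated with a short sketch. It assumes the gene set file is in the tab-delimited GMT layout (name, description, then one gene per column) and that the size limits are applied after restricting each gene set to genes present in the ranked list. The function names `parse_gmt` and `filter_by_size` are hypothetical helpers written for this journal, not part of the GSEA software.

```python
def parse_gmt(text):
    """Parse GMT-formatted text into {gene_set_name: set_of_genes}.

    Assumed layout per line: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        gene_sets[fields[0]] = {gene for gene in fields[2:] if gene}
    return gene_sets


def filter_by_size(gene_sets, ranked_genes, min_size=15, max_size=200):
    """Keep gene sets whose overlap with the ranked list is within the size limits."""
    universe = set(ranked_genes)
    kept = {}
    for name, genes in gene_sets.items():
        overlap = genes & universe  # only genes that actually appear in the ranked list count
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
    return kept
```

With the homework settings (15 and 200), a gene set that shares only a handful of genes with the ranked list is dropped, as is one with several hundred shared genes.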
Issues and Resolutions
- I had to rename the text file containing the ranked gene list to have a ".rnk" suffix, for GSEA to recognize it as a ranked gene list
- Selected "No Collapse" under the Collapse/Remap to gene symbols parameter, following the guide, because no chip file was provided
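The renaming issue above could also be avoided by writing the ranked list with a ".rnk" suffix in the first place. The sketch below assumes the ranked list is a two-column, tab-delimited file (gene symbol, rank score) with no header row. `rnk_path` and `rnk_text` are hypothetical helper names, and the scores in the usage note are made up for illustration.

```python
from pathlib import Path


def rnk_path(path):
    """Return the same path with its suffix replaced by '.rnk'."""
    return Path(path).with_suffix(".rnk")


def rnk_text(ranks):
    """Format {gene: score} as tab-delimited lines, highest score first."""
    ordered = sorted(ranks.items(), key=lambda item: item[1], reverse=True)
    return "".join(f"{gene}\t{score:.6g}\n" for gene, score in ordered)


def write_rnk(ranks, path):
    """Write the ranked list to disk, forcing the '.rnk' suffix."""
    target = rnk_path(path)
    target.write_text(rnk_text(ranks))
    return target
```

For example, `write_rnk({"GENE_A": 2.5, "GENE_B": -1.25}, "ranked_list.txt")` would write `ranked_list.rnk`, which can then be loaded with the "Load data" feature.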
Questions
- Explain the reasons for using each of the above parameters.
i. Maximum geneset size is set to 200
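Very large gene sets tend to be broad, general categories that are hard to interpret. Capping the size at 200 keeps the results specific, which reduces the chance of reporting false-positive or uninformative enrichments and makes the analysis more robust.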
ii. Minimum geneset size is set to 15
Gene sets with fewer genes may not be biologically meaningful and may have a higher false-positive rate.
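iii. Number of gene set permutations is left at the default of 1000
For each gene set, GSEA generates 1000 random gene sets of the same size and computes an enrichment score for each against the ranked gene list, creating a null distribution of enrichment scores. Comparing the observed enrichment score with this null distribution determines its significance.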
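To make the permutation idea concrete, here is a minimal sketch of a weighted running-sum enrichment score and a gene set permutation p-value. It is a simplified illustration, not the GSEA implementation: it assumes a weight of 1 on the absolute rank score, draws random gene sets of matching size from the ranked list, and compares the observed score only with null scores of the same sign. The function names are hypothetical.

```python
import random


def enrichment_score(ranked, gene_set, weight=1.0):
    """Running-sum enrichment score for `ranked`, a list of (gene, score)
    pairs sorted from highest to lowest score."""
    in_set = [gene in gene_set for gene, _ in ranked]
    n_hits = sum(in_set)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0
    hit_weights = [abs(score) ** weight if hit else 0.0
                   for (_, score), hit in zip(ranked, in_set)]
    total = sum(hit_weights)
    if total == 0.0:
        return 0.0
    running, best = 0.0, 0.0
    for w, hit in zip(hit_weights, in_set):
        # Step up at genes in the set, step down at genes outside it.
        running += w / total if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best


def gene_set_permutation_p(ranked, gene_set, n_perm=1000, seed=0):
    """Return (observed ES, nominal p-value) using random size-matched gene sets."""
    rng = random.Random(seed)
    genes = [gene for gene, _ in ranked]
    members = set(genes) & set(gene_set)
    observed = enrichment_score(ranked, members)
    null = [enrichment_score(ranked, set(rng.sample(genes, len(members))))
            for _ in range(n_perm)]
    # Compare only against null scores with the same sign as the observed score.
    same_sign = [es for es in null if (es >= 0) == (observed >= 0)]
    if not same_sign:
        return observed, 1.0
    extreme = sum(abs(es) >= abs(observed) for es in same_sign)
    return observed, extreme / len(same_sign)
```

A gene set concentrated at the top of the ranked list gives a positive score close to 1, and one concentrated at the bottom gives a negative score close to -1. With 1000 permutations, a reported p-value of 0.000 means that none of the random gene sets scored as extremely as the observed one.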
- What is the top gene set returned for the Mesenchymal subtype? What is the top gene set returned for the Immunoreactive subtype? For each gene set, answer the questions below:
Mesenchymal subtype top gene set returned is "HALLMARK_INTERFERON_ALPHA_RESPONSE%MSIGDB_C2%HALLMARK_INTERFERON_ALPHA_RESPONSE":
i. What are the p-value, ES, NES and FDR associated with it?
pvalue: 0.000
ES: 0.86
NES: 2.55
FDR: 0.000
ii. How many genes in its leading edge?
145 x (0.57) = ~82
iii. What is the top gene associated with this geneset?
FBN1
Immunoreactive subtype top gene set returned is "HALLMARK_INTERFERON_ALPHA_RESPONSE%MSIGDB_C2%HALLMARK_INTERFERON_ALPHA_RESPONSE":
i. What are the p-value, ES, NES and FDR associated with it?
pvalue: 0.000
ES: -0.86
NES: -2.85
FDR: 0.000
ii. How many genes in its leading edge?
79 x (0.73) = ~58
iii. What is the top gene associated with this geneset?
PROCR
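The leading-edge estimates above multiply the gene set size by the fraction of its genes in the leading edge. A small sketch of that arithmetic follows, assuming the fraction (0.57 or 0.73) is a percentage that was rounded to a whole number when reported. Under that assumption more than one gene count can match the same percentage, so the exact count is best read from the detailed GSEA report. The function names are hypothetical.

```python
def leading_edge_estimate(set_size, fraction):
    """Estimate the leading-edge gene count as size x fraction, rounded."""
    return round(set_size * fraction)


def consistent_counts(set_size, percent):
    """List every gene count that rounds to the given whole-number percentage."""
    return [k for k in range(set_size + 1)
            if round(100 * k / set_size) == percent]


print(leading_edge_estimate(145, 0.57))  # 83
print(consistent_counts(145, 57))        # [82, 83]
print(leading_edge_estimate(79, 0.73))   # 58
print(consistent_counts(79, 73))         # [58]
```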
References
- http://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html
- http://yulab-smu.top/biomedical-knowledge-mining-book/enrichment-overview.html
- Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545-15550. doi:10.1073/pnas.0506580102