I skipped step 5.3 from the instructions as I had manually downloaded the latest pathway definition file.
Lastly, I ran GSEA using the required parameters from the journal entry (i.e., a maximum gene set size of 200, a minimum gene set size of 15,
and gene set permutation set to 1000 permutations).
Questions/Answers:
1. Question One
Maximum gene set size of 200
Capping gene sets at 200 genes has several advantages. Large gene sets tend to cover diverse functions, which makes the results difficult to interpret, so lowering the maximum size lets GSEA focus on more specific, biologically relevant gene sets. It also lets GSEA run faster, saving time and resources. Lastly, testing fewer gene sets decreases the risk of false positive results.
Minimum gene set size of 15
A minimum gene set size of 15 helps ensure that the pathway being examined is biologically relevant. The minimum filters out small gene sets, whose enrichment scores can be driven by only a few genes and therefore add noise to the results.
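To illustrate the two size limits, here is a minimal Python sketch of the filtering step. It is not GSEA's actual code: the function name filter_gene_sets and the toy gene names are made up for this example, and I am assuming (as GSEA does) that the size is counted after restricting each gene set to genes present in the ranked list.

```python
def filter_gene_sets(gene_sets, ranked_genes, min_size=15, max_size=200):
    """Keep gene sets whose overlap with the ranked list has between
    min_size and max_size genes (inclusive)."""
    universe = set(ranked_genes)
    kept = {}
    for name, genes in gene_sets.items():
        # Only genes that appear in the dataset count towards the size.
        overlap = set(genes) & universe
        if min_size <= len(overlap) <= max_size:
            kept[name] = sorted(overlap)
    return kept


# Toy data (hypothetical gene and pathway names).
ranked_genes = [f"GENE{i}" for i in range(1000)]
gene_sets = {
    "TOO_SMALL": ranked_genes[:5],     # 5 genes, below the minimum of 15
    "JUST_RIGHT": ranked_genes[:50],   # 50 genes, within 15..200
    "TOO_LARGE": ranked_genes[:500],   # 500 genes, above the maximum of 200
}

kept = filter_gene_sets(gene_sets, ranked_genes)
print(sorted(kept))  # prints ['JUST_RIGHT']
```

Only the 50-gene set survives, so fewer and more specific pathways go on to be tested.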
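Gene set permutation
Gene set permutation supports the correction for multiple hypothesis testing. GSEA compares each observed enrichment score against a null distribution built from 1000 randomly permuted gene sets and uses that distribution to estimate the p-values and FDR. Since we are testing many pathways simultaneously, this correction is important because it decreases the risk of false positive results.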
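The idea behind gene set permutation can be sketched in a few lines of Python. This is a simplified illustration and not GSEA's implementation: I use the mean gene score as a stand-in for GSEA's running-sum enrichment score, and the function names, toy scores, and seed are all assumptions made for the example.

```python
import random


def mean_score(gene_set, scores):
    """Simplified enrichment statistic: the mean score of the genes in the set.
    (GSEA itself uses a weighted running-sum statistic instead.)"""
    return sum(scores[g] for g in gene_set) / len(gene_set)


def gene_set_permutation_pvalue(gene_set, scores, n_perm=1000, seed=12345):
    """Estimate a one-sided p-value by comparing the observed statistic
    with random gene sets of the same size."""
    rng = random.Random(seed)
    genes = list(scores)
    observed = mean_score(gene_set, scores)
    hits = 0
    for _ in range(n_perm):
        random_set = rng.sample(genes, len(gene_set))
        if mean_score(random_set, scores) >= observed:
            hits += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy ranked list: 1000 genes with scores falling linearly from 5 to -5.
genes = [f"GENE{i}" for i in range(1000)]
scores = {g: 5.0 - 10.0 * i / 999 for i, g in enumerate(genes)}

top_set = genes[:20]      # genes at the very top of the ranking
mid_set = genes[490:510]  # genes in the middle of the ranking

p_top = gene_set_permutation_pvalue(top_set, scores)
p_mid = gene_set_permutation_pvalue(mid_set, scores)
print(p_top, p_mid)
```

The set taken from the top of the ranking gets a very small p-value, while the set from the middle does not. GSEA then derives the FDR from the same kind of null distribution across all tested gene sets.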
2. Question Two
Mesenchymal subtype
Top gene set: HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION%MSIGDBHALLMARK%HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION