Entry 10.1: GSEA Assignment - bcb420-2025/Chloe_Calica GitHub Wiki

Objective: Run a GSEA Preranked analysis on the provided ranked gene list comparing the mesenchymal and immunoreactive ovarian cancer subtypes.

Expected Time: 2 hrs

Actual Time: 1.5 hrs

GSEA Parameters:

Assignment Questions

Reasoning For Parameters

  • Maximum gene set size of 200
    • Definition: Exclude larger sets. Default in GSEA is 500.
    • Using a value of 200 reduces the number of large sets in the enrichment analysis. Large sets tend to dominate the results and mask the smaller, more specific pathways.
    • Choosing 200 filters out broad, less informative sets and avoids redundant sets that overlap multiple pathways.
  • Minimum gene set size of 15
    • Definition: Exclude smaller sets. Default in GSEA is 15.
    • Very small gene sets are more susceptible to random noise. With so few genes, the enrichment score becomes unstable and is easily inflated.
    • A minimum of 15 ensures that there are enough genes in the set to generate a stable score, and that the pathways we get are meaningful rather than fragmented (i.e., partial or incomplete pathways).
  • Gene set permutation = 2000:
    • The number was not provided in the assignment. The GSEA tutorial says to use 100 and the lecture/paper on GSEA says 1000, but 2000 was recommended when running our own dataset.
    • Definition: This parameter is the number of times the gene sets are randomized to create the null distribution used to calculate the FDR.
    • Picked 2000 since it is neither too large nor too small.
      • Too few permutations can result in a poorly estimated null distribution.
      • More permutations can improve the null distribution slightly, but the gain may not justify the added computational cost.
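
The three parameters above can be illustrated with a small sketch. This is not GSEA's own code: the toy ranked list, the `TOY_*` gene set names, and the helper functions (`filter_gene_sets`, `enrichment_score`, `null_scores`, `nominal_p`) are all made up for illustration, and the scoring is a simplified weighted running sum. It assumes that set size is counted after restricting each set to genes present in the ranked list, and that the null distribution comes from random gene sets of the same size.

```python
import random

def filter_gene_sets(gene_sets, ranked_genes, min_size=15, max_size=200):
    """Keep sets whose overlap with the ranked list has min_size..max_size genes."""
    universe = set(ranked_genes)
    kept = {}
    for name, genes in gene_sets.items():
        overlap = set(genes) & universe  # count only genes present in the ranked list
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
    return kept

def enrichment_score(ranked, gene_set):
    """Simplified weighted running sum; ranked is [(gene, score), ...], high to low."""
    hit_total = sum(abs(score) for gene, score in ranked if gene in gene_set)
    miss_step = 1.0 / (len(ranked) - len(gene_set))
    running = best = 0.0
    for gene, score in ranked:
        if gene in gene_set:
            running += abs(score) / hit_total  # step up, weighted by the rank metric
        else:
            running -= miss_step               # step down by a constant amount
        if abs(running) > abs(best):
            best = running                     # keep the largest deviation from zero
    return best

def null_scores(ranked, set_size, n_perm=2000, seed=0):
    """Score n_perm random gene sets of the same size to build a null distribution."""
    rng = random.Random(seed)
    genes = [gene for gene, _ in ranked]
    return [enrichment_score(ranked, set(rng.sample(genes, set_size)))
            for _ in range(n_perm)]

def nominal_p(observed, null):
    """Fraction of same-sign null scores at least as extreme as the observed score."""
    same_sign = [x for x in null if (x >= 0) == (observed >= 0)]
    if not same_sign:
        return float("nan")
    return sum(abs(x) >= abs(observed) for x in same_sign) / len(same_sign)

# Toy ranked list: 1000 hypothetical genes with scores running from +10 down to -10.
ranked = [(f"g{i}", 10.0 - 20.0 * i / 999) for i in range(1000)]
ranked_genes = [gene for gene, _ in ranked]

gene_sets = {
    "TOY_TOP_HEAVY": ranked_genes[:20],    # 20 genes, kept
    "TOY_TOO_SMALL": ranked_genes[:5],     # below the minimum of 15
    "TOY_TOO_BIG": ranked_genes[:300],     # above the maximum of 200
    "TOY_MOSTLY_UNMEASURED": ranked_genes[:14] + [f"missing{i}" for i in range(10)],
}

kept = filter_gene_sets(gene_sets, ranked_genes)
observed = enrichment_score(ranked, kept["TOY_TOP_HEAVY"])
null = null_scores(ranked, len(kept["TOY_TOP_HEAVY"]), n_perm=2000)
p_value = nominal_p(observed, null)
print(sorted(kept), round(observed, 3), p_value)
```

In this toy run only `TOY_TOP_HEAVY` survives the size filter. No random set scores as high as it does, so the estimated p-value is 0.0. With 2000 permutations that is better read as "below roughly 1/2000" than as exactly zero, which is one practical reason the permutation count matters.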

Top Gene Sets in Ranked Lists

  • Mesenchymal corresponds to na_pos (the first result) and Immunoreactive corresponds to na_neg (the second result).
| | Mesenchymal subtype | Immunoreactive subtype |
|---|---|---|
| Top Gene Set | HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION% MSIGDBHALLMARK% HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION | HALLMARK_INTERFERON_ALPHA_RESPONSE% MSIGDBHALLMARK% HALLMARK_INTERFERON_ALPHA_RESPONSE |
| P-value | Nominal: 0.0, FWER: 0.0 | Nominal: 0.0, FWER: 0.0 |
| ES | 0.86477774 | -0.8557666 |
| NES | 2.5517595 | -2.9741802 |
| FDR | 0.0 | 0.0 |
| Genes in Leading Edge | 56% | 73% |
| Top Gene | FBN1 (Rank in List: 4, Rank Metric Score: 32.4, Running ES: 0.0234) | PROCR (Rank in List: 1960, Rank Metric Score: 2.513, Running ES: -0.1249) |
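
The ES and NES rows can be related with a little arithmetic. As described in the GSEA documentation, the NES is the ES divided by the mean ES of the same-sign permuted gene sets, so dividing ES by NES recovers the mean null ES implied by the report. The sketch below uses only the ES and NES values from the results above; the dictionary layout and the function name `implied_null_mean` are illustrative assumptions, not part of GSEA.

```python
# ES and NES as reported for the top gene set of each subtype.
reported = {
    "Mesenchymal": {"ES": 0.86477774, "NES": 2.5517595},
    "Immunoreactive": {"ES": -0.8557666, "NES": -2.9741802},
}

def implied_null_mean(es, nes):
    """Since NES = ES / mean(same-sign null ES), the implied null mean is ES / NES."""
    return es / nes

implied = {
    name: implied_null_mean(values["ES"], values["NES"])
    for name, values in reported.items()
}

for name, mean_null in implied.items():
    print(f"{name}: implied mean |null ES| = {mean_null:.3f}")
```

The implied null means come out to about 0.339 for the mesenchymal set and 0.288 for the immunoreactive set. This suggests why the immunoreactive set has the larger NES in magnitude even though its ES is slightly smaller: random sets of its size score lower on average, so the same raw ES stands out more after normalization.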

Link to GSEA User Guide

Running the GSEA Software