Entry 17: GSEA Assignment - bcb420-2025/Izumi_Ando GitHub Wiki

⏰ Expected vs Actual time taken - 1 hr vs 2 hrs

For this assignment, I downloaded the GSEA v4.4.0 Java app for macOS (Apple silicon) to run the GSEAPreranked analysis.

Parameters and Inputs Used

| Parameter / Input | Selection |
| --- | --- |
| Ranked List | provided mesenchymal vs immuno rank |
| Geneset * | Human_GOBP_AllPathways_noPFOCR_no_GO_iea_March_01_2025_symbol.gmt |
| Max Geneset Size | 200 |
| Min Geneset Size | 15 |
| Number of permutations | 1000 (default) |
| Collapse / Remap to symbols | No_Collapse |
| Enrichment Statistic | weighted (default) |
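I ran the analysis in the desktop app, but the same settings could in principle be scripted through the command-line launcher that ships with GSEA. The sketch below only assembles the argument list. The launcher path, rank file name, output directory, and report label are hypothetical placeholders, and the flag names should be checked against the GSEA documentation for the installed version.

```python
def build_gsea_preranked_command(gsea_cli, rnk_file, gmt_file, out_dir, label):
    """Assemble a GSEAPreranked command line mirroring the parameters listed above."""
    return [
        str(gsea_cli), "GSEAPreranked",
        "-rnk", str(rnk_file),            # ranked gene list
        "-gmx", str(gmt_file),            # gene set file (.gmt)
        "-collapse", "No_Collapse",       # rank list already uses gene symbols
        "-nperm", "1000",                 # number of permutations (default)
        "-scoring_scheme", "weighted",    # enrichment statistic (default)
        "-set_max", "200",                # max gene set size
        "-set_min", "15",                 # min gene set size
        "-rpt_label", str(label),
        "-out", str(out_dir),
    ]


cmd = build_gsea_preranked_command(
    gsea_cli="GSEA_4.4.0/gsea-cli.sh",        # hypothetical install path
    rnk_file="mesenchymal_vs_immuno.rnk",     # hypothetical file name
    gmt_file="Human_GOBP_AllPathways_noPFOCR_no_GO_iea_March_01_2025_symbol.gmt",
    out_dir="gsea_output",                    # hypothetical output directory
    label="mesenchymal_vs_immuno",            # hypothetical report label
)
print(" ".join(cmd))
# To actually launch the run: subprocess.run(cmd, check=True)
```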

1. Explain the reasons for using each of the above parameters. (geneset, max & min geneset size, permutations)

  • For the gene set file, we selected the version with gene symbols to match the identifiers in the rank list, and the one without IEA annotations because excluding electronically inferred annotations leaves a better, human-curated list. I used the March 01, 2025 version because the Jan 04, 2025 version yielded no results.
  • For the max gene set size, 200 was selected because we noted in the g:Profiler assignment that larger gene sets tended to be less specific.
  • For the min gene set size, 15 was selected because anything lower may be too specific and may increase compute time.
  • For the number of permutations, the default value of 1000 was selected. Because the run already took some time, I did not try a larger number.
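To make the size limits concrete, here is a minimal sketch of how a min/max filter acts on a GMT file (tab-separated: name, description, then genes). It assumes the sizes are counted after restricting each gene set to genes present in the ranked list. The function names and toy data are made up for illustration, and the toy call uses small limits instead of 15 and 200 so the example stays short.

```python
def parse_gmt(lines):
    """Parse GMT lines (name, description, gene1, gene2, ...) into {name: set of genes}."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


def filter_by_size(gene_sets, ranked_genes, min_size=15, max_size=200):
    """Keep gene sets whose overlap with the ranked list falls within [min_size, max_size]."""
    ranked = set(ranked_genes)
    kept = {}
    for name, genes in gene_sets.items():
        overlap = genes & ranked
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
    return kept


# Toy data: placeholder gene set and gene names, not taken from the real GMT file.
toy_gmt = [
    "SET_A\tdesc\tG1\tG2\tG3",
    "SET_B\tdesc\tG1",
    "SET_C\tdesc\tG1\tG2\tG3\tG4\tG5",
]
toy_ranked = ["G1", "G2", "G3", "G4", "G5"]
kept = filter_by_size(parse_gmt(toy_gmt), toy_ranked, min_size=2, max_size=3)
print(sorted(kept))  # only SET_A passes: SET_B is too small, SET_C too large
```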

2. Top Gene Sets

  • The analysis took a few minutes to run, and the results were displayed in an HTML report.

Mesenchymal subtype (phenotype na_pos)

| Field | Value |
| --- | --- |
| Top Geneset | HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION%MSIGDBHALLMARK%HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION |
| Associated pvalue | 0.000 |
| Associated ES | 0.86 |
| Associated NES | 2.57 |
| Associated FDR | 0.000 |
| Number of genes in leading edge (# core genes) | 81 |
| Top gene | FBN1 |

Immunoreactive subtype (phenotype na_neg)

| Field | Value |
| --- | --- |
| Top Geneset | HALLMARK_INTERFERON_ALPHA_RESPONSE%MSIGDBHALLMARK%HALLMARK_INTERFERON_ALPHA_RESPONSE |
| Associated pvalue | 0.000 |
| Associated ES | -0.86 |
| Associated NES | -2.92 |
| Associated FDR | 0.000 |
| Number of genes in leading edge (# core genes) | 58 |
| Top gene | PROCR |
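To see why the ES is positive for one phenotype and negative for the other, and where the leading edge comes from, here is a simplified sketch of the weighted running-sum statistic. It is an illustration of the general idea, not GSEA's actual code: it skips the permutation step that produces the NES, p-value, and FDR, and the function name and toy ranked list are made up.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Return (ES, leading-edge genes) for a ranked list of (gene, score) pairs.

    `ranked` must be sorted by score, highest first. With p=1 each hit is
    weighted by |score|, which corresponds to the "weighted" statistic.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_total = sum(abs(s) ** p for (_, s), h in zip(ranked, hits) if h)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering all of it")

    running, es, peak = 0.0, 0.0, -1
    for i, ((_, score), hit) in enumerate(zip(ranked, hits)):
        if hit:
            running += abs(score) ** p / hit_total  # step up at a gene in the set
        else:
            running -= 1.0 / n_miss                 # step down at any other gene
        if abs(running) > abs(es):                  # track the largest deviation from zero
            es, peak = running, i

    if es >= 0:
        # Positive ES: leading edge = set members up to and including the peak.
        leading = [g for (g, _), h in zip(ranked[:peak + 1], hits[:peak + 1]) if h]
    else:
        # Negative ES: leading edge = set members after the trough.
        leading = [g for (g, _), h in zip(ranked[peak + 1:], hits[peak + 1:]) if h]
    return es, leading


# Toy ranked list with placeholder gene names and scores.
toy = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0), ("F", -3.0)]
es_top, edge_top = enrichment_score(toy, {"A", "B"})        # set at the top of the list
es_bottom, edge_bottom = enrichment_score(toy, {"E", "F"})  # set at the bottom of the list
print(es_top, edge_top)
print(es_bottom, edge_bottom)
```

A set concentrated at the top of the ranked list gets a positive ES (na_pos here), and one concentrated at the bottom gets a negative ES (na_neg).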

Things I noticed while doing this assignment

  • The number of genes in the leading edge is NOT the value under the "size" column in the results HTML; it's the number of genes labelled "Yes" under the core enrichment column on the individual gene set page (screenshot below) [image]
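Counting the "Yes" rows by hand is error-prone for large gene sets, so the count could also be scripted. The sketch below assumes the per-gene-set details are available as tab-separated text with a column named "CORE ENRICHMENT"; that column name, the function name, and the toy rows are assumptions to verify against the actual report files.

```python
import csv
import io


def count_leading_edge(tsv_text, column="CORE ENRICHMENT"):
    """Count rows whose core-enrichment column says 'Yes' (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return sum(1 for row in reader if (row.get(column) or "").strip().lower() == "yes")


# Toy table with placeholder gene names, not real results.
toy_details = (
    "SYMBOL\tRANK IN GENE LIST\tCORE ENRICHMENT\n"
    "GENE_A\t0\tYes\n"
    "GENE_B\t5\tYes\n"
    "GENE_C\t900\tNo\n"
)
n_core = count_leading_edge(toy_details)
print(n_core)  # 2
```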