Entry 17: GSEA Assignment - bcb420-2025/Izumi_Ando GitHub Wiki
⏰ Expected vs Actual time taken - 1 hr vs 2 hrs
For this assignment, I downloaded the GSEA v4.4.0 Java app for macOS (Apple silicon) to run the GSEAPreranked analysis.
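GSEAPreranked takes a pre-ranked gene list (a tab-separated `.rnk` file with one gene and one score per line) instead of an expression matrix. This entry doesn't record how the scores were computed, so the sketch below assumes a commonly used metric, sign(logFC) * -log10(p-value). The helper names `rank_score` and `write_rnk` and the toy genes and values are hypothetical, not the data actually used here.

```python
import math
import os
import tempfile


def rank_score(log_fc: float, p_value: float) -> float:
    """Assumed rank metric: sign(logFC) * -log10(p-value)."""
    p = max(p_value, 1e-300)  # avoid log10(0) for p-values that underflow to zero
    return math.copysign(1.0, log_fc) * -math.log10(p)


def write_rnk(results: dict, path: str) -> list:
    """Write a two-column, tab-separated .rnk file (gene, score), highest score first."""
    ranked = sorted(
        ((gene, rank_score(fc, p)) for gene, (fc, p) in results.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    with open(path, "w") as handle:
        for gene, score in ranked:
            handle.write(f"{gene}\t{score:.4f}\n")
    return ranked


# Toy, made-up input: gene -> (logFC, p-value). Not real results.
toy_results = {"GENE_A": (3.2, 1e-20), "GENE_B": (-2.5, 1e-15), "GENE_C": (0.1, 0.5)}
rnk_path = os.path.join(tempfile.gettempdir(), "toy_ranks.rnk")
ranked = write_rnk(toy_results, rnk_path)
```

Genes with the largest positive scores end up at the top of the list; GSEAPreranked labels enrichment at that end as na_pos and at the bottom end as na_neg.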
Parameters and Inputs Used
1. Explain the reasons for using each of the above parameters (gene set, max & min gene set size, permutations).
- As for the gene set, we selected the version with gene symbols so that its identifiers match those in the rank list, and the version without IEA (Inferred from Electronic Annotation) annotations because it is a higher-quality, human-curated list. I used the March 01, 2025 release because the Jan 04, 2025 release yielded no results.
- As for the max gene set size, 200 was selected because we noted in the g:profiler assignment that larger gene sets tended to be less specific.
- As for the min gene set size, 15 was selected because smaller gene sets may be too specific, and including them would add more gene sets to test and increase compute time.
- As for the number of permutations, the default value of 1000 was used. Since the run already took some time, I did not try a larger number.
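The gene set choice and the size limits interact: GSEA measures a gene set's size only after removing genes that are not in the ranked list, so identifiers that don't match the rank list (for example Entrez IDs against gene symbols) leave a set with no usable genes. A minimal sketch of that filtering step, assuming the standard tab-separated GMT layout (name, description, then member genes); `parse_gmt`, `filter_by_size` and the toy gene sets are hypothetical illustrations, not GSEA's own code.

```python
import io


def parse_gmt(handle) -> dict:
    """Parse GMT lines of the form: name<TAB>description<TAB>gene1<TAB>gene2..."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, genes = fields[0], fields[2:]
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


def filter_by_size(gene_sets: dict, ranked_genes: set, min_size: int = 15, max_size: int = 200) -> dict:
    """Keep gene sets whose overlap with the ranked list falls within [min_size, max_size]."""
    kept = {}
    for name, genes in gene_sets.items():
        overlap = genes & ranked_genes  # only identifiers present in the rank list count
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
    return kept


# Toy data: hypothetical symbols G1..G30 in the ranked list.
ranked_genes = {f"G{i}" for i in range(1, 31)}
toy_gmt = "\n".join([
    "SMALL_SET\tdesc\t" + "\t".join(f"G{i}" for i in range(1, 6)),
    "MEDIUM_SET\tdesc\t" + "\t".join([f"G{i}" for i in range(1, 21)] + [f"X{i}" for i in range(1, 11)]),
    "ENTREZ_SET\tdesc\t" + "\t".join(str(1000 + i) for i in range(1, 31)),  # wrong identifier type
])
gene_sets = parse_gmt(io.StringIO(toy_gmt))
kept = filter_by_size(gene_sets, ranked_genes)
```

Only `MEDIUM_SET` survives: `SMALL_SET` falls below the minimum of 15, and `ENTREZ_SET` has 30 members but none of them match the symbols in the ranked list.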
2. Top Gene Sets
- The analysis took a few minutes to run, and the results were displayed as an HTML report.
Mesenchymal subtype (phenotype na_pos)
| Top gene set | HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION%MSIGDBHALLMARK%HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION |
|---|---|
| Associated p-value | 0.000 |
| Associated ES | 0.86 |
| Associated NES | 2.57 |
| Associated FDR | 0.000 |
| Number of genes in leading edge (# core genes) | 81 |
| Top gene | FBN1 |
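The ES in these tables is the maximum deviation from zero of a running sum that walks down the ranked list, stepping up at gene-set members (weighted by the magnitude of their score) and down at every other gene. The sketch below is a simplified Python reading of the weighted scoring scheme from the original GSEA paper with the default weight of 1; it is not the GSEA Java implementation, and the gene names and scores are toy values.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Weighted running-sum enrichment score.

    ranked: list of (gene, score) pairs sorted from highest to lowest score.
    Returns (es, peak_index, running_sum).
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    # Hits step up in proportion to |score|**weight; misses step down uniformly.
    norm = sum(abs(score) ** weight for (_, score), hit in zip(ranked, hits) if hit)
    if norm == 0 or n_miss == 0:
        raise ValueError("gene set needs hits with non-zero scores and at least one miss")
    running, total = [], 0.0
    for (_, score), hit in zip(ranked, hits):
        total += abs(score) ** weight / norm if hit else -1.0 / n_miss
        running.append(total)
    # ES is the running-sum value furthest from zero (positive or negative).
    peak = max(range(len(running)), key=lambda i: abs(running[i]))
    return running[peak], peak, running


# Toy ranked list (hypothetical genes G1..G10, highest score first).
toy_ranked = [(f"G{i + 1}", s) for i, s in enumerate([5, 4, 3, 2, 1, -1, -2, -3, -4, -5])]
es_top, peak_top, _ = enrichment_score(toy_ranked, {"G1", "G2", "G3"})
es_bottom, peak_bottom, _ = enrichment_score(toy_ranked, {"G8", "G9", "G10"})
```

A set concentrated at the top of the list gives a positive ES, as for the mesenchymal (na_pos) result, while a set concentrated at the bottom gives a negative ES, as for the immunoreactive (na_neg) result.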
Immunoreactive subtype (phenotype na_neg)
| Top gene set | HALLMARK_INTERFERON_ALPHA_RESPONSE%MSIGDBHALLMARK%HALLMARK_INTERFERON_ALPHA_RESPONSE |
|---|---|
| Associated p-value | 0.000 |
| Associated ES | -0.86 |
| Associated NES | -2.92 |
| Associated FDR | 0.000 |
| Number of genes in leading edge (# core genes) | 58 |
| Top gene | PROCR |
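The p-values of 0.000 above come from permutation testing. GSEAPreranked permutes gene sets rather than phenotype labels: it scores many random gene sets of the same size and compares the observed ES against them, and the NES divides the ES by the mean of the same-signed null scores. This is a hedged, self-contained sketch of that idea, with a simplified ES function, toy data and a hypothetical `permutation_test` helper; it is not GSEA's exact procedure, and the FDR calculation is left out.

```python
import random


def enrichment_score(ranked, gene_set, weight=1.0):
    """Weighted running-sum ES; ranked is [(gene, score)] sorted high to low.

    Assumes at least one hit with a non-zero score.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    norm = sum(abs(s) ** weight for (_, s), h in zip(ranked, hits) if h)
    total, best = 0.0, 0.0
    for (_, s), h in zip(ranked, hits):
        total += abs(s) ** weight / norm if h else -1.0 / n_miss
        if abs(total) > abs(best):
            best = total
    return best


def permutation_test(ranked, gene_set, n_perm=1000, seed=0):
    """Gene-set permutation: compare the observed ES with ES of random same-size sets."""
    rng = random.Random(seed)
    genes = [g for g, _ in ranked]
    size = len(gene_set & set(genes))
    observed = enrichment_score(ranked, gene_set)
    null = [enrichment_score(ranked, set(rng.sample(genes, size))) for _ in range(n_perm)]
    # Use only the part of the null distribution with the same sign as the observed ES.
    same_sign = [x for x in null if (x >= 0) == (observed >= 0)]
    if not same_sign:
        return observed, float("nan"), float("nan")
    p_value = sum(1 for x in same_sign if abs(x) >= abs(observed)) / len(same_sign)
    nes = observed / abs(sum(same_sign) / len(same_sign))  # keeps the sign of the ES
    return observed, nes, p_value


# Toy ranked list: hypothetical genes G1..G100 with scores 50 down to -49.
toy_ranked = [(f"G{i + 1}", 50 - i) for i in range(100)]
es_up, nes_up, p_up = permutation_test(toy_ranked, {f"G{i}" for i in range(1, 11)})
es_down, nes_down, p_down = permutation_test(toy_ranked, {f"G{i}" for i in range(91, 101)})
```

Because the null distribution only has as many samples as there are permutations, a nominal p-value of 0.000 means that none of the random sets scored as extremely as the real one; with 1000 permutations it is better read as roughly "below 1/1000" than as exactly zero.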
Things I noticed while doing this assignment
- The number of genes in the leading edge is NOT the "size" column in the results HTML (that column is the gene set size after filtering); it is the number of genes labelled "Yes" under the core enrichment column on the individual gene set page (screenshot below).
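The "Yes" rows correspond to the leading-edge subset: for a positive ES, the gene-set members at or before the peak of the running sum; for a negative ES, the members at or after it. A minimal sketch that recomputes this subset from a ranked list, assuming the same weighted running sum as GSEA's default scoring; `leading_edge` and the toy data are hypothetical, and tie handling may differ from GSEA's own output.

```python
def leading_edge(ranked, gene_set, weight=1.0):
    """Return the leading-edge (core enrichment) members of gene_set.

    ranked: list of (gene, score) pairs sorted from highest to lowest score.
    Assumes at least one hit with a non-zero score.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    norm = sum(abs(s) ** weight for (_, s), h in zip(ranked, hits) if h)
    total, best, peak = 0.0, 0.0, 0
    for i, ((_, s), h) in enumerate(zip(ranked, hits)):
        total += abs(s) ** weight / norm if h else -1.0 / n_miss
        if abs(total) > abs(best):
            best, peak = total, i
    if best >= 0:
        window = range(0, peak + 1)  # positive ES: members at or before the peak
    else:
        window = range(peak, len(ranked))  # negative ES: members at or after the peak
    return [ranked[i][0] for i in window if hits[i]]


# Toy ranked list (hypothetical genes G1..G10, highest score first).
toy_ranked = [(f"G{i + 1}", s) for i, s in enumerate([5, 4, 3, 2, 1, -1, -2, -3, -4, -5])]
core_up = leading_edge(toy_ranked, {"G1", "G3", "G7"})
core_down = leading_edge(toy_ranked, {"G4", "G9", "G10"})
```

`len(leading_edge(...))` corresponds to counting the "Yes" rows on the gene set page: here G7 is a member of the first set but falls after the peak, so it is not part of the core.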
