Non‐thresholded Gene set Enrichment Analysis - bcb420-2025/Clare_Gillis GitHub Wiki
1 Starting off
lateeeeee
By non-thresholded gene set enrichment analysis... is that not what I did last assignment? I made a non-thresholded, ranked list and ran g:Profiler on that.
Following this tutorial, I need to "Specify the exact path to the gsea jar in the parameters in order to automatically compute enrichments using GSEA." - OK, I got it: I needed to download the command-line version and use the gsea-cli.sh file for GSEA.
Followed the tutorial and I got a gmt file. I think that's all I need to do to run GSEA.
1.1 Java issue
UH OH - I was looking at the gene set file I downloaded, not my output. GSEA didn't work. I got:
Using system JDK.
Error occurred during initialization of boot layer
java.lang.module.FindException: Error reading module: /home/rstudio/projects/A3/GSEA_4.4.0/modules/gsea-minimal-4.4.0.jar
Caused by: java.lang.module.InvalidModuleDescriptorException: Unsupported major.minor version 61.0
There's an issue with my Java version: class-file version 61.0 means the jar was built for Java 17, and I have Java 11. Trying to switch to an older GSEA version (4.3.3 instead of 4.4.0), because it seems to say it will work with Java 11.
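The "major.minor version 61.0" in the error can be decoded with a small rule: for Java 5 and later, the class-file major version is the Java release number plus 44. A minimal Python sketch of that check follows; the function names are made up for illustration and are not part of GSEA or the JDK.

```python
def java_release_for_class_version(major: int) -> int:
    """Map a class-file major version to the Java release that produces it.

    Valid for major >= 49 (Java 5 and later), where release = major - 44.
    """
    if major < 49:
        raise ValueError("mapping only holds for class-file major versions >= 49")
    return major - 44


def can_run(installed_java: int, class_major: int) -> bool:
    """True if the installed Java release can load classes of this major version."""
    return installed_java >= java_release_for_class_version(class_major)


required = java_release_for_class_version(61)  # version reported in the error
print("jar needs Java", required)
print("Java 11 can run it:", can_run(11, 61))
```

So a jar reporting 61.0 needs Java 17 or newer, which is why it fails on Java 11.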
1.2 Interpreting the data
I'm trying to read the pos and neg TSV data, but it keeps telling me that some lines don't have 12 columns. When I manually inspect the lines and try to fix them, I can't find any issues. I'm just going to have to remove the messed-up lines because there are very few.
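One way to see exactly which lines have the wrong number of columns before dropping them is to split the file by hand and compare each row against the header. This is a minimal Python sketch, not the R code used here. It assumes the file is plain tab-separated text, and it turns quote handling off on the guess that stray quote characters can make a parser merge or split lines. The demo data and the `split_rows` name are made up.

```python
import csv
import io


def split_rows(tsv_text, expected=None):
    """Split TSV text into well-formed rows and rows with a wrong field count.

    Returns (header, good, bad); bad holds (line_number, fields) tuples.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)  # treat quotes as ordinary text
    header = next(reader)
    n_fields = expected if expected is not None else len(header)
    good, bad = [], []
    for line_no, fields in enumerate(reader, start=2):  # header is line 1
        if len(fields) == n_fields:
            good.append(dict(zip(header, fields)))
        else:
            bad.append((line_no, fields))
    return header, good, bad


# Made-up three-column data; the second data row is missing a field.
demo = "NAME\tSIZE\tNES\nSET_A\t10\t1.5\nSET_B\t20\nSET_C\t30\t-2.0\n"
header, good, bad = split_rows(demo)
print(len(good), "rows kept; dropped line numbers:", [n for n, _ in bad])
```

Printing the `bad` entries shows the raw fields of each rejected line, which can reveal problems that are hard to spot by eye, such as a missing or extra tab.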
2 Answer questions
What method did you use? What genesets did you use? Make sure to specify versions and cite your methods.
I did it in R using the ranking of ALL genes in my set.
openjdk version "11.0.20.1" 2023-08-24
OpenJDK Runtime Environment (build 11.0.20.1+1-post-Ubuntu-0ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.20.1+1-post-Ubuntu-0ubuntu122.04, mixed mode)
Summarize your enrichment results.
I don't really know what she means by this. I just got a gmt file. I'm going to read the gmt file into a df (of gene set, description, and genes), then summarize how many gene sets I got, I guess.
I got 19372 significant pathways - that's a LOT. OK, but last time I got 14 SIGNIFICANT results as well as total results. This time the results don't have a p-value??? They just exist.
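For reference, a gmt file is tab-separated with one gene set per line: name, description, then the member genes. It carries no p-values, which fits the results above that "just exist". Below is a minimal Python sketch of reading it into (gene set, description, genes) records. It is not the R code used here, and the function name and demo gene sets are made up.

```python
def parse_gmt(gmt_text):
    """Parse GMT text into a list of dicts: gene_set, description, genes."""
    records = []
    for line in gmt_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"not a GMT line (needs name and description): {line!r}")
        name, description, *genes = fields
        records.append({
            "gene_set": name,
            "description": description,
            "genes": [g for g in genes if g],  # drop empty trailing fields
        })
    return records


# Made-up gene sets, for illustration only.
demo = "DEMO_SET_1\tfirst demo set\tGENE1\tGENE2\tGENE3\nDEMO_SET_2\tsecond demo set\tGENE2\tGENE4\n"
records = parse_gmt(demo)
print(len(records), "gene sets;", [len(r["genes"]) for r in records], "genes each")
```

Counting the records gives the number of gene sets in the file, not the number of enriched ones.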
After 1.1, with the Java issue fixed
Now I have 6372 total results and 1781 significant results with a threshold of 0.05. This is way fewer total results than the 17703 from all_gprofiler, and wayyy more significant results than the 14 from all_gprofiler.
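The total and significant counts can be tallied from the parsed report rows along these lines. This is a minimal Python sketch under two assumptions that the notes above do not state: that the 0.05 threshold is applied to a column named "FDR q-val", and that the comparison is strict (less than). The rows are made up.

```python
def count_significant(rows, column="FDR q-val", threshold=0.05):
    """Return (total, significant) for rows given as dicts of column -> value.

    ASSUMPTION: column name and strict '<' comparison are illustrative choices.
    """
    total = 0
    significant = 0
    for row in rows:
        total += 1
        if float(row[column]) < threshold:
            significant += 1
    return total, significant


# Made-up rows, for illustration only.
demo_rows = [
    {"NAME": "SET_A", "FDR q-val": "0.001"},
    {"NAME": "SET_B", "FDR q-val": "0.20"},
    {"NAME": "SET_C", "FDR q-val": "0.049"},
]
total, significant = count_significant(demo_rows)
print(total, "total;", significant, "significant")
```

Running it once on the positive rows and once on the negative rows, then adding the counts, covers both report files.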
How do these results compare to the results from the thresholded analysis in Assignment #2. Compare qualitatively. Is this a straight forward comparison? Why or why not?
This is not a super straightforward comparison because I have so many results from GSEA - I'll need to group them into overarching pathways to see true patterns. Because I only had 14 significant results from g:Profiler, it was easy to infer patterns there. Here I have way more significant results (1781 vs 14) and way fewer total results (6372 vs 17703). But no matter what, at a glance, I'm seeing key words like "antibody", "antigen", "T-cell", and "Interferon" in the top results from my GSEA, implying that this analysis also picked up lots of immune dysregulation. The next part of the assignment will help me see this better.