Thresholded over‐representation analysis - bcb420-2025/Clare_Gillis GitHub Wiki

TODO:

  • Extract set of significantly up-regulated and down-regulated genes
  • With your significantly up-regulated and down-regulated set of genes run a thresholded gene set enrichment analysis

Which method did you choose and why?

  • I initially leaned toward a thresholded list because I have a LOT of differentially expressed genes to deal with.
  • I ended up choosing a ranked list because I want my analysis to be sensitive to weak signals, especially for non-chromosome 21 genes.
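A minimal sketch of how a ranked list can be scored, assuming the common "signed significance" convention (sign of the log fold change times -log10 of the p-value). The gene names and numbers are hypothetical placeholders, not values from my dataset, and this is not necessarily the exact formula from the class notes.

```python
import math

def rank_score(log_fc: float, p_value: float) -> float:
    """Signed significance: large positive = strongly up, large negative = strongly down.

    Assumes p_value > 0 (a p-value of exactly 0 would need clamping first).
    """
    sign = 1.0 if log_fc >= 0 else -1.0
    return sign * -math.log10(p_value)

# Hypothetical (gene, logFC, p-value) rows, for illustration only
rows = [("GENE_A", 1.8, 1e-6), ("GENE_B", -0.9, 1e-3), ("GENE_C", 0.2, 0.4)]

# Highest score first: strongly up-regulated genes at the top,
# strongly down-regulated genes at the bottom, weak signals in the middle
ranked = sorted(
    ((gene, rank_score(lfc, p)) for gene, lfc, p in rows),
    key=lambda pair: pair[1],
    reverse=True,
)
```

Because no gene is dropped, weakly changed genes still contribute to the ranking, which is the sensitivity argument above.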

What annotation data did you use and why? What version of the annotation are you using?

  • I'm using GO:BP, Reactome, KEGG, and WikiPathways because these are the annotation sources that cover higher-level function (e.g. pathways). This is the type of annotation used in the publications I drew on, and it is easier to interpret than lower-level annotation like molecular function.

How many genesets were returned with what thresholds?

  • See report

Run the analysis using the up-regulated set of genes, and the down-regulated set of genes separately. How do these results compare to using the whole list (i.e all differentially expressed genes together vs. the up-regulated and down regulated differentially expressed genes separately)?

  • Got way more significant terms from the separate queries (up + down) than from querying all differentially expressed genes at once

Present your results with the use of tables and screenshots. All figures should have appropriate figure legends. If using figures create a figures directory in your repo and make sure all references to the figures are relative in your Rmarkdown notebook.

1.1 Starting the TORA

Used the class notes to get both a ranked (non-thresholded) list and thresholded lists. I'll choose which to use later (once again, there are a LOT of significant genes...)

Why do the class notes choose genes based on PValue instead of FDR? I think I'll switch to FDR.
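To see why the choice matters, here is a rough sketch that computes Benjamini-Hochberg adjusted p-values by hand and then splits the significant genes by direction. I'm assuming the FDR column is a BH adjustment and that the cutoff is 0.05; the gene names and numbers are made up for illustration.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical (gene, logFC, raw p-value) rows
genes = [
    ("GENE_A", 1.5, 0.001),
    ("GENE_B", -2.0, 0.002),
    ("GENE_C", 0.7, 0.035),
    ("GENE_D", -0.6, 0.045),
    ("GENE_E", 0.1, 0.5),
]
CUTOFF = 0.05  # assumed threshold

fdr = bh_adjust([p for _, _, p in genes])

# Thresholding on the raw p-value keeps more genes than thresholding on FDR
passed_raw = [g for g, _, p in genes if p < CUTOFF]
up = [g for (g, lfc, _), q in zip(genes, fdr) if q < CUTOFF and lfc > 0]
down = [g for (g, lfc, _), q in zip(genes, fdr) if q < CUTOFF and lfc < 0]
```

In this toy table four genes pass the raw p-value cutoff but only two survive the FDR cutoff, so filtering on FDR gives a smaller, more conservative query list.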

Doing g:Profiler now - I'll assess all, upregulated, downregulated, and non-chromosome 21 genes separately. (non-chromosome 21 because my dataset is Down Syndrome (trisomy 21) vs control so it would be interesting to see what's going on outside of chromosome 21)
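A small sketch of how the four query lists could be built from one results table. The row layout (gene, chromosome, logFC, FDR), the 0.05 cutoff, and the gene names are all assumptions for illustration; the real table would need a chromosome annotation column joined in first.

```python
def build_queries(rows, fdr_cutoff=0.05):
    """Return the four gene lists queried separately: all, up, down, non-chr21."""
    significant = [r for r in rows if r[3] < fdr_cutoff]
    return {
        "all": [gene for gene, _, _, _ in significant],
        "up": [gene for gene, _, lfc, _ in significant if lfc > 0],
        "down": [gene for gene, _, lfc, _ in significant if lfc < 0],
        # Drop chromosome 21 genes to see what changes outside the trisomy itself
        "non_chr21": [gene for gene, chrom, _, _ in significant if chrom != "21"],
    }

# Hypothetical rows: (gene, chromosome, logFC, FDR)
rows = [
    ("GENE_A", "21", 0.6, 0.001),
    ("GENE_B", "3", -1.2, 0.010),
    ("GENE_C", "21", 0.5, 0.020),
    ("GENE_D", "X", 0.9, 0.030),
    ("GENE_E", "7", 0.1, 0.700),
]
queries = build_queries(rows)
```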

For some reason, I keep getting 0 significant hits on g:Profiler even though I've got thousands of significantly differentially expressed genes - what the heck!? Fixed it - MAKE SURE GENES ARE ORDERED IN DESCENDING ORDER OF IMPORTANCE (mine were backwards for so long)
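The ordering fix can be sketched like this, assuming "importance" means smallest FDR first and that the query is run in g:Profiler's ordered mode (in the gprofiler2 R package that is the `ordered_query` argument of `gost`). The tie-break on absolute log fold change is my own assumption, and the rows are hypothetical.

```python
def order_for_query(rows):
    """Most significant gene first: ascending FDR, ties broken by larger |logFC|."""
    ordered = sorted(rows, key=lambda r: (r[2], -abs(r[1])))
    return [gene for gene, _, _ in ordered]

# Hypothetical (gene, logFC, FDR) rows, deliberately listed backwards
rows = [
    ("GENE_D", 0.3, 0.040),
    ("GENE_C", -1.1, 0.010),
    ("GENE_B", 2.4, 0.010),
    ("GENE_A", 1.0, 0.0001),
]
query = order_for_query(rows)

# Sanity check worth doing before submitting: the first gene should have the smallest FDR
best_fdr = min(fdr for _, _, fdr in rows)
first_gene_fdr = next(fdr for gene, _, fdr in rows if gene == query[0])
```

Sorting in the wrong direction puts the weakest genes at the top of an ordered query, which would be consistent with getting zero hits.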

Ok, I got a ton of upregulated terms, no downregulated terms, and a few from the all-genes query. I think the downregulated genes and non-differentially expressed genes dilute the all-genes query, meaning only the strongest signals get through (there are only 9).
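The dilution idea can be checked with a toy over-representation test. This is only a sketch: it uses a plain hypergeometric test with no multiple-testing correction, the background, term, and query sizes are invented, and it assumes the extra genes in the combined query add no further overlap with the term.

```python
from math import comb

def hypergeom_pvalue(overlap, query_size, term_size, background_size):
    """P(X >= overlap) when query_size genes are drawn from a background
    of background_size genes, term_size of which are annotated to the term."""
    max_k = min(query_size, term_size)
    total = comb(background_size, query_size)
    favourable = sum(
        comb(term_size, k) * comb(background_size - term_size, query_size - k)
        for k in range(overlap, max_k + 1)
    )
    return favourable / total

BACKGROUND = 10_000  # hypothetical number of annotated genes
TERM_SIZE = 50       # hypothetical gene set size
OVERLAP = 10         # genes shared between the query and the term

# Same overlap, but the combined query is five times larger
p_up_only = hypergeom_pvalue(OVERLAP, 200, TERM_SIZE, BACKGROUND)
p_all = hypergeom_pvalue(OVERLAP, 1_000, TERM_SIZE, BACKGROUND)
```

With the overlap held fixed, the larger query expects more hits by chance, so the same 10 genes look far less surprising and `p_all` comes out much larger than `p_up_only`.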