Thresholded over‐representation analysis - bcb420-2025/Clare_Gillis GitHub Wiki
TODO:
- Extract set of significantly up-regulated and down-regulated genes
- With your significantly up-regulated and down-regulated set of genes run a thresholded gene set enrichment analysis
Which method did you choose and why?
- I initially leaned toward a thresholded list because I have a LOT of differentially expressed genes to deal with.
- Ended up choosing a ranked list because I want my analysis to be sensitive to weak signals, especially for non-chromosome 21 genes.
What annotation data did you use and why? What version of the annotation are you using?
- I'm using GO:BP, Reactome, KEGG, and WikiPathways because these are the annotation sources that cover higher-level function (e.g. pathways). This is the type of data found in the publications I used, and it is easier to interpret than lower-level molecular data.
How many genesets were returned with what thresholds?
- See report
Run the analysis using the up-regulated set of genes, and the down-regulated set of genes separately. How do these results compare to using the whole list (i.e all differentially expressed genes together vs. the up-regulated and down regulated differentially expressed genes separately)?
- Got way more gene sets from the separate up + down queries than from all genes at once
Present your results with the use of tables and screenshots. All figures should have appropriate figure legends. If using figures create a figures directory in your repo and make sure all references to the figures are relative in your Rmarkdown notebook.
1.1 Starting the TORA
Used the class notes to get both a ranked list and thresholded lists. I'll choose which to use later (once again, there are a LOT of significant genes...)
Why do the class notes choose genes based on PValue instead of FDR? I think I'll switch that.
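To make the PValue vs. FDR difference concrete, here is a minimal Python sketch (the actual notebook is in R, so this is only an illustration). It assumes the FDR column is a Benjamini-Hochberg adjustment, which is an assumption on my part; the gene names and p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down; the running minimum keeps
    # the adjusted values monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, for illustration only.
toy_pvalues = {"GENE_A": 0.001, "GENE_B": 0.02, "GENE_C": 0.04, "GENE_D": 0.30}
genes = list(toy_pvalues)
fdr = dict(zip(genes, benjamini_hochberg([toy_pvalues[g] for g in genes])))

passes_raw = [g for g in genes if toy_pvalues[g] < 0.05]  # raw p-value cutoff
passes_fdr = [g for g in genes if fdr[g] < 0.05]          # adjusted cutoff
print(passes_raw, passes_fdr)
```

In this toy case GENE_C passes the raw cutoff but not the adjusted one, which is the kind of gene that gets dropped when switching from PValue to FDR.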
Doing g:Profiler now - I'll assess all, upregulated, downregulated, and non-chromosome 21 genes separately. (non-chromosome 21 because my dataset is Down Syndrome (trisomy 21) vs control so it would be interesting to see what's going on outside of chromosome 21)
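A rough Python sketch of how the four query lists could be split out of a differential expression table. The field names (`gene`, `logFC`, `FDR`, `chrom`), the 0.05 cutoff, and the rows themselves are all hypothetical, and it assumes the non-chromosome 21 list is taken from the significant genes only.

```python
# Hypothetical differential expression results, for illustration only.
toy_results = [
    {"gene": "GENE_A", "logFC": 1.8, "FDR": 0.001, "chrom": "21"},
    {"gene": "GENE_B", "logFC": -1.2, "FDR": 0.010, "chrom": "3"},
    {"gene": "GENE_C", "logFC": 0.9, "FDR": 0.030, "chrom": "7"},
    {"gene": "GENE_D", "logFC": 0.1, "FDR": 0.600, "chrom": "21"},
]

FDR_CUTOFF = 0.05  # assumed significance threshold

# Keep only the significant genes, then split them by direction and location.
significant = [r for r in toy_results if r["FDR"] < FDR_CUTOFF]

query_sets = {
    "all": [r["gene"] for r in significant],
    "up": [r["gene"] for r in significant if r["logFC"] > 0],
    "down": [r["gene"] for r in significant if r["logFC"] < 0],
    "non_chr21": [r["gene"] for r in significant if r["chrom"] != "21"],
}

for name, gene_list in query_sets.items():
    print(name, gene_list)
```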
For some reason, I keep getting 0 significant hits on g:Profiler even though I've got thousands of significantly differentially expressed genes - what the heck!? Fixed it - MAKE SURE GENES ARE ORDERED IN DESCENDING ORDER OF IMPORTANCE (mine were backwards for so long)
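The ordering fix can be sketched in Python like this, assuming "importance" means statistical significance, so the smallest FDR goes first. The gene names and FDR values are made up.

```python
# Hypothetical FDR values, for illustration only.
toy_fdr = {"GENE_A": 0.0004, "GENE_B": 0.03, "GENE_C": 0.002, "GENE_D": 0.01}

# Most significant gene first: this is the order a ranked query expects.
ranked = sorted(toy_fdr, key=toy_fdr.get)

# The mistake described above: least significant gene first.
backwards = sorted(toy_fdr, key=toy_fdr.get, reverse=True)

print(ranked)
print(backwards)
```

A quick sanity check before submitting a ranked query is to look at the first few genes in the list and confirm they are the ones with the smallest FDR.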
OK, I got a ton of upregulated terms, no downregulated terms, and a few from all. I think the downregulated genes and non-differentially expressed genes dilute the all-gene query, meaning only the strongest signals get through (there are only 9 terms).
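The dilution idea can be illustrated with a plain one-sided hypergeometric over-representation test in Python. This is only a sketch of the general kind of test, not g:Profiler's exact procedure (there is no multiple-testing correction and no ranked-query handling here), and every number below is hypothetical.

```python
from math import comb


def hypergeom_pvalue(overlap, query_size, term_size, background_size):
    """P(X >= overlap) when query_size genes are drawn without replacement."""
    total = comb(background_size, query_size)
    upper = min(query_size, term_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical sizes, for illustration only.
BACKGROUND = 10000  # genes in the background
TERM_SIZE = 100     # genes annotated to one term
OVERLAP = 15        # query genes that hit the term

# The same 15 hits inside a focused query vs. a much larger combined query.
p_up_only = hypergeom_pvalue(OVERLAP, 200, TERM_SIZE, BACKGROUND)
p_combined = hypergeom_pvalue(OVERLAP, 1000, TERM_SIZE, BACKGROUND)

print(p_up_only, p_combined)
```

With the query size going from 200 to 1000 while the overlap stays at 15, the expected overlap by chance rises from 2 to 10, so the same 15 hits look much less surprising in the combined query.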