6. G:Profiler - bcb420-2022/RuoXuan_Wang GitHub Wiki

Objective

Learn how to use g:Profiler for functional enrichment analysis

Duration

Time estimated: 1.5 h; time taken: 1.5 h
date started: 2022-03-08; date completed: 2022-03-08

Progress

  • Used genelist.txt as the query set
    • genes are ordered according to rank
  • Ran a g:Profiler enrichment analysis with the following parameters:
    • Data sources: Reactome, GO biological process, and WikiPathways
      • Under Data sources, deselected all options except these three
      • Left all other settings unchanged (next time, I should probably check all results)
    • Multiple hypothesis testing: Benjamini-Hochberg FDR
      • Under Advanced options, changed Significance threshold to Benjamini-Hochberg FDR
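The Benjamini-Hochberg setting controls the false discovery rate across all tested terms instead of the per-test error rate. Below is a minimal Python sketch of the adjustment; the p-values and the `benjamini_hochberg` and `significant_terms` names are hypothetical and not part of g:Profiler, and g:Profiler's own implementation may differ in details such as how terms are grouped for correction.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_terms(p_by_term, alpha=0.05):
    """Keep the terms whose adjusted p-value is at or below alpha."""
    terms = list(p_by_term)
    adjusted = benjamini_hochberg([p_by_term[t] for t in terms])
    return [t for t, q in zip(terms, adjusted) if q <= alpha]


if __name__ == "__main__":
    # Hypothetical raw p-values for four terms.
    raw = [0.01, 0.04, 0.03, 0.2]
    print([round(q, 4) for q in benjamini_hochberg(raw)])  # → [0.04, 0.0533, 0.0533, 0.2]
```

With these toy values only the first term stays below 0.05 after adjustment, even though three of the raw p-values are below 0.05.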

Notes on Results

  • An error message sometimes appears about old vs. new entries
  • The graph (a Manhattan plot) gives an overview of the results

Answer the questions below:


1. What is the top term returned in each data source?

  • GO biological process: immune system process (GO:0002376)
  • Reactome: Immune System (REAC:R-HSA-168256)
  • Wiki pathways: TYROBP causal network in microglia (WP:WP3945)

2. How many genes are in each of the above genesets returned? (hint, in the Detailed results tab of g:profiler results if you click on the arrows next to the stats heading you will be able to see the number of genes in a term, number of genes in your query and number of genes in your query that are also in your term)

  • Term size, shown in the T column
  • GO biological process: immune system process (GO:0002376) - 2748
  • Reactome: Immune System (REAC:R-HSA-168256) - 2041
  • Wiki pathways: TYROBP causal network in microglia (WP:WP3945) - 63

3. How many genes from our query are found in the above genesets?

  • Intersection size, shown in the T∩Q column
  • GO biological process: immune system process (GO:0002376) - 287
  • Reactome: Immune System (REAC:R-HSA-168256) - 218
  • Wiki pathways: TYROBP causal network in microglia (WP:WP3945) - 27
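The T and T∩Q counts feed a one-sided hypergeometric test, the kind of over-representation test g:Profiler uses. The sketch below computes that tail probability; it needs a domain size N and the query size Q, which are not recorded in these notes, so the toy values in the example are hypothetical. It also computes T∩Q/T (the fraction of each term covered by the query, which g:Profiler reports as recall, as far as I know) for the three top terms recorded above.

```python
from math import comb


def hypergeom_upper_tail(domain_size, term_size, query_size, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(N=domain_size, T=term_size, Q=query_size)."""
    upper = min(term_size, query_size)
    total = comb(domain_size, query_size)
    # Sum the probabilities of seeing `overlap` or more term genes in the query.
    tail = sum(
        comb(term_size, k) * comb(domain_size - term_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def recall(term_size, overlap):
    """Fraction of the term's genes that are also in the query (T∩Q / T)."""
    return overlap / term_size


# (T, T∩Q) as recorded in questions 2 and 3.
RECORDED = {
    "GO:0002376": (2748, 287),
    "REAC:R-HSA-168256": (2041, 218),
    "WP:WP3945": (63, 27),
}

if __name__ == "__main__":
    for term_id, (t, overlap) in RECORDED.items():
        print(term_id, round(recall(t, overlap), 3))
    # Hypothetical toy case: N=20 genes, T=5, Q=4, T∩Q=3.
    print(round(hypergeom_upper_tail(20, 5, 4, 3), 4))  # → 0.032
```

The recorded numbers give about 0.104, 0.107, and 0.429: the small WikiPathways term is covered far more densely by the query than the two broad immune terms.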

4. Change g:profiler settings so that you limit the size of the returned genesets. Make sure the returned genesets are between 5 and 200 genes in size. Did that change the results?

  • Yes. The returned terms are more specific, because large, general terms are excluded. For example, the top GO term is now "antigen processing and presentation" rather than "immune system process" (2748 genes, well above the 200-gene limit).
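The size limit acts as a filter on term size (T). A minimal sketch, assuming the 5 to 200 bounds are inclusive; `TermResult` and `filter_by_term_size` are hypothetical names, and the term sizes are the ones recorded in question 2. Whether g:Profiler applies the limit before or after multiple-testing correction is not established by these notes, so this only models the filtering step.

```python
from dataclasses import dataclass


@dataclass
class TermResult:
    term_id: str
    name: str
    term_size: int  # T: number of genes annotated to the term


def filter_by_term_size(results, min_size=5, max_size=200):
    """Keep terms whose size lies within [min_size, max_size] (inclusive, assumed)."""
    return [r for r in results if min_size <= r.term_size <= max_size]


top_terms = [
    TermResult("GO:0002376", "immune system process", 2748),
    TermResult("REAC:R-HSA-168256", "Immune System", 2041),
    TermResult("WP:WP3945", "TYROBP causal network in microglia", 63),
]

if __name__ == "__main__":
    kept = filter_by_term_size(top_terms)
    print([r.term_id for r in kept])  # → ['WP:WP3945']
```

Only the WikiPathways term passes, which is consistent with the broad GO and Reactome terms disappearing from the top of the results once the limit is set.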

5. Which of the 4 ovarian cancer expression subtypes do you think this list represents?

  • The enriched terms suggest that the genes are involved in immune response and signalling, as well as immune cell proliferation.
  • According to The Cancer Genome Atlas (TCGA) project, the 4 ovarian cancer expression subtypes are the immunoreactive subtype (characterized by chemokine expression), the proliferative subtype (proliferation marker expression), the differentiated subtype (ovarian tumor marker expression), and the mesenchymal subtype (markers suggestive of increased stromal components) (Chen et al., 2018).
  • Thus, the list could represent either the immunoreactive or the proliferative subtype, but since the top terms relate to immune response, the immunoreactive subtype is more likely.

Bonus: The top gene returned for this comparison is TFEC (Ensembl gene ID: ENSG00000105967). Is it found annotated in any of the pathways returned by g:Profiler for our query? What terms is it associated with in g:Profiler?

  • GO biological process: response to stimulus; positive regulation of biological process; nucleic acid metabolic process; cellular aromatic compound metabolic process

  • Record results in journal

Conclusion and outlook

  • g:Profiler is a useful tool for functional enrichment (over-representation) analysis of a gene list. It will be used in the second part of Assignment 2, so this was practice for that: I will be able to explore functional information sources and identify statistically significantly enriched terms for my chosen dataset.
  • Work on Assignment 2

References

  • https://biit.cs.ut.ee/gprofiler/gost
  • Chen, G. M., Kannan, L., Geistlinger, L., Kofia, V., Safikhani, Z., Gendoo, D., Parmigiani, G., Birrer, M., Haibe-Kains, B., & Waldron, L. (2018). Consensus on Molecular Subtypes of High-Grade Serous Ovarian Carcinoma. Clinical Cancer Research, 24(20), 5037–5047. https://doi.org/10.1158/1078-0432.CCR-18-0784