Journal Entry 4: Homework Assignment : G:Profiler - bcb420-2022/Sabbir_Hossain GitHub Wiki

Objective

Download the required files from Quercus, then use g:Profiler to answer the questions assigned on Quercus. The assignment also gives practice with over-representation analysis (ORA) in g:Profiler and with the Benjamini-Hochberg (BH) correction for multiple hypothesis testing.

Time est.: 1 h
Time used: 1 h
Date started: 2022/04/11
Date completed: 2022/04/11

Activities & Tasks

Use this list of genes genelist.txt as your query set and run a g:profiler enrichment analysis with the following parameters:

Data sources: Reactome, GO biological process, and WikiPathways
Multiple hypothesis testing: Benjamini-Hochberg

Answer the questions below:

  1. What is the top term returned in each data source?
  2. How many genes are in each of the above genesets returned? (hint, in the Detailed results tab of g:profiler results if you click on the arrows next to the stats heading you will be able to see the number of genes in a term, number of genes in your query and number of genes in your query that are also in your term)
  3. How many genes from our query are found in the above genesets?
  4. Change g:Profiler settings so that you limit the size of the returned genesets. Make sure the returned genesets are between 5 and 200 genes in size. Did that change the results?
  5. Which of the 4 ovarian cancer expression subtypes do you think this list represents?
  6. Bonus: The top gene returned for this comparison is TFEC (Ensembl gene id: ENSG00000105967). Is it found annotated in any of the pathways returned by g:Profiler for our query? What terms is it associated with in g:Profiler?
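The Benjamini-Hochberg correction named in the parameters can be illustrated with a minimal sketch. This is a generic implementation of the procedure, not g:Profiler's own code, and the p-values below are hypothetical rather than taken from the assignment output.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the rank decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values (assumption, for illustration only).
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print(adj)
```

Each raw p-value is scaled by the number of tests divided by its rank, then capped by the adjusted value of the next larger p-value, which keeps the adjusted values monotone.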

Progress & Notes

What is the top term returned in each data source?

GO:MF - signaling receptor activity
GO:BP - immune system process
GO:CC - side of membrane
REAC - immune system
WP - Allograft rejection

How many genes are in each of the above genesets returned? (hint, in the Detailed results tab of g:profiler results if you click on the arrows next to the stats heading you will be able to see the number of genes in a term, number of genes in your query and number of genes in your query that are also in your term)

GO:MF - 1550
GO:BP - 2748
GO:CC - 625
REAC - 2041
WP - 88

How many genes from our query are found in the above genesets?

GO:MF - 116
GO:BP - 287
GO:CC - 87
REAC - 218
WP - 31
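The term size and the query overlap reported above are two of the four counts that an over-representation test works from. As a hedged sketch, ORA tools commonly use a one-sided hypergeometric test for this. The function name and the toy numbers below are assumptions for illustration, because the query size and background size are not recorded in these notes.

```python
from math import comb


def ora_p_value(background_size, term_size, query_size, overlap):
    """One-sided hypergeometric p-value: P(X >= overlap)."""
    max_overlap = min(term_size, query_size)
    # Number of ways to draw the query from the background.
    total = comb(background_size, query_size)
    # Count the draws that share at least `overlap` genes with the term.
    tail = sum(
        comb(term_size, i) * comb(background_size - term_size, query_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers (assumption): 10 background genes, a term with 4 genes,
# a query of 3 genes, 2 of which are in the term.
p = ora_p_value(background_size=10, term_size=4, query_size=3, overlap=2)
print(p)
```

For the WP result in these notes, `term_size` would be 88 and `overlap` would be 31; the other two arguments would have to come from the g:Profiler stats columns.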

Change g:Profiler settings so that you limit the size of the returned genesets. Make sure the returned genesets are between 5 and 200 genes in size. Did that change the results?

Yes. With the size limits applied, the top terms became:

GO:MF - immune receptor activity
GO:BP - antigen processing and presentation
GO:CC - MHC protein complex
REAC - Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell
WP - Allograft rejection
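The effect of the size limits can be checked against the term sizes reported earlier in these notes. The sketch below applies the 5 to 200 gene filter to the original top terms; the helper name `within_size_limits` is hypothetical, and this is only an illustration of the filter, not of how g:Profiler applies it internally.

```python
# Original top terms and their sizes, as reported in these notes.
top_terms = [
    ("GO:MF", "signaling receptor activity", 1550),
    ("GO:BP", "immune system process", 2748),
    ("GO:CC", "side of membrane", 625),
    ("REAC", "immune system", 2041),
    ("WP", "Allograft rejection", 88),
]


def within_size_limits(term_size, min_size=5, max_size=200):
    """Return True if the term size falls inside the allowed range."""
    return min_size <= term_size <= max_size


# Keep only the original top terms that survive the size filter.
kept = [(source, name) for source, name, size in top_terms
        if within_size_limits(size)]
print(kept)
```

Only the WikiPathways term is small enough to survive the filter, which is consistent with Allograft rejection remaining the top WP term while the top term changed in every other data source.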

Which of the 4 ovarian cancer expression subtypes do you think this list represents?

Given how many of the top terms relate to the immune system, in particular the allograft rejection geneset, this list most likely represents the immunoreactive subtype.

Bonus: The top gene returned for this comparison is TFEC (Ensembl gene id: ENSG00000105967). Is it found annotated in any of the pathways returned by g:Profiler for our query? What terms is it associated with in g:Profiler?

In g:Profiler, TFEC is associated with the terms protein binding and positive regulation by the cell in response to stress.

Conclusion, Outlook, & Discussion

Learned how to use g:Profiler, how to run ORA, and how BH and other multiple-testing correction methods are applied so that ORA is done properly. This will be very helpful for A3, and even more so for A2, as I have noticed.

References

  1. Chen, Gregory M et al. “Consensus on Molecular Subtypes of High-Grade Serous Ovarian Carcinoma.” Clinical cancer research : an official journal of the American Association for Cancer Research vol. 24,20 (2018): 5037-5047. doi:10.1158/1078-0432.CCR-18-0784 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6207081/
  2. Uku Raudvere, Liis Kolberg, Ivan Kuzmin, Tambet Arak, Priit Adler, Hedi Peterson, Jaak Vilo: g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update) Nucleic Acids Research 2019; doi:10.1093/nar/gkz369
  3. Homework Assignment: GProfiler on Quercus
  4. Lecture Notes