5. G:Profiler - bcb420-2022/Yuzi_Li GitHub Wiki

Objective

Run g:Profiler on the designated gene set and answer the accompanying questions.

Duration

  • Expected duration: 1h
  • Actual duration: 1h

Progress

Tasks

  1. Run g:Profiler on the designated set of genes
  2. Answer the questions

Running g:Profiler on the gene set

  • Parameters: [screenshot of the g:Profiler query parameters]
  • Selecting an Ensembl ID for genes that map to multiple Ensembl IDs: [screenshot of the selected IDs]
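Before choosing an ID by hand, it can help to list which genes are affected. The Python sketch below assumes the symbol-to-Ensembl mapping is available as (symbol, ID) pairs. The function name `find_multi_mapped` and every example gene other than TFEC are hypothetical placeholders, not part of g:Profiler.

```python
from collections import defaultdict


def find_multi_mapped(pairs):
    """Return {symbol: sorted Ensembl IDs} for symbols that map to more than one ID.

    `pairs` is an iterable of (gene_symbol, ensembl_id) tuples
    (hypothetical input format, e.g. parsed from an ID-conversion export).
    """
    ids_by_symbol = defaultdict(set)
    for symbol, ensembl_id in pairs:
        ids_by_symbol[symbol].add(ensembl_id)
    # Keep only the ambiguous symbols, i.e. those with two or more distinct IDs.
    return {
        symbol: sorted(ids)
        for symbol, ids in ids_by_symbol.items()
        if len(ids) > 1
    }


# Hypothetical mapping; only the TFEC ID is taken from this page.
example_pairs = [
    ("TFEC", "ENSG00000105967"),
    ("GENE_A", "ENSG_PLACEHOLDER_1"),
    ("GENE_A", "ENSG_PLACEHOLDER_2"),
    ("GENE_B", "ENSG_PLACEHOLDER_3"),
]
print(find_multi_mapped(example_pairs))
# {'GENE_A': ['ENSG_PLACEHOLDER_1', 'ENSG_PLACEHOLDER_2']}
```

The output only says which genes need a decision; the rule for picking one ID per gene still has to be chosen and recorded separately.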

Answering questions

What is the top term returned in each data source?

  • GO: biological process: immune system process
  • WikiPathways: Allograft Rejection
  • Reactome: Immune System
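Picking the top term per data source amounts to taking the smallest adjusted p-value within each source. A minimal sketch, assuming the results were exported as rows with `source`, `term_name` and `p_value` fields (hypothetical field names). The labels GO:BP, REAC and WP are g:Profiler's identifiers for the three sources above; the term names and p-values are made up.

```python
def top_term_per_source(results):
    """Return {source: term_name} for the row with the smallest p-value in each source."""
    best = {}
    for row in results:
        current = best.get(row["source"])
        # Replace the stored row whenever a smaller p-value is seen for that source.
        if current is None or row["p_value"] < current["p_value"]:
            best[row["source"]] = row
    return {source: row["term_name"] for source, row in best.items()}


# Made-up results for illustration only.
example = [
    {"source": "GO:BP", "term_name": "term A", "p_value": 1e-30},
    {"source": "GO:BP", "term_name": "term B", "p_value": 1e-12},
    {"source": "REAC", "term_name": "term C", "p_value": 1e-20},
    {"source": "WP", "term_name": "term D", "p_value": 1e-5},
    {"source": "WP", "term_name": "term E", "p_value": 1e-9},
]
print(top_term_per_source(example))
# {'GO:BP': 'term A', 'REAC': 'term C', 'WP': 'term E'}
```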

How many genes are in each of the genesets returned above? (Hint: in the Detailed Results tab of the g:Profiler results, clicking the arrows next to the stats heading shows the number of genes in a term, the number of genes in your query, and the number of genes in your query that are also in the term.)

  • Immune system process (GO: biological process): 2020 genes in term
  • Allograft Rejection (WikiPathways): 88 genes in term
  • Immune System (Reactome): 2041 genes in term

How many genes from our query are found in the above genesets?

  • Immune system process (GO: biological process): 409 genes from query
  • Allograft Rejection (WikiPathways): 287 genes from query
  • Immune System (Reactome): 334 genes from query
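The term size, query size and intersection size are linked. g:Profiler reports precision as intersection size / query size and recall as intersection size / term size, and the intersection can never exceed either the term or the query. A small sketch of that arithmetic with a consistency check; `overlap_stats` and the example counts are hypothetical.

```python
def overlap_stats(term_size, query_size, intersection_size):
    """Return (precision, recall) for one term.

    precision = intersection_size / query_size
    recall    = intersection_size / term_size
    Raises ValueError if the three counts cannot all be true at once.
    """
    if term_size <= 0 or query_size <= 0:
        raise ValueError("term_size and query_size must be positive")
    if not 0 <= intersection_size <= min(term_size, query_size):
        raise ValueError(
            "intersection_size must lie between 0 and min(term_size, query_size)"
        )
    return intersection_size / query_size, intersection_size / term_size


# Hypothetical counts: 50 of 400 query genes fall in a 100-gene term.
print(overlap_stats(term_size=100, query_size=400, intersection_size=50))
# (0.125, 0.5)
```

Running recorded counts through the check is a quick way to catch transcription slips before interpreting them.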

Change the g:Profiler settings to limit the size of the returned genesets so that they contain between 5 and 200 genes. Did that change the results?

  • The top terms became more specific for GO and Reactome: positive regulation of leukocyte cell-cell adhesion (GO: biological process) and Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell (Reactome). The top term for WikiPathways is unchanged.
  • The number of genes in the top term decreased for GO and Reactome but stayed the same for WikiPathways.
  • The number of query genes found in the geneset is the same.
  • Decreasing the maximum term size makes the top terms more specific, and each of them contains fewer genes.
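The effect of the 5 to 200 gene limit can be sketched as a filter on term size followed by re-ranking. This assumes both bounds are inclusive and that p-values are carried over rather than recomputed; both are assumptions, and the terms and numbers below are made up.

```python
def filter_by_term_size(results, min_size=5, max_size=200):
    """Keep only terms whose size lies in [min_size, max_size] (bounds assumed inclusive)."""
    return [row for row in results if min_size <= row["term_size"] <= max_size]


def top_term(results):
    """Return the row with the smallest p-value, or None if the list is empty."""
    return min(results, key=lambda row: row["p_value"], default=None)


# Made-up results for illustration only.
example = [
    {"term_name": "broad term", "term_size": 2000, "p_value": 1e-40},
    {"term_name": "specific term", "term_size": 150, "p_value": 1e-25},
    {"term_name": "tiny term", "term_size": 3, "p_value": 1e-4},
]

print(top_term(example)["term_name"])                       # broad term
print(top_term(filter_by_term_size(example))["term_name"])  # specific term
```

Large, general terms tend to collect many query genes and therefore very small p-values, so removing them lets a smaller, more specific term rise to the top.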

Which of the 4 ovarian cancer expression subtypes do you think this list represents?

  • This list most likely represents the immunoreactive subtype, because the top enriched terms from all three sources are immune-related pathways.

Bonus: The top gene returned for this comparison is TFEC (Ensembl gene ID: ENSG00000105967). Is it annotated in any of the pathways returned by g:Profiler for our query? What terms is it associated with in g:Profiler?

  • TFEC is not annotated in any returned pathway.
  • TFEC is not associated with any term.
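Checking whether a gene appears in any returned term is a membership test over each term's intersection genes. A minimal sketch, assuming those intersections were exported as a mapping from term name to a set of Ensembl IDs (a hypothetical structure); only the TFEC ID comes from this page. It inspects only the returned terms, not every annotation g:Profiler holds for the gene.

```python
def terms_containing(gene_id, term_genes):
    """Return the sorted names of terms whose intersection genes include gene_id."""
    return sorted(term for term, genes in term_genes.items() if gene_id in genes)


# Hypothetical term-to-genes mapping for illustration only.
term_genes = {
    "term A": {"ENSG_PLACEHOLDER_1", "ENSG_PLACEHOLDER_2"},
    "term B": {"ENSG_PLACEHOLDER_2"},
}

print(terms_containing("ENSG00000105967", term_genes))     # []
print(terms_containing("ENSG_PLACEHOLDER_2", term_genes))  # ['term A', 'term B']
```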

Conclusions and Outlook

  • g:Profiler is a useful tool for functional enrichment analysis of a gene list.
  • Lowering the maximum term size surfaces more specific pathways, and raising it surfaces more general ones.