Journal Entry 5: Homework Assignment: GSEA - bcb420-2022/Sabbir_Hossain GitHub Wiki


Objective

To learn how to use the GSEA database and application software to analyze a ranked gene list (.rnk file).

Time est.: 10 mins
Time used: 5 mins
Date started: 2022/04/21
Date completed: 2022/04/21

Progress & Notes

  1. I used curl to download the gene set (pathways) file from the Bader Lab site and retrieved the rank file from GitHub.
  2. I cleaned up the ranking data and converted it to the GSEA-compatible .rnk format.
  3. I ran GSEA on the ranked gene list with the fgsea R package, using the settings specified in the assignment handout and 1000 permutations.
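Step 2 can be sketched in Python as follows. A GSEA .rnk file is a two-column, tab-separated list of gene identifiers and ranking scores. The input shape (an iterable of `(gene, score)` pairs), the hypothetical helper name `to_rnk`, and the rule of keeping the largest-magnitude score for a duplicated gene are all assumptions for illustration, not necessarily the exact cleanup performed for this assignment.

```python
import math


def to_rnk(rows):
    """Return .rnk lines ("GENE<TAB>score"), sorted from most positive to most negative.

    rows: iterable of (gene, score) pairs; this input shape is an assumption.
    """
    best = {}
    for gene, score in rows:
        gene = (gene or "").strip()
        try:
            score = float(score)
        except (TypeError, ValueError):
            continue  # skip scores that cannot be parsed as numbers
        if not gene or not math.isfinite(score):
            continue  # skip blank gene names and NaN/inf scores
        # Hypothetical de-duplication rule: keep the score with the largest magnitude.
        if gene not in best or abs(score) > abs(best[gene]):
            best[gene] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [f"{gene}\t{score}" for gene, score in ranked]
```

For example, with toy scores (not the assignment's data), `to_rnk([("FBN1", "5.2"), ("GBP4", "-4.8"), ("FBN1", "3.1")])` gives `["FBN1\t5.2", "GBP4\t-4.8"]`; joining the result with `"\n"` produces the file contents.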

Explain the reasons for using each of the above parameters.

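The rank file contains the genes from the comparison of ovarian cancer's mesenchymal and immunoreactive subtypes, each with a statistic that ranks it by differential expression (positive for mesenchymal, negative for immunoreactive). The Bader Lab collection supplies curated GO biological process and pathway gene sets; excluding IEA (Inferred from Electronic Annotation) keeps only GO annotations with stronger evidence. The size limits of 15 to 200 genes remove sets too small to give a stable enrichment score and sets so large that they describe very general processes. Gene set permutation is used because a pre-ranked list has no sample labels to permute. Permutations build a null distribution of enrichment scores from which nominal p-values are estimated; the normalized enrichment score (NES) and FDR derived from that null distribution account for differences in gene set size and for multiple hypothesis testing.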
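To make the roles of the ranking statistic and the permutations concrete, here is a minimal Python sketch of the classic GSEA running-sum enrichment score (Subramanian et al., 2005) and a gene set permutation p-value. It is not fgsea's implementation, which uses much faster algorithms; the function names, the default `weight=1.0`, and the sign-specific p-value formula `(k + 1) / (n + 1)` over same-sign null scores are assumptions for illustration.

```python
import random


def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Signed running-sum enrichment score in the style of Subramanian et al. (2005).

    ranked_genes and scores must be parallel lists sorted by score, descending.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hit = sum(in_set)
    n_miss = len(ranked_genes) - n_hit
    hit_norm = sum(abs(s) ** weight for s, hit in zip(scores, in_set) if hit)
    if n_hit == 0 or n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must partly overlap the ranked list")
    running, best = 0.0, 0.0
    for s, hit in zip(scores, in_set):
        # Hits step up in proportion to |score|**weight; misses step down uniformly.
        running += (abs(s) ** weight / hit_norm) if hit else (-1.0 / n_miss)
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero, with its sign
    return best


def permutation_p_value(ranked_genes, scores, gene_set, n_perm=1000, seed=0):
    """Observed ES and a nominal p-value from random gene sets of the same size."""
    rng = random.Random(seed)
    size = len(set(ranked_genes) & set(gene_set))
    observed = enrichment_score(ranked_genes, scores, gene_set)
    # Gene set permutation: score random sets drawn from the same ranked list.
    null = [
        enrichment_score(ranked_genes, scores, set(rng.sample(ranked_genes, size)))
        for _ in range(n_perm)
    ]
    # Compare only against null scores with the same sign as the observed ES.
    if observed >= 0:
        same_sign = [e for e in null if e >= 0]
        extreme = sum(e >= observed for e in same_sign)
    else:
        same_sign = [e for e in null if e < 0]
        extreme = sum(e <= observed for e in same_sign)
    return observed, (extreme + 1) / (len(same_sign) + 1)
```

Because the ranked list is fixed, only the gene set membership is randomized, which is why the p-value depends directly on how many permutations are drawn.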

What is the top gene set returned for the Mesenchymal subtype? What is the top gene set returned for the Immunoreactive subtype? For each of these gene sets, answer the questions below:

Mesenchymal:
pathway: HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION
p-value: 0.00115
ES: 0.864
NES: 2.57
FDR: 0.0219
Number of genes in leading edge: 82
Top gene: FBN1

Immuno:
pathway: HALLMARK_INTERFERON_ALPHA_RESPONSE
p-value: 0.00439
ES: -0.857
NES: -2.88
FDR: -2.98
Number of genes in leading edge: 58
Top gene: GBP4
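The ES, NES, leading-edge size, and top gene reported above can be related to the running-sum statistic with a short Python sketch. It assumes the classic running sum of Subramanian et al. (2005), normalizes ES by the mean absolute null ES of the same sign, and lists leading-edge genes from the extreme end of the ranking inward, so the first entry plays the role of the "top gene". These choices are assumptions meant to approximate, not reproduce, fgsea's output, and the function names are hypothetical.

```python
def leading_edge(ranked_genes, scores, gene_set, weight=1.0):
    """Return (ES, leading-edge genes), listing genes from the extreme end inward.

    ranked_genes and scores must be parallel lists sorted by score, descending.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(in_set)
    hit_norm = sum(abs(s) ** weight for s, hit in zip(scores, in_set) if hit)
    if n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must partly overlap the ranked list")
    running, best, peak = 0.0, 0.0, 0
    for i, (s, hit) in enumerate(zip(scores, in_set)):
        running += (abs(s) ** weight / hit_norm) if hit else (-1.0 / n_miss)
        if abs(running) > abs(best):
            best, peak = running, i  # position where the running sum peaks
    if best >= 0:
        # Positive ES: set members ranked at or before the peak.
        edge = [g for g, hit in zip(ranked_genes[: peak + 1], in_set[: peak + 1]) if hit]
    else:
        # Negative ES: set members ranked at or after the peak, most negative first.
        edge = [g for g, hit in zip(ranked_genes[peak:], in_set[peak:]) if hit][::-1]
    return best, edge


def normalized_es(es, null_es):
    """Divide ES by the mean absolute null ES of the same sign."""
    same_sign = [abs(e) for e in null_es if (e >= 0) == (es >= 0)]
    if not same_sign:
        raise ValueError("no null enrichment scores with the same sign as es")
    return es / (sum(same_sign) / len(same_sign))
```

In this sketch a negative ES, like the immunoreactive result, comes from set members concentrated at the bottom of the ranking, so its leading edge is read from the most negative score upward.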

Activities & Tasks

Given the ranked list comparing the mesenchymal and immunoreactive ovarian cancer subtypes (mesenchymal genes have positive scores; immunoreactive genes have negative scores), perform a GSEA pre-ranked analysis using the following parameters:

  1. Download the mesenchymal vs. immunoreactive rank file.
  2. Download gene sets from the Bader Lab gene set collection from March 1, 2021, containing GO biological process (no IEA) and pathways.
  3. Maximum gene set size: 200.
  4. Minimum gene set size: 15.
  5. Permutation type: gene set permutation.
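The gene set parameters above can be sketched in Python as parsing a GMT file (one gene set per line: name, description, then member genes, all tab-separated) and keeping sets whose overlap with the ranked list has between 15 and 200 genes. Measuring size after intersecting with the ranked list is how GSEA and fgsea are understood to apply these limits, but treat that as an assumption; `read_gmt` and `filter_by_size` are hypothetical helpers.

```python
def read_gmt(lines):
    """Parse GMT lines: name <TAB> description <TAB> gene1 <TAB> gene2 ..."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        gene_sets[fields[0]] = {g for g in fields[2:] if g}
    return gene_sets


def filter_by_size(gene_sets, ranked_genes, min_size=15, max_size=200):
    """Keep sets whose overlap with the ranked list has min_size..max_size genes."""
    universe = set(ranked_genes)
    kept = {}
    for name, genes in gene_sets.items():
        overlap = genes & universe
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
    return kept
```

A file can be passed directly, e.g. `with open("genesets.gmt") as fh: sets = filter_by_size(read_gmt(fh), ranked_genes)`, where the file name and `ranked_genes` are placeholders.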

Conclusion, Outlook, & Discussion

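The results of our GSEA are consistent with what we would predict for the two ovarian cancer subtypes. The top mesenchymal gene set is the epithelial-mesenchymal transition (EMT) pathway, which is characterized by increased synthesis of extracellular matrix components (Kalluri & Neilson, 2003); its top leading-edge gene, FBN1, encodes fibrillin-1, an extracellular matrix protein. The top immunoreactive gene set is the interferon alpha response, and its leading gene, GBP4 (Guanylate Binding Protein 4), belongs to the interferon-inducible guanylate-binding protein family.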

Rather than using the original GSEA software presented in class, I chose to use the fgsea R package (Korotkevich et al., 2019). I found that the number of permutations affected the GSEA results. I am more confident in the p-values from the 1000-permutation run because more permutations give a better estimate of the null distribution of enrichment scores: the smallest nominal p-value that can be estimated is on the order of 1/(number of permutations), and the random (Monte Carlo) error in each estimate shrinks as the number of permutations grows.
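The effect of the permutation count can be made concrete with the usual empirical p-value estimator, p = (k + 1)/(n + 1), where k of n permutations are at least as extreme as the observed score. This short Python sketch (helper names are hypothetical) shows two consequences: the smallest attainable p-value is 1/(n + 1), and the approximate standard error of an estimate, sqrt(p(1 - p)/n), shrinks as n grows. In tools that count only permutations whose ES has the same sign as the observed one, n is effectively the number of such permutations.

```python
import math


def min_attainable_p(n_perm):
    """Smallest empirical p-value from n_perm permutations with the (k + 1) / (n + 1) estimator."""
    return 1.0 / (n_perm + 1)


def monte_carlo_se(p, n_perm):
    """Approximate standard error of an empirical p-value estimated from n_perm permutations."""
    return math.sqrt(p * (1.0 - p) / n_perm)


# Compare resolution and precision for increasing permutation counts.
for n in (100, 1000, 10000):
    print(n, min_attainable_p(n), round(monte_carlo_se(0.01, n), 5))
```

Each tenfold increase in permutations lowers the attainable p-value floor about tenfold, while the standard error falls only by a factor of about sqrt(10).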

References

Subramanian, A., Tamayo, P., Mootha, V., Mukherjee, S., Ebert, B., Gillette, M., et al. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences, 102(43), 15545–15550. https://doi.org/10.1073/pnas.0506580102

Mootha, V., Lindgren, C., Eriksson, K.-F., et al. (2003). PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature Genetics, 34, 267–273. https://doi.org/10.1038/ng1180

Korotkevich, G., Sukhov, V., & Sergushichev, A. (2019). Fast gene set enrichment analysis. bioRxiv. https://doi.org/10.1101/060012

Kalluri, R., & Neilson, E. G. (2003). Epithelial-mesenchymal transition and its implications for fibrosis. Journal of Clinical Investigation, 112(12), 1776–1784. https://doi.org/10.1172/JCI20530 (PMID: 14679171; PMCID: PMC297008)
