Assignment #3 - bcb420-2023/Helena_Jovic GitHub Wiki
Objectives
- Perform non-thresholded gene set enrichment analysis to identify biological pathways and functions that are significantly enriched in the ranked set of genes from A2.
- Visualize the gene set enrichment analysis in Cytoscape using an enrichment map with annotations and appropriate figures.
- Answer questions relating the results back to the initial data and research question.
Time Management
Date Started: March 25, 2023
Date Completed:
Estimated Time: 15 hours
Actual Time: 25 hours
Introduction
In A1, I sourced, cleaned, and normalized dataset GSE184320, which was used in the study "Loss of skin and mucosal CXCR3+ resident memory T cells causes irreversible tissue-confined immunodeficiency in HIV". The data include 16 CD45+ cell samples sorted by tissue type: skin and peripheral blood mononuclear cells. After cleaning, mapping, and normalizing the data, 20% of the original dataset remained, for a total of 11895 genes.
In A2, I identified the pathways linked to genes that are significantly upregulated or downregulated in people living with HIV, using data from the same study. I performed differential gene expression analysis comparing the different tissue samples, then performed over-representation analysis (ORA) on the upregulated genes and on the downregulated genes.
- With a p-value cutoff of 0.01, 2632 genes were found to be differentially expressed. After correction with the BH method at the same cutoff, 1795 genes remained.
- For upregulated genes, the domain size was 445. Upregulated genes were involved in key processes such as "epidermis development", "skin development", and "extracellular region".
- For downregulated genes, the domain size was 1465. Downregulated genes were involved in "adaptive immune response", "regulation of immune system process", and "leukocyte activation".
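The drop from 2632 to 1795 genes after BH correction reflects how the method raises each p-value according to its rank. The sketch below illustrates this on made-up p-values; it is not the assignment's code or data, and the function name `bh_adjust` is hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values (assumption: illustrative only, not from GSE184320).
toy_pvalues = [0.001, 0.008, 0.039, 0.041, 0.6]
toy_adjusted = bh_adjust(toy_pvalues)

# Count genes passing the 0.01 cutoff before and after correction.
n_raw = sum(p < 0.01 for p in toy_pvalues)
n_bh = sum(q < 0.01 for q in toy_adjusted)
print(f"raw: {n_raw}, BH-corrected: {n_bh}")
```

In this toy case two genes pass the raw cutoff but only one survives correction, which is the same kind of shrinkage seen in the A2 results.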
Workflow
Non-thresholded Gene Set Enrichment Analysis
- Downloaded "Human_GOBP_AllPathways_with_GO_iea_March_02_2023_symbol.gmt" from the Bader Lab collection found at http://download.baderlab.org/EM_Genesets/current_release/Human/symbol/
- Loaded the gene ranking file from Assignment 2. I had to make some adjustments to the code, such as re-matching the Ensembl IDs with their HUGO gene names and exporting the data as a rank/csv file.
- Performed a GSEA Pre-Ranked analysis in GSEA 4.3.2 with the default gene set size range of 15-500, 1000 permutations, and gene symbol collapsing set to "No_Collapse".
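The rank-file preparation step can be sketched roughly as follows. This is an assumption-laden illustration, not the code used in the assignment: it assumes the common ranking score of -log10(p-value) signed by the direction of the log fold change, and the gene names, values, and function names (`build_rank_rows`, `to_rnk_text`) are hypothetical.

```python
import math


def build_rank_rows(de_results):
    """Turn (symbol, log_fc, p_value) tuples into sorted (symbol, score) rows."""
    rows = []
    for symbol, log_fc, p_value in de_results:
        # Skip genes with no mapped symbol or an unusable p-value.
        if not symbol or p_value is None or p_value <= 0:
            continue
        # Assumed score: -log10(p) carrying the sign of the fold change.
        score = -math.log10(p_value) * math.copysign(1.0, log_fc)
        rows.append((symbol, score))
    # Most strongly upregulated first, most strongly downregulated last.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def to_rnk_text(rows):
    """Format rows as tab-delimited text: one 'symbol<TAB>score' per line."""
    return "".join(f"{symbol}\t{score:.4f}\n" for symbol, score in rows)


# Toy differential expression results (hypothetical values).
toy_results = [
    ("GENE_A", 2.5, 0.0001),
    ("GENE_B", -1.8, 0.001),
    ("", 3.0, 0.00001),      # unmapped ID: dropped
    ("GENE_C", 0.4, 0.2),
]
rnk_rows = build_rank_rows(toy_results)
rnk_text = to_rnk_text(rnk_rows)
print(rnk_text)
```

Dropping unmapped rows before export is one way to avoid passing blank gene names to GSEA; whether it would change the missing-data count reported by GSEA for this dataset is not something this sketch can show.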
Visualization in Cytoscape
- Initiated the enrichment map visualization through GSEA to make an unmodified enrichment map.
- Annotated the network using the AutoAnnotate Cytoscape app and selected the option to lay out the network to prevent cluster overlap.
- Created a publication-ready figure by manually adding a legend. The enrichment map already appeared to be organized into a theme network.
Issues and Resolutions
GSEA reported 30515 row(s) of missing data in the RNK file. GSEA ignores these rows, which is generally acceptable.
GSEA Enrichment Results
POS top term:
- ORGANIC SUBSTANCE METABOLIC PROCESS%GOBP%go:0071704
- size: 483; ES: 0.19; NES: 4.24; NOM p-val: 0.000; FDR q-val: 0.000; FWER p-val: 0.000; rank at max: 863; leading edge: tags = 63%, list = 49%, signal = 89%.
NEG top term:
- SCAVENGING OF HEME FROM PLASMA%REACTOME%R-HSA-2168880.1
- size: 40; ES: -0.81; NES: -6.03; NOM p-val: 0.000; FDR q-val: 0.000; FWER p-val: 0.000; rank at max: 244; leading edge: tags = 93%, list = 14%, signal = 105%.
Enrichment in phenotype na_pos
- 1152 / 1376 gene sets are upregulated in phenotype na_pos
- 600 gene sets are significant at FDR < 25%
- 220 gene sets are significantly enriched at nominal pvalue < 1%
- 387 gene sets are significantly enriched at nominal pvalue < 5%
Enrichment in phenotype na_neg
- 224 / 1376 gene sets are upregulated in phenotype na_neg
- 78 gene sets are significantly enriched at FDR < 25%
- 50 gene sets are significantly enriched at nominal pvalue < 1%
- 62 gene sets are significantly enriched at nominal pvalue < 5%
- The dataset has 1776 features (genes)
- No probe set => gene symbol collapsing was requested, so all 1776 features were used
Gene set details
- Gene set size filters (min=15, max=500) resulted in filtering out 18385 / 19761 gene sets
- There were 30515 row(s) in total of missing data in this RNK file. These will be ignored.
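The size filter explains the gene set counts above: 19761 - 18385 = 1376 gene sets remain, which matches the 1152 + 224 sets reported for the two phenotypes. The sketch below shows, on toy data, how such a filter can be applied to a GMT file by counting only the genes that also appear in the ranked list. It is an illustration under stated assumptions, not GSEA's own implementation; the function names and the toy gene sets are hypothetical.

```python
def parse_gmt(text):
    """Parse GMT text: name<TAB>description<TAB>gene1<TAB>gene2..."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # malformed or empty line
        name, genes = fields[0], fields[2:]
        gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets


def filter_by_size(gene_sets, ranked_genes, min_size=15, max_size=500):
    """Keep gene sets whose overlap with the ranked list is within bounds."""
    universe = set(ranked_genes)
    kept = {}
    for name, genes in gene_sets.items():
        overlap = genes & universe
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
    return kept


# Toy GMT content and ranked gene list (hypothetical names).
toy_gmt = (
    "SET_SMALL\tdesc\tG1\n"
    "SET_OK\tdesc\tG1\tG2\tG3\n"
    "SET_BIG\tdesc\tG1\tG2\tG3\tG4\tG5\n"
)
toy_ranked = ["G1", "G2", "G3", "G4", "G9"]

all_sets = parse_gmt(toy_gmt)
# Small bounds so the toy data exercises both ends of the filter.
kept_sets = filter_by_size(all_sets, toy_ranked, min_size=2, max_size=3)
n_filtered = len(all_sets) - len(kept_sets)
print(f"filtered out {n_filtered} / {len(all_sets)} gene sets")
```

Because the overlap is computed against the ranked list, a smaller ranked list (here 1776 features) tends to push more gene sets below the minimum size.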
Visualization in Cytoscape
Major themes
- Regulation transport process
- Processing presentation antigen
- Vesicle mediated activation
- Modification process metabolic
- Phase g1 mitotic
Conclusions
The enrichment results support the conclusions discussed in the original paper.
References
- Isserlin, Ruth. (2023). Week 10 GSEA. University of Toronto.
- Isserlin, Ruth. (2023). Week 11 Cytoscape and Networks and Enrichment Mapping. University of Toronto.
- Reimand J, Isserlin R, Voisin V, Kucera M, Tannus-Lopes C, Rostamianfar A, Wadi L, Meyer M, Wong J, Xu C, Merico D, Bader GD. Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap. Nat Protoc. 2019 Feb;14(2):482-517.
- Merico D, Isserlin R, Stueker O, Emili A, Bader GD. Enrichment Map: A Network-Based Method for Gene-Set Enrichment Visualization and Interpretation. PLoS One. 2010 Nov 15;5(11).
- Subramanian, A., Tamayo, P., et al. (2005, PNAS).
- Mootha, V. K., Lindgren, C. M., et al. (2003, Nature Genetics).
- Liberzon A, et al. (Bioinformatics, 2011).
- Liberzon A, et al. (Cell Systems 2015).
- Saluzzo S, Pandey RV, Gail LM, Dingelmaier-Hovorka R et al. Delayed antiretroviral therapy in HIV-infected individuals leads to irreversible depletion of skin- and mucosa-resident memory T cells. Immunity 2021 Dec 14;54(12):2842-2858.e5. PMID: 34813775