MSigDBR - MattHuff/SingleCellDocumentation_112023 GitHub Wiki
After you have found your list of significant, differentially expressed genes, you can run analyses to contextualize your results. In this pipeline, I run two of these analyses: pathway analysis and gene ontology enrichment analysis.
Pathway analysis takes a list of DE genes and predicts which pathways are likely to be associated with them. This strategy uses the Molecular Signatures Database R package (msigdbr) to load a database of mouse gene sets, followed by clusterProfiler's enricher command to test each cell type's DE genes for over-representation in those gene sets.
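To make the over-representation idea concrete, here is a minimal base-R sketch of the hypergeometric test that, as far as I understand it, underlies enricher. All numbers are made up for illustration and do not come from this pipeline's data; enricher additionally filters gene sets by size and adjusts p-values for multiple testing, which this sketch leaves out.

```r
# Toy numbers (hypothetical, not from the pipeline's data)
universe_size <- 20000  # genes in the background
set_size      <- 150    # genes annotated to one gene set
n_de          <- 300    # DE genes submitted for testing
overlap       <- 12     # DE genes that fall inside the gene set

# Probability of seeing at least `overlap` hits by chance
p_value <- phyper(overlap - 1, set_size, universe_size - set_size, n_de,
                  lower.tail = FALSE)

# The same question asked as a one-sided Fisher's exact test on a 2x2 table
contingency <- matrix(c(overlap, set_size - overlap,
                        n_de - overlap,
                        universe_size - set_size - n_de + overlap),
                      nrow = 2)
fisher_p <- fisher.test(contingency, alternative = "greater")$p.value

print(p_value)
print(fisher_p)
```

The two p-values agree because the one-sided Fisher's exact test is the hypergeometric tail probability written as a contingency table.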
Load Libraries, Set Working Directory, and Load DEG List
# Load Libraries
library(CellChat)
library(circlize)
library(ComplexHeatmap)
library(NMF)
library(patchwork)
library(msigdbr)         # MSigDB gene sets
library(clusterProfiler) # enricher() and dotplot()
library(dplyr)           # %>% and distinct()
library(ggplot2)         # ggtitle() and ggsave()
# Set Working Directory
setwd("~/Downloads/seurat_112930/")
# Load Input File
combined_df_sign <- read.table(file = "katy_samples/de_out/Norris_Dge_FindMarkersMAST_Sign.txt",
                               sep = "\t", header = TRUE)
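The later steps read the `genes` and `celltype` columns of the DEG table, so it can help to confirm they exist before going further. Below is a small sketch of such a check on a toy data frame; `toy_deg` and its values are hypothetical stand-ins for `combined_df_sign`, not real results.

```r
# Toy stand-in for combined_df_sign (hypothetical values)
toy_deg <- data.frame(
  genes    = c("Cd3e", "Cd19", "Lyz2", "Cd3e"),
  celltype = c("T cell", "B cell", "Macrophage", "NK cell"),
  stringsAsFactors = FALSE
)

# Columns the pathway steps rely on
required_cols <- c("genes", "celltype")
missing_cols <- setdiff(required_cols, colnames(toy_deg))
if (length(missing_cols) > 0) {
  stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
}

# Number of DE genes available per cell type
genes_per_celltype <- table(toy_deg$celltype)
print(genes_per_celltype)
```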
Set-Up Pathway Analysis
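# Set up output directory
pviz_dir <- "katy_samples/de_out/Pathway_enrichment/"
if (!dir.exists(pviz_dir)) {dir.create(pviz_dir, recursive = TRUE)}
# Set up DB (C8 is the MSigDB cell type signature collection;
# recent msigdbr releases may expect `collection` instead of `category`)
mouse_db <- msigdbr(species = "Mus musculus", category = "C8")
# Keep one row per gene set / gene symbol pair, the two-column form enricher() expects
msigdbr_t2g <- mouse_db %>%
  dplyr::distinct(gs_name, gene_symbol) %>%
  as.data.frame()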
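For reference, this is roughly the shape `msigdbr_t2g` ends up with: a two-column data frame with the gene set name first and the gene symbol second, one row per unique pair. The sketch below rebuilds that shape in base R from a toy table; `toy_db`, the set names, and the genes are made up for illustration.

```r
# Toy stand-in for the msigdbr output (hypothetical gene sets and genes)
toy_db <- data.frame(
  gs_name     = c("SET_A", "SET_A", "SET_A", "SET_B"),
  gene_symbol = c("Cd3e", "Cd3e", "Cd3d", "Cd19"),
  other_info  = c("x", "y", "z", "w"),
  stringsAsFactors = FALSE
)

# Base-R equivalent of dplyr::distinct(gs_name, gene_symbol):
# keep only the two columns, then drop duplicated pairs
toy_t2g <- unique(toy_db[, c("gs_name", "gene_symbol")])
print(toy_t2g)
```

The duplicated SET_A / Cd3e pair collapses to a single row, which is the point of the distinct step: each gene should count once per gene set.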
Run Pathway Enricher
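cellTypes <- unique(combined_df_sign$celltype)
for (cell in cellTypes) {
  # DE genes for the current cell type
  cur_df <- subset(combined_df_sign, celltype == cell)
  gene_vector <- cur_df$genes
  # Over-representation test against the MSigDB gene sets
  overRep <- enricher(gene = gene_vector, TERM2GENE = msigdbr_t2g)
  # Skip cell types with no result or no significant terms, since dotplot() may fail on them
  if (is.null(overRep) || nrow(as.data.frame(overRep)) == 0) {next}
  r1 <- dotplot(overRep, showCategory = 10) + ggtitle(paste0("Mouse Celltype: ", cell))
  ggsave(filename = paste0(pviz_dir, "GeneRatio_OverRep_", cell, ".pdf"), plot = r1, width = 12)
  # GSEA needs a named numeric vector of genes ranked by a statistic, not a plain gene vector
  # (ranked_gene_vector is a placeholder name; it is not built in this script)
  #gSea <- GSEA(geneList = ranked_gene_vector, TERM2GENE = msigdbr_t2g)
  #r2 <- dotplot(gSea, showCategory=10) + ggtitle(paste0("Mouse Celltype: ", cell))
  #ggsave(filename = paste0(pviz_dir,"GeneRatio_GSEA_", cell, ".pdf"), plot = r2, width = 12)
}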
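The commented-out GSEA call would need a different input than enricher: a named numeric vector sorted in decreasing order, rather than a plain vector of gene names. Here is a minimal sketch of building one in base R. The column name `avg_log2FC` is an assumption about what the DEG table contains, and the toy values are made up. GSEA is also usually run on the full gene list rather than only the significant genes, so treat this as an illustration of the input shape only.

```r
# Toy DE table (hypothetical values; avg_log2FC is an assumed column name)
toy_de <- data.frame(
  genes      = c("Cd3e", "Cd19", "Lyz2", "Ms4a1"),
  avg_log2FC = c(1.8, -0.6, 2.4, 0.3),
  stringsAsFactors = FALSE
)

# Named numeric vector: values are the ranking statistic, names are the genes
ranked_genes <- toy_de$avg_log2FC
names(ranked_genes) <- toy_de$genes

# GSEA expects the vector sorted from highest to lowest
ranked_genes <- sort(ranked_genes, decreasing = TRUE)
print(ranked_genes)
```

A vector like this could then be passed as `GSEA(geneList = ranked_genes, TERM2GENE = msigdbr_t2g)`.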