MSigDBR - MattHuff/SingleCellDocumentation_112023 GitHub Wiki

After you have found your list of significant, differentially expressed genes, you can run analyses to contextualize your results. In this pipeline, I run two of these analyses: pathway analysis and gene ontology enrichment analysis.

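Pathway analysis takes a list of DE genes and tests which gene sets (for example, pathways) contain more of those genes than would be expected by chance. This strategy uses the msigdbr R package, which provides the Molecular Signatures Database (MSigDB) gene sets, to load a database of mouse gene sets. It then uses clusterProfiler's enricher() function to run an over-representation analysis on the DE genes of each cell type and plot the top enriched gene sets.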
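To illustrate what an over-representation test computes, the base-R sketch below scores a single gene set with a one-sided hypergeometric test, which is the test clusterProfiler documents for enricher(). This is not clusterProfiler's own code: the function name ora_pvalue and the toy gene names are hypothetical, and the background (universe) is passed explicitly here, whereas enricher() by default uses all genes listed in TERM2GENE as the background.

```r
# Sketch of a one-sided hypergeometric over-representation test for ONE gene set.
# ora_pvalue() is a hypothetical helper, not part of clusterProfiler.
ora_pvalue <- function(de_genes, set_genes, universe) {
  # Restrict both gene lists to the background
  de_genes  <- intersect(unique(de_genes), universe)
  set_genes <- intersect(unique(set_genes), universe)
  k <- length(intersect(de_genes, set_genes))  # DE genes that fall in the set
  M <- length(set_genes)                       # set size within the background
  N <- length(universe)                        # background size
  n <- length(de_genes)                        # number of DE genes
  # P(X >= k) for X ~ Hypergeometric(M "set" genes, N - M other genes, n draws)
  p <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
  c(overlap = k, set_size = M, gene_ratio = k / n, p_value = p)
}

# Toy example with hypothetical gene names
universe <- paste0("gene", 1:100)
set_a    <- paste0("gene", 1:10)
de       <- paste0("gene", c(1:5, 50:54))
res <- ora_pvalue(de, set_a, universe)
print(res)
```

In a real run this is repeated for every gene set, and the p-values are then adjusted for multiple testing; enricher() uses the Benjamini-Hochberg method (pAdjustMethod = "BH") by default.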

Load Libraries, Set Working Directory, and Load DEG List

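# Load libraries
# (CellChat, circlize, ComplexHeatmap, NMF, and patchwork are not used on this page)
library(CellChat)
library(circlize)
library(ComplexHeatmap)
library(NMF)
library(patchwork)
library(msigdbr)
library(clusterProfiler)
library(dplyr)    # %>% and distinct()
library(ggplot2)  # ggtitle() and ggsave()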

# Set Working Directory
setwd("~/Downloads/seurat_112930/")

# Load the significant DE genes; the table must contain the `genes` and `celltype` columns used below
combined_df_sign <- read.table(file = "katy_samples/de_out/Norris_Dge_FindMarkersMAST_Sign.txt", 
                               sep = "\t", header = TRUE)
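The loop below only relies on two columns of this table, genes and celltype. As a minimal sketch of that assumed layout, the snippet reads a tiny tab-separated table from a string instead of the real file and groups the genes by cell type; the gene names and cell type labels are hypothetical placeholders.

```r
# Hypothetical stand-in for Norris_Dge_FindMarkersMAST_Sign.txt:
# tab-separated, with a header row containing `genes` and `celltype`
toy_txt <- paste("genes\tcelltype",
                 "Cd3e\tT cell",
                 "Cd8a\tT cell",
                 "Cd19\tB cell",
                 sep = "\n")
toy_df <- read.table(text = toy_txt, sep = "\t", header = TRUE,
                     stringsAsFactors = FALSE)

# One character vector of genes per cell type (list names are the cell types)
genes_by_celltype <- split(toy_df$genes, toy_df$celltype)
str(genes_by_celltype)
```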

Set Up Pathway Analysis

# Create the output directory for the enrichment plots
pviz_dir <- "katy_samples/de_out/Pathway_enrichment/"
if (!dir.exists(pviz_dir)) {dir.create(pviz_dir, recursive = TRUE)}

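# Load mouse gene sets from MSigDB; "C8" is the cell type signature collection
mouse_db <- msigdbr(species = "Mus musculus", category = "C8")
# Keep only the two columns enricher() expects for TERM2GENE (term, gene),
# dropping duplicate set/symbol pairs
msigdbr_t2g <- mouse_db %>%
  dplyr::distinct(gs_name, gene_symbol) %>%
  as.data.frame()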
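enricher() takes TERM2GENE as a two-column data frame with the gene set name first and the gene second. The msigdbr() output is a long table with many more columns and can repeat a set/symbol pair (for example, when several gene identifiers map to the same symbol), which is why distinct() is applied. The sketch below shows the same reduction in base R on a hypothetical toy table; the set names, genes, and other_id column are made up for illustration.

```r
# Hypothetical stand-in for msigdbr() output: long format, one row per
# set/gene identifier, so a set/symbol pair can appear more than once
toy_db <- data.frame(
  gs_name     = c("SET_A", "SET_A", "SET_A", "SET_B"),
  gene_symbol = c("Cd3e",  "Cd3e",  "Cd8a",  "Cd19"),
  other_id    = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Base-R equivalent of dplyr::distinct(gs_name, gene_symbol):
# keep the term and gene columns, then drop repeated rows
t2g <- unique(toy_db[, c("gs_name", "gene_symbol")])
rownames(t2g) <- NULL
print(t2g)
```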

Run Pathway Enricher

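cellTypes <- unique(combined_df_sign$celltype)
for (cell in cellTypes) {
  # Significant DE genes for the current cell type
  cur_df <- subset(combined_df_sign, celltype == cell)
  gene_vector <- unique(cur_df$genes)
  
  # Over-representation analysis against the MSigDB gene sets
  overRep <- enricher(gene = gene_vector, TERM2GENE = msigdbr_t2g)
  
  # enricher() returns NULL when no input gene is found in TERM2GENE;
  # skip plotting when nothing passes the enrichment cutoffs
  if (is.null(overRep) || nrow(as.data.frame(overRep)) == 0) {
    message("No enriched gene sets for ", cell)
    next
  }
  
  r1 <- dotplot(overRep, showCategory = 10) + ggtitle(paste0("Mouse Celltype: ", cell))
  ggsave(filename = paste0(pviz_dir, "GeneRatio_OverRep_", cell, ".pdf"),
         plot = r1, width = 12)
  
  # GSEA() needs a named numeric vector sorted in decreasing order (geneList),
  # not a plain vector of gene names; this assumes an avg_log2FC column in cur_df
  #ranked <- sort(setNames(cur_df$avg_log2FC, cur_df$genes), decreasing = TRUE)
  #gSea <- GSEA(geneList = ranked, TERM2GENE = msigdbr_t2g)
  #r2 <- dotplot(gSea, showCategory = 10) + ggtitle(paste0("Mouse Celltype: ", cell))
  #ggsave(filename = paste0(pviz_dir, "GeneRatio_GSEA_", cell, ".pdf"), plot = r2, width = 12)
}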
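The commented-out GSEA step expects a ranked gene list rather than a plain vector of gene names. A minimal base-R sketch of building one is shown below. It assumes the DE table has an avg_log2FC column (the name used by Seurat's FindMarkers() in recent versions, but not confirmed for this input file); build_gene_list and the toy rows are hypothetical.

```r
# Hypothetical per-cell-type DE table; `avg_log2FC` is an assumed column name
toy_cur_df <- data.frame(
  genes      = c("Cd3e", "Cd8a", "Cd19", "Ms4a1", "Actb"),
  avg_log2FC = c(1.2, -0.4, 2.5, -1.1, NA),
  stringsAsFactors = FALSE
)

# Build a named numeric vector sorted from highest to lowest score
build_gene_list <- function(df, score_col = "avg_log2FC") {
  gl <- df[[score_col]]
  names(gl) <- df$genes
  # Drop missing scores and keep only the first entry for repeated genes
  gl <- gl[!is.na(gl) & !duplicated(names(gl))]
  sort(gl, decreasing = TRUE)
}

gene_list <- build_gene_list(toy_cur_df)
print(gene_list)
```

GSEA is normally run on a ranked list of all tested genes rather than only the significant subset, so in practice such a list would ideally be built from the unfiltered DE output.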