Mouse brain Curio Seeker data analysis workflow - AThennavan/CSHL_workshops GitHub Wiki
title: "Mouse Curio Spatial Analysis Workflow" author: "Navin Lab: Aatish Thennavan & Ping Xu" date: "2024-06-25" output: html_document: default pdf_document: default
In this part, we will focus on the Curio Mouse Brain data, which is generated with the Curio Seeker platform. The Curio Seeker platform is designed for high-resolution (10μm) spatial transcriptomics, providing detailed insights into gene expression patterns within tissue samples.
Working with high-resolution data in spatial transcriptomics can be computationally intensive and time-consuming. To manage this, we will follow these steps:
1. Filtering:
The Curio Seeker platform is a sequence-based method, so areas without tissue coverage are still retained.
We can check the distribution of nFeature_RNA
, nCount_RNA
and log10GenesPerUMI
to filter and keep only biologically meaningful areas. Feel free to try different thresholds, as they may vary between tissues.
2. Normalization:
Similar to the processes used in scRNA (single-cell level) and Visium data (55μm), normalization and scaling are essential for preparing the data for downstream analyses.
In this tutorial, we use standard log-normalization for spatial data. Feel free to try other methods for normalization.
Other references:
- Seurat Visium HD Analysis Vignette
- Curio Mouse Brain Data
knitr::opts_chunk$set(echo = TRUE,fig.height=4,fig.width=6)
# Define the package install path
.libPaths("/volumes/USR1/pingxu/CSHL_course_2024_Summer/R_Pacakges/")
library(klippy)
klippy::klippy('')
# Load necessary packages
suppressPackageStartupMessages({
library(SeuratObject)
library(Seurat)
library(SeuratWrappers)
library(patchwork)
library(RColorBrewer)
library(dplyr)
library(ggplot2)
})
# Check Seurat version
packageVersion("Seurat")
## Set Working Directory
your_working_dir <- "/volumes/USR1/pingxu/CSHL_course_2024_Summer/Spatial_transcriptomics/"
setwd(your_working_dir)
# Load Curio Seeker data
Brain_10X10 <- readRDS("Mouse_brain_10by10_seurat.rds")
# Check the Brain_10X10 details
Brain_10X10
head([email protected])
# Visualize the number of transcripts per cell
[email protected] %>%
ggplot(aes( x=nCount_RNA, fill= 'orig.ident')) +
geom_density(alpha = 0.2) +
scale_x_log10() +
theme_classic() +
ylab("Cell density") +
geom_vline(xintercept = 50)
# Visualize the distribution of genes detected per cell
[email protected] %>%
ggplot(aes( x=nFeature_RNA, fill= 'orig.ident')) +
geom_density(alpha = 0.2) +
scale_x_log10() +
theme_classic() +
ylab("Cell density") +
geom_vline(xintercept = 50)
# Add number of genes per UMI for each cell to metadata
Brain_10X10$log10GenesPerUMI <- log10(Brain_10X10$nFeature_RNA) / log10(Brain_10X10$nCount_RNA)
# Visualize the overall complexity of the gene expression by visualizing the genes detected per UMI
[email protected] %>%
ggplot(aes(x=log10GenesPerUMI, fill= 'orig.ident')) +
geom_density(alpha = 0.2) +
theme_classic() +
ylab("Cell density") +
geom_vline(xintercept = 0.80)
# Subset
filtered_Brain <- subset(x = Brain_10X10,
subset= (nCount_RNA >= 50) &
(nFeature_RNA >= 50) &
(log10GenesPerUMI > 0.80))
# Check SpatialFeaturePlot for log(nFeature_RNA) and log(nCount_RNA)
SpatialFeaturePlot(filtered_Brain, features = c('log10_nFeature_RNA','log10_nCount_RNA'), pt.size.factor = 0.5)


Utilize the sketch
Function:
The sketch
function generates a "sketch" of the dataset that retains the overall structure and main features while being smaller and easier to handle.
Initial Data Exploration: Users can create a sketch of the dataset to quickly generate plots and perform initial analyses.
Efficiency: By using a smaller subset of the data, users can test different parameters and analysis pipelines more efficiently before applying them to the full dataset.
Accuracy: As described in Hao et al. and Hie et al., sketch-based analyses aim to 'subsample' large datasets in a way that preserves rare populations.
# Normalization
Assays(filtered_Brain)
DefaultAssay(filtered_Brain) <- "RNA"
filtered_Brain <- NormalizeData(filtered_Brain)
filtered_Brain <- FindVariableFeatures(filtered_Brain)
filtered_Brain <- ScaleData(filtered_Brain)
# Utilize the `sketch` Function:
# We select 30,0000 cells and create a new 'sketch' assay
filtered_Brain <- SketchData(
object = filtered_Brain,
ncells = 30000,
method = "LeverageScore",
sketched.assay = "sketch"
)
# Switch analysis to sketched cells
DefaultAssay(filtered_Brain) <- "sketch"
# Perform clustering workflow
filtered_Brain <- FindVariableFeatures(filtered_Brain)
filtered_Brain <- ScaleData(filtered_Brain)
filtered_Brain <- RunPCA(filtered_Brain, assay = "sketch", reduction.name = "pca.sketch")
# Elbow Plot
ElbowPlot(filtered_Brain, ndims = 50)
Projection:
The cluster labels and dimensional reductions (PCA and UMAP) learned from the 30,000 sketched cells can be projected onto the entire dataset using the ProjectData
function.
# Get UMAP in 'sketch' assay
filtered_Brain <- FindNeighbors(filtered_Brain, assay = "sketch", reduction = "pca.sketch", dims = 1:25)
filtered_Brain <- FindClusters(filtered_Brain, cluster.name = "seurat_cluster.sketched", resolution = 1)
filtered_Brain <- RunUMAP(filtered_Brain, reduction = "pca.sketch", reduction.name = "umap.sketch", return.model = T, dims = 1:25)
# Projection
filtered_Brain <- ProjectData(
object = filtered_Brain,
assay = "RNA",
full.reduction = "full.pca.sketch",
sketched.assay = "sketch",
sketched.reduction = "pca.sketch",
umap.model = "umap.sketch",
dims = 1:50,
refdata = list(seurat_cluster.projected = "seurat_cluster.sketched")
)
# Save the rds file
saveRDS(filtered_Brain, file = 'filtered_Brain_sketched.rds')
In this part, we will analyze single cell and spatial regions of biological relevance through clustering and annotations
3. Find Gene Expression Markers for Each Cluster:
Identify markers that distinguish different cell clusters within the dataset. In this tutorial, we create a new object with a downsampled set of 1,000 cells to quickly check the marker genes. Feel free to find marker genes in the full dataset as well.
4. Identify Spatially-Defined Tissue Domains:
BANKSY (Bayesian Annotation of Spatial Transcriptomics with Sequences) is a package used to identify and annotate spatial domains within tissue samples.
SpatialPCA is a spatially aware dimension reduction method that aims to infer a low dimensional representation of the gene expression data in spatial transcriptomics.
library(dygraphs)
library(flexdashboard)
knitr::opts_chunk$set(
cache = TRUE,
cache.lazy = FALSE,
tidy = TRUE,
autodep=TRUE
)
# Define the package install path
library(klippy)
klippy::klippy('')
# Load necessary packages
suppressPackageStartupMessages({
library(knitr)
library(SeuratObject)
library(Seurat)
library(SeuratWrappers)
library(patchwork)
library(RColorBrewer)
library(dplyr)
library(ggplot2)
library(Banksy) #For R version 4.4
library(SpatialPCA)
library(spacexr)
})
# Check Seurat version (This tutorial will require Seurat version 5 and R version 4.4)
packageVersion("Seurat")
## Set Working Directory
your_working_dir <- "/volumes/USR1/pingxu/CSHL_course_2024_Summer/Spatial_transcriptomics/"
setwd(your_working_dir)
# Read Curio Seeker data created from the previous tutorial
filtered_Brain_sketched <- readRDS("/volumes/USR1/pingxu/CSHL_course_2024_Summer/Spatial_transcriptomics/filtered_Brain_sketched.rds")
# Downsample the Brain_10x10 to make downstream analysis faster
object_subset <- subset(filtered_Brain_sketched, cells = Cells(filtered_Brain_sketched[["sketch"]]), downsample = 1000)
# Compare each cell type vs all of the rest
markers <- FindAllMarkers(object_subset, assay = "sketch", only.pos = TRUE)
# Plotting gene markers for each cluster
DefaultAssay(object_subset) <- "sketch"
Idents(object_subset) <- "seurat_cluster.projected"
markers %>%
group_by(cluster) %>%
dplyr::filter(avg_log2FC > 1) %>%
slice_head(n = 5) %>%
ungroup() -> top5
object_subset <- ScaleData(object_subset, assay = "sketch", features = top5$gene)
p <- DoHeatmap(object_subset, assay = "sketch", features = top5$gene, size = 2.5) + theme(axis.text = element_text(size = 5.5)) + NoLegend()
p
#We can order clusters by similarity and then also plot the heatmap
object_subset <- BuildClusterTree(object_subset, assay = "sketch", reduction = "full.pca.sketch", reorder = T)
p <- DoHeatmap(object_subset, assay = "sketch", features = top5$gene, size = 2.5) + theme(axis.text = element_text(size = 5.5)) + NoLegend()
p
# Highlighting specific cell types based on specific marker/ canonical genes
# A curated list of relevant brain cell types and the genes that define them
# Neural
Neural_genes <- c("Cadm2","Pcdh9","Syt1","Dlg2","Dlgap1","Negr1", "Fgf14")
# Oligodendrocyte
Oligo_genes <- c("Mbp","Plp1","Cnp","Mag","Mobp", "Olig1", "Mal")
# Astrocytes
Astro_genes <- c("Slc39a12","Ntsr2","Fgfr3", "Bcan", "Aqp4", "Gfap")
# Microglial
Microglial_genes <- c("Hexb","C1qa","C1qb", "C1qc", "Csf1r", "Ctss")
# Purkinje
Purkinje_genes <- c("Auts2","Foxp2","Ebf1","Grid2", "Dner", "Khdrsb2")
# Ependymal
Ependymal_genes <- c("Dynlrb2","Tmem212","Rsph1","Foxj1", "Rarres2", "Tmem47")
# Hypendymal
Hypendymal_genes <- c("Rmst","Epha7","Zfhx4", "Efnb3", "Zic4", "Wls")
# Choroid
Choroid_genes <- c("Rbm47","Htr2c","Enpp2", "Ttr", "Fras1", "Glis3")
#Endothelial Mural
Mural_genes <- c("Adap2","Pdgfrb","Pde8b", "Myo1b", "Rbpms", "Dlc1")
#Endothelial Stalk
Stalk_genes <- c("Slco1a4", "Adgrl4", "Abcg2", "Nostrin", "Cldn5", "Ptprb")
# Granular neuron types - Location (Suggest to just use Cerbral cortex, thalamus and hypothalamus markers)
# Cerebellar granule cell - Cerebellum
CBgranular_genes <- c("Kcnd2","Grik2","Fgf14", "Chn2", "Tenm1", "Olfm3", "Astn2") #Not present in this section
# Cerebellar golgi cell - Cerebellum
CBgolgi_genes <- c("Sgcd","Lgi2","Megf11", "Elavl2", "Cacna1d", "Fgf13", "Cdh4") #Not present in this section
# Unipolar brush cell - Cerebral cortex and Cochlear nucleus
CTX_CN_UBC_genes <- c("Unc5c","Tmem108","Nnat","Marchf1","Ccser1", "Sorbs2")
#Cerebral cortex neuron
CTXneuron_genes <- c("Vamp2","Selenow","Grin2b","Snap25","Atp6v1g2","Norad", "Syp")
#Hippocamal neuron
HIPneuron_genes <- c("Ncdn","Ppfia2","Rfx3", "Ahcyl2", "Epha7")
#Cortical Interneuron
CTXInterneuron_genes <- c("Galntl6","Gad2","Nap1l5", "Grip1", "Tspyl4", "Cplx1")
#Medium Spiny neuron
CTX_TH_MSN_genes <- c("Rarb","Snhg14","Rian", "Rgs9", "Ryr3", "Gnal")
Pallium_spiny_genes <- c("Ppp1r1b")
#Cajal-Retzius cells
CTX_HIP_CRC_genes <- c("Ndnf","Reln","Ramp1", "Thsd7b", "Fat3", "Kcnb2")
#Excitary projection neurons
EP_genes <- c("Tbr1", "Neurod6", "Satb2")
#Hypothalamus
HY_genes <- c("Nkx2-1", "Sim1", "Lhx6", "Lhx8")
#Thalamus
TH_genes <- c("Tcf7l2", "Six3", "Plekhg1")
#Midbrain and Pontine neurons
MBP_genes <- c("Otx2", "Gata3", "Pax5", "Pax7", "Sox14")
# Granular cell types - Neurotransmitters
#GABAergic neurons
GABAergic_genes <- c("Slc32a1","Gabra1","Gad2","Galntl6", "Grip1", "Gbx1")
#Glutamergic neurons
Glutamate_genes <- c("Slc17a6", "Slc17a7", "Slc17a8", "Arpp21","Car10","Sv2b","Camk4", "Pcsk2", "Brinp1")
#Cholinergic neurons
Choline_genes <- c("Chat","Slc18a3","Tbx20","Chrnb3","Nppb","Ush1c", "Npy4r")
#Dopaminergic neurons
Dopa_genes <- c("Slc6a3", "Chrna6", "Gucy2c", "Aldh1a7", "Tdh")
#Serotonin neurons
Sero_genes <- c("Slc6a4","Tph2") #not in this section
#Noradrenergic neurons
Noradren_genes <- c("Slc6a2", "Dbh") #not in this section
#Plot examples of the canonical gene expression
#Oligodendrocytes (cluster 3 and 5)
VlnPlot(object_subset, features=Oligo_genes, group.by = "seurat_cluster.projected",pt.size = 0)
#Microglial cells (cluster 21)
VlnPlot(object_subset, features=Microglial_genes, group.by = "seurat_cluster.projected",pt.size = 0)
#Oligodendrocyte markers expression across the spatial clusters
#Microglial markers expression across the spatial clusters
#Spatially plotting oligodendrocytes (cluster 5), astrocyte (cluster 13), ependymal (cluster 22), microglial cells (cluster 21), Cortex layer GLUTamergic excitory 5 (cluster 25) and Cortex layer 6 GLUTamergic excitory (cluster 31)
Idents(object_subset) <- "seurat_cluster.projected"
cells <- CellsByIdentities(object_subset, idents = c(5, 13, 22, 21, 25, 31))
p <- SpatialDimPlot(object_subset,
cells.highlight = cells[setdiff(names(cells), "NA")],
cols.highlight = c("#FFFF00", "grey50"), facet.highlight = T, combine = T
) + NoLegend()
p
#You can also do the same plotting in the full dataset
This methodology utilizes Robust Cell Type Deconvolution (RCTD) to identify cell types in the spatial dataset using a single cell dataset as reference. In this analysis, we will apply RCTD to the sketch downsampled data and then project it onto the main dataset containing all the cell types [Cable, D.M., Murray, E., Zou, L.S. et al. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat Biotechnol 40, 517–526 (2022). https://doi.org/10.1038/s41587-021-00830-w)
The single cell dataset we are using as reference is a mouse scRNAseq dataset obtained from the Allen Brain Atlas (present in the data tab)
#Read in the scRNAseq reference dataset
ref <- readRDS("~/Projects/Curiodata/allen_scRNAseq_ref.Rds")
# Create the RCTD reference object
Idents(ref) <- "subclass_label"
counts <- ref[["RNA"]]$counts
cluster <- as.factor(ref$subclass_label)
nUMI <- ref$nCount_RNA
levels(cluster) <- gsub("/", "-", levels(cluster))
cluster <- droplevels(cluster)
reference <- Reference(counts, cluster, nUMI)
# Create the RCTD reference object
reference <- Reference(counts, cluster, nUMI)
counts_hd <- object_subset[["sketch"]]$counts
cortex_cells_hd <- colnames(object_subset[["sketch"]])
coords <- GetTissueCoordinates(object_subset)[cortex_cells_hd, 1:2]
query <- SpatialRNA(coords, counts_hd, colSums(counts_hd))
#Run RCTD analysis
RCTD <- create.RCTD(query, reference, max_cores = 28)
RCTD <- run.RCTD(RCTD, doublet_mode = "doublet")
#Add the cell type annotations to the spatial dataset
object_subset <- AddMetaData(object_subset, metadata = RCTD@results$results_df)
object_subset$first_type <- as.character(object_subset$first_type)
object_subset$first_type[is.na(object_subset$first_type)] <- "Unknown"
#Visualizing cell identities in the spatial locations e.g. Layered pallium (starts with L), excitatory neurons in the cortex
Idents(object_subset) <- "first_type"
cells <- CellsByIdentities(object_subset)
excitatory_names <- sort(grep("^L.* CTX", names(cells), value = TRUE))
p <- SpatialDimPlot(object_subset, cells.highlight = cells[excitatory_names], cols.highlight = c("#FFFF00", "grey50"), facet.highlight = T, combine = T, ncol = 4)
p
In spatial data, we can plot biologically relevant spatial neighbourhoods using the mean and the gradient of gene expression levels in a spot’s broader neighborhood. This methodology utilizes Building Aggregates with a Neighborhood Kernel and Spatial Yardstick (BANKSY) [Singhal, V., Chou, N., Lee, J. et al. BANKSY unifies cell typing and tissue domain segmentation for scalable spatial omics data analysis. Nat Genet 56, 431–441 (2024). https://doi.org/10.1038/s41588-024-01664-3]
#Running BANKSY to produce new clusterings based on spot neighborhoods
object_subset <- RunBanksy(object_subset,
lambda = 0.8, verbose = TRUE,
assay = "sketch", slot = "data", features = "variable",
k_geom = 50
) #You can try different parameters of lambda and k_geom
object_subset <- RunPCA(object_subset, assay = "BANKSY", reduction.name = "pca.banksy", features = rownames(object_subset), npcs = 30)
object_subset <- FindNeighbors(object_subset, reduction = "pca.banksy", dims = 1:30)
object_subset <- FindClusters(object_subset, cluster.name = "banksy_cluster", resolution = 0.5) #You can try different parameters of resolution
#Plotting the BANKSY defined neighborhoods
Idents(object_subset) <- "banksy_cluster"
p <- SpatialDimPlot(object_subset, group.by = "banksy_cluster", label = T, repel = T, label.size = 4)
p
banksy_cells <- CellsByIdentities(object_subset)
p <- SpatialDimPlot(object_subset, cells.highlight = banksy_cells[setdiff(names(banksy_cells), "NA")], cols.highlight = c("#FFFF00", "grey50"), facet.highlight = T, combine = T) + NoLegend()
p
SpatialPCA is a spatially aware dimension reduction method that aims to infer a low dimensional representation of the gene expression data in spatial transcriptomics. SpatialPCA builds upon the probabilistic version of PCA, incorporates localization information as additional input, and uses a kernel matrix to explicitly model the spatial correlation structure across tissue locations.[Shang, L., Zhou, X. Spatially aware dimension reduction for spatial transcriptomics. Nat Commun 13, 7203 (2022). https://doi.org/10.1038/s41467-022-34879-1]
#Converting Seurat to SpatialPCA object
Location<-as.matrix.data.frame(object_subset@images$slice1@coordinates)
Locationsubset<-Location[,1:2]
Locationsubset<-as.data.frame(Locationsubset)
Locationsubset$x<-as.numeric(Locationsubset$x)
Locationsubset$y<-as.numeric(Locationsubset$y)
Locationsubset<-as.matrix.data.frame(Locationsubset)
slideseqv2 = CreateSpatialPCAObject(counts=object_subset@assays$RNA$counts, location=Locationsubset, project = "SpatialPCA",gene.type="spatial",sparkversion="sparkx",numCores_spark=10, gene.number=3000, customGenelist=NULL,min.loctions = 20, min.features=20)
#Estimate Spatial PCs
slideseqv2 = SpatialPCA_buildKernel(slideseqv2, kerneltype="gaussian", bandwidthtype="Silverman",bandwidth.set.by.user=NULL,sparseKernel=TRUE,sparseKernel_tol=1e-20,sparseKernel_ncore=10)
slideseqv2 = SpatialPCA_EstimateLoading(slideseqv2,fast=TRUE,SpatialPCnum=20)
slideseqv2 = SpatialPCA_SpatialPCs(slideseqv2, fast=TRUE)
#Detect Spatial Domains and visualize
clusterlabel= louvain_clustering(clusternum=14,latent_dat=slideseqv2@SpatialPCs,knearest=310)
plot_cluster(location=object_subset@location,clusterlabel,pointsize=0.5,title_in=paste0("SpatialPCA"),color_in=cbp,legend="bottom")
###Reference - https://satijalab.org/seurat/articles/visiumhd_analysis_vignette https://lulushang.org/SpatialPCA_Tutorial/index.html