Mouse brain Curio Seeker data analysis workflow - AThennavan/CSHL_workshops GitHub Wiki


title: "Mouse Curio Spatial Analysis Workflow" author: "Navin Lab: Aatish Thennavan & Ping Xu" date: "2024-06-25" output: html_document: default pdf_document: default

Curio Mouse Brain Data

In this part, we will focus on the Curio Mouse Brain data, which is generated with the Curio Seeker platform. The Curio Seeker platform is designed for high-resolution (10μm) spatial transcriptomics, providing detailed insights into gene expression patterns within tissue samples.

Working with high-resolution data in spatial transcriptomics can be computationally intensive and time-consuming. To manage this, we will follow these steps:

1. Filtering:
The Curio Seeker platform is a sequence-based method, so areas without tissue coverage are still retained.
We can check the distribution of nFeature_RNA, nCount_RNA and log10GenesPerUMI to filter and keep only biologically meaningful areas. Feel free to try different thresholds, as they may vary between tissues.

2. Normalization:
Similar to the processes used in scRNA (single-cell level) and Visium data (55μm), normalization and scaling are essential for preparing the data for downstream analyses.
In this tutorial, we use standard log-normalization for spatial data. Feel free to try other methods for normalization.

Other references:
- Seurat Visium HD Analysis Vignette
- Curio Mouse Brain Data

knitr::opts_chunk$set(echo = TRUE,fig.height=4,fig.width=6)
# Define the package install path
.libPaths("/volumes/USR1/pingxu/CSHL_course_2024_Summer/R_Pacakges/")
library(klippy)
klippy::klippy('')
# Load necessary packages 
suppressPackageStartupMessages({
 library(SeuratObject)
 library(Seurat)
 library(SeuratWrappers)
 library(patchwork)  
 library(RColorBrewer) 
 library(dplyr)  
 library(ggplot2)
})
# Check Seurat version
packageVersion("Seurat")

## Set Working Directory
your_working_dir <- "/volumes/USR1/pingxu/CSHL_course_2024_Summer/Spatial_transcriptomics/"
setwd(your_working_dir)
# Load Curio Seeker data
Brain_10X10 <- readRDS("Mouse_brain_10by10_seurat.rds") 
# Check the Brain_10X10 details
Brain_10X10
head([email protected])

Filtering

# Visualize the number of transcripts per cell
[email protected] %>% 
  ggplot(aes( x=nCount_RNA, fill= 'orig.ident')) + 
  geom_density(alpha = 0.2) + 
  scale_x_log10() + 
  theme_classic() +
  ylab("Cell density") +
  geom_vline(xintercept = 50)
# Visualize the distribution of genes detected per cell
[email protected]  %>% 
  	 ggplot(aes( x=nFeature_RNA, fill= 'orig.ident')) + 
  geom_density(alpha = 0.2) + 
  scale_x_log10() + 
  theme_classic() +
  ylab("Cell density") +
  geom_vline(xintercept = 50)
# Add number of genes per UMI for each cell to metadata
Brain_10X10$log10GenesPerUMI <- log10(Brain_10X10$nFeature_RNA) / log10(Brain_10X10$nCount_RNA)
# Visualize the overall complexity of the gene expression by visualizing the genes detected per UMI
[email protected] %>%
  ggplot(aes(x=log10GenesPerUMI, fill= 'orig.ident')) +
  geom_density(alpha = 0.2) +
  theme_classic() +
  ylab("Cell density") +
  geom_vline(xintercept = 0.80)
# Subset
filtered_Brain <- subset(x = Brain_10X10, 
                          subset= (nCount_RNA >= 50) & 
                            (nFeature_RNA >= 50) & 
                            (log10GenesPerUMI > 0.80))

# Check SpatialFeaturePlot for log(nFeature_RNA) and log(nCount_RNA)
SpatialFeaturePlot(filtered_Brain, features = c('log10_nFeature_RNA','log10_nCount_RNA'), pt.size.factor = 0.5)

Plot1 Plot2

Normalization

Utilize the sketch Function:
The sketch function generates a "sketch" of the dataset that retains the overall structure and main features while being smaller and easier to handle.
Initial Data Exploration: Users can create a sketch of the dataset to quickly generate plots and perform initial analyses.
Efficiency: By using a smaller subset of the data, users can test different parameters and analysis pipelines more efficiently before applying them to the full dataset.
Accuracy: As described in Hao et al. and Hie et al., sketch-based analyses aim to 'subsample' large datasets in a way that preserves rare populations.


# Normalization
Assays(filtered_Brain)
DefaultAssay(filtered_Brain) <- "RNA"
filtered_Brain <- NormalizeData(filtered_Brain)
filtered_Brain <- FindVariableFeatures(filtered_Brain)
filtered_Brain <- ScaleData(filtered_Brain)

# Utilize the `sketch` Function:
# We select 30,0000 cells and create a new 'sketch' assay
filtered_Brain <- SketchData(
  object = filtered_Brain,
  ncells = 30000,
  method = "LeverageScore",
  sketched.assay = "sketch"
)

# Switch analysis to sketched cells
DefaultAssay(filtered_Brain) <- "sketch"

# Perform clustering workflow
filtered_Brain <- FindVariableFeatures(filtered_Brain)
filtered_Brain <- ScaleData(filtered_Brain)
filtered_Brain <- RunPCA(filtered_Brain, assay = "sketch", reduction.name = "pca.sketch")
# Elbow Plot
ElbowPlot(filtered_Brain, ndims = 50) 

ElbowPlot_filtered_Brain

Projection:
The cluster labels and dimensional reductions (PCA and UMAP) learned from the 30,000 sketched cells can be projected onto the entire dataset using the ProjectData function.

# Get UMAP in 'sketch' assay
filtered_Brain <- FindNeighbors(filtered_Brain, assay = "sketch", reduction = "pca.sketch", dims = 1:25)
filtered_Brain <- FindClusters(filtered_Brain, cluster.name = "seurat_cluster.sketched", resolution = 1)
filtered_Brain <- RunUMAP(filtered_Brain, reduction = "pca.sketch", reduction.name = "umap.sketch", return.model = T, dims = 1:25)
# Projection
filtered_Brain <- ProjectData(
  object = filtered_Brain,
  assay = "RNA",
  full.reduction = "full.pca.sketch",
  sketched.assay = "sketch",
  sketched.reduction = "pca.sketch",
  umap.model = "umap.sketch",
  dims = 1:50,
  refdata = list(seurat_cluster.projected = "seurat_cluster.sketched")
)
# Save the rds file
saveRDS(filtered_Brain, file = 'filtered_Brain_sketched.rds')

Curio Mouse Brain Data Part 2

In this part, we will analyze single cell and spatial regions of biological relevance through clustering and annotations

3. Find Gene Expression Markers for Each Cluster:
Identify markers that distinguish different cell clusters within the dataset. In this tutorial, we create a new object with a downsampled set of 1,000 cells to quickly check the marker genes. Feel free to find marker genes in the full dataset as well.

4. Identify Spatially-Defined Tissue Domains:
BANKSY (Bayesian Annotation of Spatial Transcriptomics with Sequences) is a package used to identify and annotate spatial domains within tissue samples.
SpatialPCA is a spatially aware dimension reduction method that aims to infer a low dimensional representation of the gene expression data in spatial transcriptomics.

library(dygraphs)
library(flexdashboard)
knitr::opts_chunk$set(
  cache = TRUE,
  cache.lazy = FALSE,
  tidy = TRUE,
  autodep=TRUE
)
# Define the package install path
library(klippy)
klippy::klippy('')
# Load necessary packages 
suppressPackageStartupMessages({
 library(knitr)
 library(SeuratObject)
 library(Seurat)
 library(SeuratWrappers)
 library(patchwork)  
 library(RColorBrewer) 
 library(dplyr)  
 library(ggplot2)
 library(Banksy) #For R version 4.4
 library(SpatialPCA)
 library(spacexr)
})
# Check Seurat version (This tutorial will require Seurat version 5 and R version 4.4)
packageVersion("Seurat")

## Set Working Directory
your_working_dir <- "/volumes/USR1/pingxu/CSHL_course_2024_Summer/Spatial_transcriptomics/"
setwd(your_working_dir)
# Read Curio Seeker data created from the previous tutorial
filtered_Brain_sketched <- readRDS("/volumes/USR1/pingxu/CSHL_course_2024_Summer/Spatial_transcriptomics/filtered_Brain_sketched.rds")

# Downsample the Brain_10x10 to make downstream analysis faster
object_subset <- subset(filtered_Brain_sketched, cells = Cells(filtered_Brain_sketched[["sketch"]]), downsample = 1000)

Identifying top genes for each cluster to identify cell types

# Compare each cell type vs all of the rest
markers <- FindAllMarkers(object_subset, assay = "sketch", only.pos = TRUE)
# Plotting gene markers for each cluster
DefaultAssay(object_subset) <- "sketch"
Idents(object_subset) <- "seurat_cluster.projected"

markers %>%
group_by(cluster) %>%
dplyr::filter(avg_log2FC > 1) %>%
slice_head(n = 5) %>%
ungroup() -> top5
object_subset <- ScaleData(object_subset, assay = "sketch", features = top5$gene)
p <- DoHeatmap(object_subset, assay = "sketch", features = top5$gene, size = 2.5) + theme(axis.text = element_text(size = 5.5)) + NoLegend()
p

#We can order clusters by similarity and then also plot the heatmap
object_subset <- BuildClusterTree(object_subset, assay = "sketch", reduction = "full.pca.sketch", reorder = T)
p <- DoHeatmap(object_subset, assay = "sketch", features = top5$gene, size = 2.5) + theme(axis.text = element_text(size = 5.5)) + NoLegend()
p

Top10heatmap_notordered

# Highlighting specific cell types based on specific marker/ canonical genes
# A curated list of relevant brain cell types and the genes that define them
# Neural
Neural_genes <- c("Cadm2","Pcdh9","Syt1","Dlg2","Dlgap1","Negr1", "Fgf14")
# Oligodendrocyte
Oligo_genes <- c("Mbp","Plp1","Cnp","Mag","Mobp", "Olig1", "Mal") 
# Astrocytes
Astro_genes <- c("Slc39a12","Ntsr2","Fgfr3", "Bcan", "Aqp4", "Gfap") 
# Microglial 
Microglial_genes <- c("Hexb","C1qa","C1qb", "C1qc", "Csf1r", "Ctss") 
# Purkinje
Purkinje_genes <- c("Auts2","Foxp2","Ebf1","Grid2", "Dner", "Khdrsb2") 
# Ependymal
Ependymal_genes <- c("Dynlrb2","Tmem212","Rsph1","Foxj1", "Rarres2", "Tmem47")
# Hypendymal
Hypendymal_genes <- c("Rmst","Epha7","Zfhx4", "Efnb3", "Zic4", "Wls")
# Choroid 
Choroid_genes <- c("Rbm47","Htr2c","Enpp2", "Ttr", "Fras1", "Glis3")
#Endothelial Mural
Mural_genes <- c("Adap2","Pdgfrb","Pde8b", "Myo1b", "Rbpms", "Dlc1")
#Endothelial Stalk
Stalk_genes <- c("Slco1a4", "Adgrl4", "Abcg2", "Nostrin", "Cldn5", "Ptprb")

# Granular neuron types - Location (Suggest to just use Cerbral cortex, thalamus and hypothalamus markers)
# Cerebellar granule cell - Cerebellum
CBgranular_genes <- c("Kcnd2","Grik2","Fgf14", "Chn2", "Tenm1", "Olfm3", "Astn2") #Not present in this section
# Cerebellar golgi cell - Cerebellum
CBgolgi_genes <- c("Sgcd","Lgi2","Megf11", "Elavl2", "Cacna1d", "Fgf13", "Cdh4") #Not present in this section
# Unipolar brush cell - Cerebral cortex and Cochlear nucleus
CTX_CN_UBC_genes <- c("Unc5c","Tmem108","Nnat","Marchf1","Ccser1", "Sorbs2")
#Cerebral cortex neuron
CTXneuron_genes <-  c("Vamp2","Selenow","Grin2b","Snap25","Atp6v1g2","Norad", "Syp")
#Hippocamal neuron
HIPneuron_genes <- c("Ncdn","Ppfia2","Rfx3", "Ahcyl2", "Epha7")
#Cortical Interneuron
CTXInterneuron_genes <- c("Galntl6","Gad2","Nap1l5", "Grip1", "Tspyl4", "Cplx1")
#Medium Spiny neuron
CTX_TH_MSN_genes <- c("Rarb","Snhg14","Rian", "Rgs9", "Ryr3", "Gnal")
Pallium_spiny_genes <- c("Ppp1r1b")
#Cajal-Retzius cells
CTX_HIP_CRC_genes <- c("Ndnf","Reln","Ramp1", "Thsd7b", "Fat3", "Kcnb2")
#Excitary projection neurons
EP_genes <- c("Tbr1", "Neurod6", "Satb2")
#Hypothalamus
HY_genes <- c("Nkx2-1", "Sim1", "Lhx6", "Lhx8")
#Thalamus
TH_genes <- c("Tcf7l2", "Six3", "Plekhg1")
#Midbrain and Pontine neurons
MBP_genes <- c("Otx2", "Gata3", "Pax5", "Pax7", "Sox14")

# Granular cell types - Neurotransmitters
#GABAergic neurons
GABAergic_genes <- c("Slc32a1","Gabra1","Gad2","Galntl6", "Grip1", "Gbx1")
#Glutamergic neurons
Glutamate_genes <- c("Slc17a6", "Slc17a7", "Slc17a8", "Arpp21","Car10","Sv2b","Camk4", "Pcsk2", "Brinp1")
#Cholinergic neurons
Choline_genes <- c("Chat","Slc18a3","Tbx20","Chrnb3","Nppb","Ush1c", "Npy4r")
#Dopaminergic neurons
Dopa_genes <- c("Slc6a3", "Chrna6", "Gucy2c", "Aldh1a7", "Tdh")
#Serotonin neurons
Sero_genes <- c("Slc6a4","Tph2") #not in this section
#Noradrenergic neurons 
Noradren_genes <- c("Slc6a2", "Dbh") #not in this section

#Plot examples of the canonical gene expression
#Oligodendrocytes (cluster 3 and 5)
VlnPlot(object_subset, features=Oligo_genes, group.by = "seurat_cluster.projected",pt.size = 0)
#Microglial cells (cluster 21)
VlnPlot(object_subset, features=Microglial_genes, group.by = "seurat_cluster.projected",pt.size = 0)

#Oligodendrocyte markers expression across the spatial clusters Oligo_markers #Microglial markers expression across the spatial clusters Microglial_genes

#Spatially plotting oligodendrocytes (cluster 5), astrocyte (cluster 13), ependymal (cluster 22), microglial cells (cluster 21), Cortex layer GLUTamergic excitory 5 (cluster 25) and Cortex layer 6 GLUTamergic excitory (cluster 31)
Idents(object_subset) <- "seurat_cluster.projected"
cells <- CellsByIdentities(object_subset, idents = c(5, 13, 22, 21, 25, 31))
p <- SpatialDimPlot(object_subset,
cells.highlight = cells[setdiff(names(cells), "NA")],
cols.highlight = c("#FFFF00", "grey50"), facet.highlight = T, combine = T
) + NoLegend()
p
#You can also do the same plotting in the full dataset

Braincelltype_Spatiallocation

Identifying top genes for each cluster by using a single cell reference dataset

This methodology utilizes Robust Cell Type Deconvolution (RCTD) to identify cell types in the spatial dataset using a single cell dataset as reference. In this analysis, we will apply RCTD to the sketch downsampled data and then project it onto the main dataset containing all the cell types [Cable, D.M., Murray, E., Zou, L.S. et al. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat Biotechnol 40, 517–526 (2022). https://doi.org/10.1038/s41587-021-00830-w)

The single cell dataset we are using as reference is a mouse scRNAseq dataset obtained from the Allen Brain Atlas (present in the data tab)


#Read in the scRNAseq reference dataset
ref <- readRDS("~/Projects/Curiodata/allen_scRNAseq_ref.Rds")

# Create the RCTD reference object
Idents(ref) <- "subclass_label"
counts <- ref[["RNA"]]$counts
cluster <- as.factor(ref$subclass_label)
nUMI <- ref$nCount_RNA
levels(cluster) <- gsub("/", "-", levels(cluster))
cluster <- droplevels(cluster)
reference <- Reference(counts, cluster, nUMI)

# Create the RCTD reference object
reference <- Reference(counts, cluster, nUMI)
counts_hd <- object_subset[["sketch"]]$counts
cortex_cells_hd <- colnames(object_subset[["sketch"]])
coords <- GetTissueCoordinates(object_subset)[cortex_cells_hd, 1:2]
query <- SpatialRNA(coords, counts_hd, colSums(counts_hd))

#Run RCTD analysis
RCTD <- create.RCTD(query, reference, max_cores = 28)
RCTD <- run.RCTD(RCTD, doublet_mode = "doublet")

#Add the cell type annotations to the spatial dataset
object_subset <- AddMetaData(object_subset, metadata = RCTD@results$results_df)
object_subset$first_type <- as.character(object_subset$first_type)
object_subset$first_type[is.na(object_subset$first_type)] <- "Unknown"

#Visualizing cell identities in the spatial locations e.g. Layered pallium (starts with L), excitatory neurons in the cortex
Idents(object_subset) <- "first_type"
cells <- CellsByIdentities(object_subset)
excitatory_names <- sort(grep("^L.* CTX", names(cells), value = TRUE))
p <- SpatialDimPlot(object_subset, cells.highlight = cells[excitatory_names], cols.highlight = c("#FFFF00", "grey50"), facet.highlight = T, combine = T, ncol = 4)
p

RCTD_Mousebrainlabels

Identifying spatially defined tissue domains

In spatial data, we can plot biologically relevant spatial neighbourhoods using the mean and the gradient of gene expression levels in a spot’s broader neighborhood. This methodology utilizes Building Aggregates with a Neighborhood Kernel and Spatial Yardstick (BANKSY) [Singhal, V., Chou, N., Lee, J. et al. BANKSY unifies cell typing and tissue domain segmentation for scalable spatial omics data analysis. Nat Genet 56, 431–441 (2024). https://doi.org/10.1038/s41588-024-01664-3]

#Running BANKSY to produce new clusterings based on spot neighborhoods
object_subset <- RunBanksy(object_subset,
lambda = 0.8, verbose = TRUE,
assay = "sketch", slot = "data", features = "variable",
k_geom = 50
) #You can try different parameters of lambda and k_geom
object_subset <- RunPCA(object_subset, assay = "BANKSY", reduction.name = "pca.banksy", features = rownames(object_subset), npcs = 30)
object_subset <- FindNeighbors(object_subset, reduction = "pca.banksy", dims = 1:30)
object_subset <- FindClusters(object_subset, cluster.name = "banksy_cluster", resolution = 0.5) #You can try different parameters of resolution

#Plotting the BANKSY defined neighborhoods
Idents(object_subset) <- "banksy_cluster"
p <- SpatialDimPlot(object_subset, group.by = "banksy_cluster", label = T, repel = T, label.size = 4)
p

banksy_cells <- CellsByIdentities(object_subset)
p <- SpatialDimPlot(object_subset, cells.highlight = banksy_cells[setdiff(names(banksy_cells), "NA")], cols.highlight = c("#FFFF00", "grey50"), facet.highlight = T, combine = T) + NoLegend()
p

BANKSY_Spatialneighborhoods

Banksy_Spatialregions

SpatialPCA is a spatially aware dimension reduction method that aims to infer a low dimensional representation of the gene expression data in spatial transcriptomics. SpatialPCA builds upon the probabilistic version of PCA, incorporates localization information as additional input, and uses a kernel matrix to explicitly model the spatial correlation structure across tissue locations.[Shang, L., Zhou, X. Spatially aware dimension reduction for spatial transcriptomics. Nat Commun 13, 7203 (2022). https://doi.org/10.1038/s41467-022-34879-1]

#Converting Seurat to SpatialPCA object
Location<-as.matrix.data.frame(object_subset@images$slice1@coordinates)
Locationsubset<-Location[,1:2]
Locationsubset<-as.data.frame(Locationsubset)
Locationsubset$x<-as.numeric(Locationsubset$x)
Locationsubset$y<-as.numeric(Locationsubset$y)
Locationsubset<-as.matrix.data.frame(Locationsubset)
slideseqv2 = CreateSpatialPCAObject(counts=object_subset@assays$RNA$counts, location=Locationsubset, project = "SpatialPCA",gene.type="spatial",sparkversion="sparkx",numCores_spark=10, gene.number=3000, customGenelist=NULL,min.loctions = 20, min.features=20)

#Estimate Spatial PCs
slideseqv2 = SpatialPCA_buildKernel(slideseqv2, kerneltype="gaussian", bandwidthtype="Silverman",bandwidth.set.by.user=NULL,sparseKernel=TRUE,sparseKernel_tol=1e-20,sparseKernel_ncore=10)
slideseqv2 = SpatialPCA_EstimateLoading(slideseqv2,fast=TRUE,SpatialPCnum=20)
slideseqv2 = SpatialPCA_SpatialPCs(slideseqv2, fast=TRUE)

#Detect Spatial Domains and visualize
clusterlabel= louvain_clustering(clusternum=14,latent_dat=slideseqv2@SpatialPCs,knearest=310)
plot_cluster(location=object_subset@location,clusterlabel,pointsize=0.5,title_in=paste0("SpatialPCA"),color_in=cbp,legend="bottom")

SpatialPCA_regions

###Reference - https://satijalab.org/seurat/articles/visiumhd_analysis_vignette https://lulushang.org/SpatialPCA_Tutorial/index.html

⚠️ **GitHub.com Fallback** ⚠️