Single Cell Omics‐‐theory - igheyas/Bioinformatics GitHub Wiki

What is “omics”

"Omics" means studying everything in a system—like all the genes, or all the proteins, or all the molecules in a cell.

Here are a few types:

Genomics = study of all the genes (DNA)

Transcriptomics = study of all the RNA made from genes

Proteomics = study of all the proteins

Metabolomics = study of all the small chemicals in cells

So “omics” means looking at the big picture, not just one thing.

What is single-cell omics?

Most omics studies look at many cells together, like a tissue or blood sample. But...

👉 Single-cell omics looks at one cell at a time!

That’s super powerful because:

Different cells do different jobs.

Even in the same tissue, no two cells are exactly alike.

Diseases like cancer can start in just one weird cell.

🔍 Instead of mixing thousands of cells and seeing an average, we now look at each cell to learn more.

🎯 Main Goals / Outputs:

Cell Types

What different kinds of cells are in a sample?
Which cells are similar or different?

Gene Expression Patterns

Which genes are turned on/off in each cell?
What does that say about the cell’s job?

Cell States or Activities

Is a cell dividing? Resting? Dying?
What’s happening in the cell right now?

Rare Cells or Subgroups

Can we find tiny groups that might be important—like early cancer cells or stem cells?

Changes Over Time or Condition

How do cells change in disease, or during development?

So, the output is usually a map or groupings of cells, showing:

What kind of cell it is
What it’s doing
How it compares to others

So yes! Each type focuses on a different layer of the cell.

Want to try naming which layer you think gives the best clue about a cell’s current job?

What do you think the point is of using just the variable genes, instead of all 100?

library(Seurat)
set.seed(123)

# ---- parameters ----
n_genes <- 100
cells_per_cluster <- 25
clusters <- rep(c("A","B"), each = cells_per_cluster)
n_cells <- length(clusters)

# ---- simulate counts ----
# baseline expression ~ Poisson(5)
counts <- matrix(
  rpois(n_genes * n_cells, lambda = 5),
  nrow = n_genes, ncol = n_cells
)
# bump up expression of genes 1:10 in cluster B
counts[1:10, clusters=="B"] <- 
  rpois(sum(clusters=="B") * 10, lambda = 20)

# give the genes & cells names
rownames(counts) <- paste0("Gene", sprintf("%03d", 1:n_genes))
colnames(counts) <- paste0("Cell", sprintf("%02d", 1:n_cells))

# metadata
meta <- data.frame(
  row.names = colnames(counts),
  cluster = clusters
)

# Inspect
head(counts)[,1:6]
#>       Cell01 Cell02 Cell03 Cell04 Cell05 Cell06
#> Gene001      4      8      6      7      3      6
#> Gene002      4      5      7      6      2      5
#> Gene003      5      7      7      6      6      4
#> Gene004      3      6      5      5      4      8
#> Gene005      8      7      1      5      6      7
#> Gene006     10      3      9      2     11      6

head(meta)
#>         cluster
#> Cell01       A
#> Cell02       A
#> Cell03       A
#> Cell04       A
#> Cell05       A
#> Cell06       A

library(Seurat)

set.seed(123)

➡ Makes random numbers reproducible — so every time you run it, you get the same data.

🧮 Parameters: how many genes and cells?

n_genes <- 100
cells_per_cluster <- 25

You’ll make 100 genes, and 50 cells total — split into 2 groups of 25 each.

clusters <- rep(c("A","B"), each = cells_per_cluster)

Makes a vector like: A, A, A, ..., B, B, B — 25 A's, 25 B's. Each letter is a "cluster label" for a cell.

n_cells <- length(clusters)

Now n_cells is 50.

counts <- matrix(
  rpois(n_genes * n_cells, lambda = 5),
  nrow = n_genes, ncol = n_cells
)

counts[1:10, clusters=="B"] <- 
  rpois(sum(clusters=="B") * 10, lambda = 20)

rownames(counts) <- paste0("Gene", sprintf("%03d", 1:n_genes))
colnames(counts) <- paste0("Cell", sprintf("%02d", 1:n_cells))

meta <- data.frame(
  row.names = colnames(counts),
  cluster = clusters
)

head(counts)[,1:6]

head(meta)

counts <- matrix(
  rpois(n_genes * n_cells, lambda = 5),
  nrow = n_genes, ncol = n_cells
)


# 3.1 Create Seurat object
sc <- CreateSeuratObject(
  counts = counts,
  meta.data = meta,
  project = "ToySC"
)

# 3.2 Quality control (skip mitochondrial for toy data)
sc <- subset(sc, subset = nFeature_RNA > 5)

# 3.3 Normalize
sc <- NormalizeData(sc, normalization.method = "LogNormalize", scale.factor = 1e4)
# 3.4 Find highly variable features

sc <- FindVariableFeatures(
  object = sc,
  selection.method = "vst",
  nfeatures = 20
)

# 3.5 Scale (regress out no covariates here)
sc <- ScaleData(
  object   = sc,
  features = VariableFeatures(sc)
)

#3.6. Run PCA on just those genes, asking for e.g. 10 components,
# and use the full (lapack) SVD if you like (approx = FALSE).
sc <- RunPCA(
  object     = sc,
  features   = VariableFeatures(sc),
  npcs       = 10,
  approx     = FALSE,     # forces a standard SVD instead of irlba
  verbose    = FALSE      # suppresses progress messages
)

# Check your top PCs
ElbowPlot(sc, ndims = 10)

# 3.7 Neighbors & clustering
sc <- FindNeighbors(sc, dims = 1:10)
sc <- FindClusters(sc, resolution = 0.5)


# 3.8 UMAP embedding
sc <- RunUMAP(sc, dims = 1:10)
DimPlot(sc, label = TRUE)

Build a Seurat object & run the basic pipeline

sc <- CreateSeuratObject(
  counts = counts,
  meta.data = meta,
  project = "ToySC"
)

sc <- subset(sc, subset = nFeature_RNA > 5)

sc <- NormalizeData(sc, normalization.method = "LogNormalize", scale.factor = 1e4)

log1p( (gene count / total counts per cell) × 10,000 )

range(sc["RNA"](/igheyas/Bioinformatics/wiki/"RNA")@data)

sc <- FindVariableFeatures(
  object = sc,
  selection.method = "vst",
  nfeatures = 20
)

sc <- ScaleData(
  object   = sc,
  features = VariableFeatures(sc)
)

sc <- RunPCA(
  object     = sc,
  features   = VariableFeatures(sc),
  npcs       = 10,
  approx     = FALSE,
  verbose    = FALSE
)

ElbowPlot(sc, ndims = 10)

sc <- FindNeighbors(sc, dims = 1:10)
sc <- FindClusters(sc, resolution = 0.5)

sc <- RunUMAP(sc, dims = 1:10)
DimPlot(sc, label = TRUE)

Idents(sc) <- sc$cluster

table(Idents(sc))
# A  B 
# 25 25

de_A_vs_B <- FindMarkers(
  object        = sc,
  ident.1       = "A",         # first group
  ident.2       = "B",         # second group
  min.pct       = 0.1,         # only test genes detected in ≥10% of cells
  logfc.threshold = 0.25       # only report genes with log2FC ≥ 0.25
)

# Inspect the top hits
head(de_A_vs_B, n = 10)
#            p_val avg_log2FC  pct.1 pct.2 p_val_adj
# Gene003 1.2e-08      1.95   1.00  0.04 1.0e-07
# Gene008 3.5e-07      1.80   1.00  0.08 2.9e-06
# …etc…

de_A_vs_B <- FindMarkers(
  object = sc,

  ident.1 = "A",         # first group
  ident.2 = "B",         # second group

min.pct = 0.1,         # only test genes detected in ≥10% of cells

   logfc.threshold = 0.25 # only report genes with log2FC ≥ 0.25
)

head(de_A_vs_B, 10)

Output:

top5 <- rownames(de_A_vs_B)[1:5]

# Violin plots
VlnPlot(sc, features = top5, group.by = "cluster", pt.size = 0.5)

# Feature (UMAP) plots
FeaturePlot(sc, features = top5)

# Heatmap
DoHeatmap(sc, features = top5, group.by = "cluster") + NoLegend()

Violin Plots

#UMap plots

# 1) Scale just the DE genes you want to plot
sc <- ScaleData(
  object   = sc,
  features = top5
)

# 2) Now they live in sc["RNA"](/igheyas/Bioinformatics/wiki/"RNA")@scale.data, so DoHeatmap will find them
DoHeatmap(
  object   = sc,
  features = top5,
  group.by = "cluster"
) + NoLegend()

DoHeatmap(
  object   = sc,
  features = top5,
  group.by = "cluster",
  slot     = "data",      # pull from NormalizeData() output
  assay    = "RNA"
) + NoLegend()

BiocManager::install("slingshot")
library(slingshot)
library(SingleCellExperiment)

####If you quit R using q() and then start R again, the previous data in the R console appears to be lost.

B.2 Convert Seurat → SingleCellExperiment

library(Seurat)

# Re‐simulate your toy data
set.seed(123)
n_genes <- 100; cells_per_cluster <- 25
clusters <- rep(c("A","B"), each = cells_per_cluster)
counts <- matrix(rpois(n_genes * length(clusters), 5), nrow=n_genes)
counts[1:10, clusters=="B"] <- rpois(sum(clusters=="B")*10, 20)
rownames(counts) <- paste0("Gene", sprintf("%03d",1:n_genes))
colnames(counts) <- paste0("Cell", sprintf("%02d",1:length(clusters)))
meta <- data.frame(cluster = clusters, row.names = colnames(counts))

# Re‐create Seurat object
sc <- CreateSeuratObject(counts = counts, meta.data = meta, project = "ToySC")
sc <- subset(sc, subset = nFeature_RNA > 5)
sc <- NormalizeData(sc)
sc <- FindVariableFeatures(sc, nfeatures = 20)
sc <- ScaleData(sc, features = VariableFeatures(sc))
sc <- RunPCA(sc, features = VariableFeatures(sc), npcs = 10, approx = FALSE)
sc <- FindNeighbors(sc, dims = 1:10)
sc <- FindClusters(sc, resolution = 0.5)
sc <- RunUMAP(sc, dims = 1:10)

Trajectory and Pseudotime Analysis

# ─────────────────────────────────────────────────────────────────────────────
# B.1 Install & load
# (only need to install once; skip if already done)
if (!requireNamespace("slingshot", quietly = TRUE)) {
  if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
  BiocManager::install("slingshot")
}

# load packages
library(Seurat)               # for as.SingleCellExperiment()
library(SingleCellExperiment) # for the SCE container
library(slingshot)            # core trajectory function
library(scater)               # for plotUMAP()
library(ggplot2)              # for custom ggplot layers

# ─────────────────────────────────────────────────────────────────────────────
# B.2 Convert Seurat → SingleCellExperiment
sce <- as.SingleCellExperiment(sc)

# copy over UMAP coords and original clusters from `sc`
reducedDims(sce)$UMAP    <- Embeddings(sc, "umap")
colData(sce)$cluster     <- sc$cluster

# quick sanity-checks:
stopifnot("UMAP" %in% names(reducedDims(sce)))
stopifnot("cluster" %in% colnames(colData(sce)))

# ─────────────────────────────────────────────────────────────────────────────
# B.3 Run Slingshot
sce <- slingshot(
  sce,
  clusterLabels = "cluster",
  reducedDim     = "UMAP",
  start.clus     = "A",
  end.clus       = "B"
)

# ─────────────────────────────────────────────────────────────────────────────

# ─────────────────────────────────────────────────────────────────────────────
# You’re all set!

library(Seurat)               # for as.SingleCellExperiment()
library(SingleCellExperiment) # to use the SCE format
library(slingshot)            # Slingshot = trajectory inference tool
library(scater)               # for plotting (like plotUMAP)
library(ggplot2)              # for customizing plots

sce <- as.SingleCellExperiment(sc)

sce <- as.SingleCellExperiment(sc)

reducedDims(sce)$UMAP <- Embeddings(sc, "umap")

reducedDims(sce)$UMAP <- Embeddings(sc, "umap")

reducedDims(sce)$UMAP

colData(sce)$cluster <- sc$cluster

library(slingshot)

# pull out your UMAP embedding
umap <- reducedDims(sce)$UMAP

# make a factor of your clusters
clust_fac <- factor(colData(sce)$cluster, levels = c("A","B"))

# now assign colours, e.g. red for A, blue for B
cols <- c("A"="red","B"="blue")[ as.character(clust_fac) ]

# plot the points
plot(umap,
     col   = cols,
     pch   = 16,
     asp   = 1,
     xlab  = "UMAP1",
     ylab  = "UMAP2",
     main  = "Slingshot Trajectory, clusters")

# overlay the smooth Slingshot curves in black
lines(sce, lwd = 2, col = "black")

library(slingshot)

Loads the Slingshot package, which you already used to compute trajectories.

umap <- reducedDims(sce)$UMAP

clust_fac <- factor(colData(sce)$cluster, levels = c("A", "B"))

cols <- c("A"="red", "B"="blue")[as.character(clust_fac)]

cols <- c("A" = "red", "B" = "blue")[c("A", "A", ..., "B")]

plot(umap,
     col   = cols,
     pch   = 16,
     asp   = 1,
     xlab  = "UMAP1",
     ylab  = "UMAP2",
     main  = "Slingshot Trajectory, clusters")

lines(sce, lwd = 2, col = "black")

Plot Pseudotime

# 1) Run Slingshot (you’ve done this)
sce <- slingshot(
  sce,
  clusterLabels = "cluster",
  reducedDim     = "UMAP",
  start.clus     = "A",
  end.clus       = "B"
)

# 2) Extract UMAP coords and pseudotime
umap <- reducedDims(sce)$UMAP        # this creates the 'umap' matrix
pt   <- slingPseudotime(sce)[,1]     # your pseudotime vector

# 3) Plot cells colored by pseudotime
library(viridis)                     # for a nice continuous palette
cols <- viridis(100)[ cut(pt, breaks=100) ]

plot(umap,
     col   = cols,
     pch   = 16,
     asp   = 1,
     xlab  = "UMAP1",
     ylab  = "UMAP2",
     main  = "Slingshot Trajectory (pseudotime)")

# 4) Overlay the inferred lineage curve
lines(sce, lwd = 2, col = "black")

umap <- reducedDims(sce)$UMAP
pt   <- slingPseudotime(sce)[,1]

library(viridis)
cols <- viridis(100)[ cut(pt, breaks = 100) ]

viridis(100)[cut(pt, breaks = 100)]

plot(umap,
     col   = cols,       # pseudotime-based color
     pch   = 16,         # solid dots
     asp   = 1,          # equal axes
     xlab  = "UMAP1",
     ylab  = "UMAP2",
     main  = "Slingshot Trajectory (pseudotime)")

lines(sce, lwd = 2, col = "black")  # lineage curve