Trajectory and Pseudotime Analysis - iffatAGheyas/bioinformatics-tutorial-wiki GitHub Wiki
6.1.7 Trajectory & Pseudotime Analysis
Cells often exist along continuous processes (e.g. differentiation). Trajectory inference orders cells in “pseudotime” and—as an extension—RNA velocity predicts future transcriptional states. We cover three popular frameworks:
A. Monocle 3 (R)
Installation
if (!requireNamespace("BiocManager", quietly=TRUE))
install.packages("BiocManager")
BiocManager::install("monocle3")
Basic Workflow
library(monocle3)
# 1) Convert Seurat or SCE to cell_data_set
cds <- as.cell_data_set(seurat_obj)
# 2) Preprocess & reduce
cds <- preprocess_cds(cds, num_dim=50)
cds <- reduce_dimension(cds, reduction_method="UMAP")
# 3) Cluster & learn graph
cds <- cluster_cells(cds, reduction_method="UMAP")
cds <- learn_graph(cds)
# 4) Order cells in pseudotime
cds <- order_cells(cds, root_cells = choose_cells(cds, "start_cluster"))
# 5) Visualize trajectory
plot_cells(
cds,
color_cells_by = "pseudotime",
label_groups_by_cluster=FALSE,
show_trajectory_graph=TRUE
)
Output:
- A UMAP coloured by pseudotime (0 → max)
- Trajectory graph connecting cell states
B. Slingshot (R)
Installation
BiocManager::install("slingshot")
Basic Workflow
library(SingleCellExperiment)
library(slingshot)
# Assume `sce` is a SingleCellExperiment with PCA already computed
sce <- slingshot(
sce,
clusterLabels = 'seurat_clusters',
reducedDim = 'PCA'
)
# Retrieve pseudotime for each lineage
pt <- slingPseudotime(sce)
# Plot in UMAP space
library(scater)
plotUMAP(sce, colour_by = pt[,1], size=1) # lineage 1
Output:
- Matrix of pseudotime values (
slingPseudotime
). - Lineages overlaid on embedding.
C. scVelo (Python)
Installation
conda install -c bioconda scvelo
Basic Workflow
import scvelo as scv
import scanpy as sc
# 1) Load AnnData (with spliced/unspliced layers)
adata = scv.read("data/my_loom.loom", cache=True)
# 2) Preprocess
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
# 3) Estimate velocities
scv.tl.velocity(adata, mode='stochastic')
scv.tl.velocity_graph(adata)
# 4) Embedding & velocity stream
scv.tl.umap(adata)
scv.pl.velocity_embedding_grid(adata, basis='umap', arrow_length=3, arrow_size=1)
# 5) Compute pseudotime
scv.tl.recover_latent_time(adata)
scv.pl.scatter(adata, color='latent_time', cmap='gnuplot')
Output:
-
Velocity streamlines on UMAP (arrows pointing toward future states).
-
adata.obs['latent_time']: per‐cell pseudotime estimate.
Tips & Best Practices
- Choose your starting/root cell or cluster carefully (e.g. known progenitor markers).
- Compare multiple methods (Monocle vs. Slingshot vs. Velocity) for consistency.
- Visualize gene expression dynamics along pseudotime (e.g. heatmaps of top dynamic genes).
- Validate trajectories with known lineage markers or experimental time courses.