Trajectory and Pseudotime Analysis - iffatAGheyas/bioinformatics-tutorial-wiki GitHub Wiki

6.1.7 Trajectory & Pseudotime Analysis

Cells often exist along continuous processes (e.g. differentiation). Trajectory inference orders cells in “pseudotime” and—as an extension—RNA velocity predicts future transcriptional states. We cover three popular frameworks:


A. Monocle 3 (R)

Installation

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("monocle3")

Basic Workflow

library(monocle3)
# 1) Convert Seurat or SCE to cell_data_set
cds <- as.cell_data_set(seurat_obj)  

# 2) Preprocess & reduce
cds <- preprocess_cds(cds, num_dim=50)
cds <- reduce_dimension(cds, reduction_method="UMAP")

# 3) Cluster & learn graph
cds <- cluster_cells(cds, reduction_method="UMAP")
cds <- learn_graph(cds)

# 4) Order cells in pseudotime
cds <- order_cells(cds, root_cells = choose_cells(cds, "start_cluster"))

# 5) Visualize trajectory
plot_cells(
  cds,
  color_cells_by = "pseudotime",
  label_groups_by_cluster=FALSE,
  show_trajectory_graph=TRUE
)

Output:

  • A UMAP coloured by pseudotime (0 → max)
  • Trajectory graph connecting cell states

B. Slingshot (R)

Installation
BiocManager::install("slingshot")

Basic Workflow

library(SingleCellExperiment)
library(slingshot)

# Assume `sce` is a SingleCellExperiment with PCA already computed
sce <- slingshot(
  sce,
  clusterLabels = 'seurat_clusters',
  reducedDim = 'PCA'
)

# Retrieve pseudotime for each lineage
pt <- slingPseudotime(sce)

# Plot in UMAP space
library(scater)
plotUMAP(sce, colour_by = pt[,1], size=1)  # lineage 1

Output:

  • Matrix of pseudotime values (slingPseudotime).
  • Lineages overlaid on embedding.

C. scVelo (Python)

Installation

conda install -c bioconda scvelo

Basic Workflow

import scvelo as scv
import scanpy as sc

# 1) Load AnnData (with spliced/unspliced layers)
adata = scv.read("data/my_loom.loom", cache=True)

# 2) Preprocess
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)

# 3) Estimate velocities
scv.tl.velocity(adata, mode='stochastic')
scv.tl.velocity_graph(adata)

# 4) Embedding & velocity stream
scv.tl.umap(adata)
scv.pl.velocity_embedding_grid(adata, basis='umap', arrow_length=3, arrow_size=1)

# 5) Compute pseudotime
scv.tl.recover_latent_time(adata)
scv.pl.scatter(adata, color='latent_time', cmap='gnuplot')

Output:

  • Velocity streamlines on UMAP (arrows pointing toward future states).

  • adata.obs['latent_time']: per‐cell pseudotime estimate.

Tips & Best Practices

  • Choose your starting/root cell or cluster carefully (e.g. known progenitor markers).
  • Compare multiple methods (Monocle vs. Slingshot vs. Velocity) for consistency.
  • Visualize gene expression dynamics along pseudotime (e.g. heatmaps of top dynamic genes).
  • Validate trajectories with known lineage markers or experimental time courses.