Simulation and Benchmarking (Optional) ‐ Splatter (R) ‐ dyngen (Python) - iffatAGheyas/bioinformatics-tutorial-wiki GitHub Wiki

6.1.10 Simulation & Benchmarking (Optional)

To benchmark your single-cell workflows or test new methods, it’s often useful to work with synthetic datasets whose “ground truth” (true cell groups, differentially expressed genes, trajectory positions) is known. Two popular simulators are Splatter in R and dyngen in Python.


A. Splatter (R)

Installation

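if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# splatter simulates; scuttle (normalisation) and scater (PCA/plots) are used below
BiocManager::install(c("splatter", "scuttle", "scater"))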

Simulate a Simple Grouped Experiment

library(splatter)
set.seed(42)

# 1) Create Splat parameters: defaults, except for the number of cells and genes
params <- newSplatParams(
  batchCells = 1000,    # total cells
  nGenes     = 5000     # number of genes
)

# 2) Simulate two equally sized groups; each gene has a 10% chance of being DE in each group
sim <- splatSimulate(
  params,
  method     = "groups",
  group.prob = c(0.5, 0.5),
  de.prob    = 0.1,
  verbose    = FALSE
)

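# 3) Inspect the SingleCellExperiment
sim
# - Assays: counts, plus intermediate assays such as TrueCounts and CellMeans
#   (splatSimulate() does NOT create a logcounts assay)
# - colData(sim)$Group: "Group1" or "Group2" (true group labels)
# - rowData(sim)$DEFacGroup1 / DEFacGroup2: per-gene DE factors (1 = not DE)

# 4) Log-normalise, then access the counts and logcounts matrices
library(scuttle)
sim <- logNormCounts(sim)              # adds the "logcounts" assay
counts_mat    <- assay(sim, "counts")
logcounts_mat <- assay(sim, "logcounts")

# 5) Plot PCA coloured by true group
library(scater)
sim <- runPCA(sim)                     # uses "logcounts" by default
plotPCA(sim, colour_by = "Group")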

  • Outputs:
    • A SingleCellExperiment object with known group labels (colData) and per-gene DE factors (rowData).
    • A “counts” assay, plus “logcounts” once logNormCounts() has been run, for downstream benchmarking.
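Because the true labels and DE factors travel with the object, a benchmark reduces to comparing a method's output against them. The sketch below assumes k-means on the top 10 principal components as a stand-in for "the method under test" and scores it with the adjusted Rand index, computed by a small hypothetical helper `ari()` rather than any package function. Genes are counted as truly DE when their factor differs from 1 in at least one group.

```r
# Sketch: score a simple clustering against Splatter's ground truth.
# Assumptions: k-means on 10 PCs stands in for your method; ari() is a
# hypothetical helper implementing the standard adjusted Rand index.
library(splatter)
library(scuttle)
library(scater)

set.seed(42)
sim <- splatSimulate(
  newSplatParams(batchCells = 1000, nGenes = 5000),
  method = "groups", group.prob = c(0.5, 0.5), de.prob = 0.1,
  verbose = FALSE
)
sim <- logNormCounts(sim)
sim <- runPCA(sim, ncomponents = 10)

# Ground truth: true group labels and DE genes (factor != 1 in any group)
truth <- sim$Group
de_factors <- as.matrix(rowData(sim)[, c("DEFacGroup1", "DEFacGroup2")])
true_de <- rowSums(de_factors != 1) > 0

# Method under test: k-means with k = 2 on the PCA coordinates
pred <- kmeans(reducedDim(sim, "PCA"), centers = 2, nstart = 10)$cluster

# Adjusted Rand index from the contingency table of two labelings
ari <- function(x, y) {
  tab <- table(x, y)
  comb2 <- function(n) n * (n - 1) / 2
  sum_ij <- sum(comb2(tab))
  sum_a  <- sum(comb2(rowSums(tab)))
  sum_b  <- sum(comb2(colSums(tab)))
  expected <- sum_a * sum_b / comb2(sum(tab))
  (sum_ij - expected) / ((sum_a + sum_b) / 2 - expected)
}

score <- ari(truth, pred)
cat("Fraction of genes DE:", mean(true_de), "\n")
cat("ARI vs true groups:", score, "\n")
```

With de.prob = 0.1 applied independently to each of the two groups, roughly 1 - 0.9^2 ≈ 19% of genes should be DE in at least one group; an ARI near 1 means the clustering recovers the simulated groups.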

Advanced Splatter Features

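  • Batches: pass a vector to batchCells (e.g. batchCells = c(500, 500) for two batches) and control batch-effect strength with batch.facLoc / batch.facScale.
  • Trajectories: method = "paths" simulates continuous differentiation; path.from sets the path topology (a single path starting from 0 is a linear trajectory) and path.nSteps the number of steps along each path.
  • Library size, dropout and the gene mean–variance relationship: tune SplatParams such as lib.loc / lib.scale, dropout.type / dropout.mid / dropout.shape, and mean.shape / mean.rate / bcv.common.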

See the Splatter vignette for full options.
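Each of these features can be exercised with a single splatSimulate() call. In this sketch the parameter values (batch sizes, batch.facLoc = 0.2, path.nSteps = 100, dropout.mid = 3) are illustrative assumptions rather than recommended settings; the checks look at the annotations Splatter records for each feature: the Batch column, the Step column for paths, and the Dropout assay.

```r
# Sketch: batches, a linear trajectory and dropout, one call each.
# Parameter values are illustrative assumptions, not recommendations.
library(splatter)
set.seed(1)

# Two batches of 300 cells with a stronger-than-default batch effect
sim_batch <- splatSimulate(
  newSplatParams(batchCells = c(300, 300), nGenes = 2000,
                 batch.facLoc = 0.2, batch.facScale = 0.2),
  method = "groups", group.prob = c(0.5, 0.5),
  verbose = FALSE
)
batch_by_group <- table(sim_batch$Batch, sim_batch$Group)  # cells per batch x group

# One linear trajectory: a single path starting from the origin (path.from = 0)
sim_path <- splatSimulate(
  newSplatParams(batchCells = 500, nGenes = 2000),
  method = "paths", path.from = 0, path.nSteps = 100,
  verbose = FALSE
)
summary(sim_path$Step)   # each cell's position along the path

# Experiment-wide dropout with midpoint 3 (default method "single")
sim_drop <- splatSimulate(
  newSplatParams(batchCells = 500, nGenes = 2000,
                 dropout.type = "experiment", dropout.mid = 3),
  verbose = FALSE
)
has_dropout <- "Dropout" %in% SummarizedExperiment::assayNames(sim_drop)
```

The Step values can be used as the true pseudotime when benchmarking trajectory methods, for example by colouring a PCA plot by "Step".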