6.1.10 Simulation & Benchmarking (Optional)
To benchmark single-cell workflows or test new methods, it is often useful to work with synthetic datasets in which the ground truth (for example, which genes are differentially expressed) is known. Two popular simulators are Splatter and dyngen; both are distributed as R packages.
A. Splatter (R)
Installation
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
# scater is needed for the normalization and PCA steps used later in this section
BiocManager::install(c("splatter", "scater"))
Simulate a Simple Grouped Experiment
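library(splatter)
library(scater)  # provides logNormCounts(), runPCA() and plotPCA()
set.seed(42)
# 1) Create Splat parameters, overriding a few defaults
params <- newSplatParams(
  batchCells = 1000, # total cells (a single batch)
  nGenes = 5000,     # number of genes
  seed = 42          # Splatter seeds the simulation from this parameter
)
# 2) Simulate two groups with equal probability; each gene has a 10% chance of being DE in each group
sim <- splatSimulate(
  params,
  method = "groups",
  group.prob = c(0.5, 0.5),
  de.prob = 0.1,
  verbose = FALSE
)
# 3) Inspect the SingleCellExperiment
sim
# - Assays: counts, plus intermediate values such as CellMeans and TrueCounts
# - colData(sim)$Group: "Group1" or "Group2"
# - rowData(sim)$DEFacGroup1, $DEFacGroup2: multiplicative DE factors (1 = not DE)
# 4) Log-normalize, then access the count matrices
sim <- logNormCounts(sim)  # adds the "logcounts" assay, which splatSimulate() does not create
counts_mat <- assay(sim, "counts")
logcounts_mat <- assay(sim, "logcounts")
# 5) Plot PCA colored by true group
sce <- sim
sce <- runPCA(sce)
plotPCA(sce, colour_by = "Group")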
- Outputs:
  - A `SingleCellExperiment` object with known group labels and per-gene DE factors.
  - “counts” and (after log-normalization) “logcounts” assays for downstream benchmarking.
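Because the true DE status of every gene is known, the calls made by any DE method can be scored against it. The following is a minimal sketch in base R, not part of Splatter: `gene_info` is a made-up stand-in for `as.data.frame(rowData(sim))`, `called` is a hypothetical list of genes reported by the method under test, and `score_de_calls()` is an illustrative helper name.

```r
# Toy stand-in for as.data.frame(rowData(sim)); the values are made up for illustration.
gene_info <- data.frame(
  Gene        = paste0("Gene", 1:8),
  DEFacGroup1 = c(1, 1,   2, 1, 1, 0.5, 1, 1),
  DEFacGroup2 = c(1, 1.8, 1, 1, 1, 1,   1, 1)
)

# Treat a gene as truly DE when its DE factor differs between the two groups.
truth <- gene_info$DEFacGroup1 != gene_info$DEFacGroup2
names(truth) <- gene_info$Gene

# Hypothetical output of the DE method being benchmarked.
called <- c("Gene2", "Gene3", "Gene5")

# Illustrative helper: compare called genes with the ground truth.
score_de_calls <- function(truth, called) {
  predicted <- names(truth) %in% called
  tp <- sum(predicted & truth)    # truly DE and called
  fp <- sum(predicted & !truth)   # called but not DE
  fn <- sum(!predicted & truth)   # truly DE but missed
  precision <- if (tp + fp > 0) tp / (tp + fp) else NA_real_
  recall    <- if (tp + fn > 0) tp / (tp + fn) else NA_real_
  c(TP = tp, FP = fp, FN = fn, precision = precision, recall = recall)
}

scores <- score_de_calls(truth, called)
print(scores)
```

With a real simulation, `gene_info` would be replaced by the `rowData` of the simulated object and `called` by the significant genes from the method being evaluated.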
Advanced Splatter Features
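- Batches: simulate multiple batches by passing a vector to `batchCells` (one entry per batch) and tuning `batch.facLoc`/`batch.facScale`.
- Trajectories: `method = "paths"` for continuous differentiation; `path.from` sets whether the paths form a linear or a branching trajectory.
- Library size effects, dropout, and gene mean–variance tuning via `SplatParams`.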
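The features listed above could be combined roughly as follows. This is a sketch under assumptions rather than a recommended configuration: the cell counts, gene counts, and parameter values are arbitrary choices for illustration, and the object names (`params_batch`, `sim_path`, and so on) are hypothetical.

```r
library(splatter)

# Batches: two batches of 300 cells each (one batchCells entry per batch).
params_batch <- newSplatParams(
  nGenes = 1000,
  batchCells = c(300, 300),
  batch.facLoc = 0.2,    # location of the batch-effect factors (arbitrary value)
  batch.facScale = 0.2,  # scale of the batch-effect factors (arbitrary value)
  seed = 1
)
sim_batch <- splatSimulate(params_batch, method = "single", verbose = FALSE)
table(colData(sim_batch)$Batch)

# Trajectory: Path1 starts at the origin (0) and Path2 continues from the
# end of Path1, which gives a linear trajectory.
params_path <- newSplatParams(nGenes = 1000, batchCells = 600, seed = 1)
sim_path <- splatSimulate(
  params_path,
  method = "paths",
  group.prob = c(0.5, 0.5),
  path.from = c(0, 1),
  de.prob = 0.2,
  verbose = FALSE
)
head(colData(sim_path)[, c("Group", "Step")])

# Dropout and library size: update an existing SplatParams object.
params_drop <- setParams(
  params_path,
  dropout.type = "experiment",  # one dropout setting for the whole experiment
  dropout.mid = 2,              # midpoint of the dropout logistic function
  lib.loc = 10                  # location of the library size distribution
)
sim_drop <- splatSimulate(params_drop, method = "single", verbose = FALSE)
assayNames(sim_drop)
```

Setting `path.from = c(0, 0)` instead would start both paths at the origin and produce a branching trajectory.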
See the Splatter vignette for full options.