Week 5: DE Protocol - bcb420-2025/Izumi_Ando GitHub Wiki

Count-based differential expression analysis of RNA sequencing data using R and Bioconductor

a detailed step-by-step protocol for running differential gene expression analysis using edgeR and DESeq, aimed at helping users choose between them and apply them properly

Citation

Anders, S., McCarthy, D. J., Chen, Y., Okoniewski, M., Smyth, G. K., Huber, W., & Robinson, M. D. (2013). Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nature Protocols, 8(9), 1765–1786. https://doi.org/10.1038/nprot.2013.099

Notes

intro + goals

protocol builds on earlier edgeR and DESeq papers but walks through full workflow for practical use
meant for biologists with limited stats experience who want to do RNA-seq DE analysis in R
both tools model count data with negative binomial distribution, but differ in how they estimate dispersion and shrinkage
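sketch (mine, not from the paper): the negative binomial assumption is usually written as variance = mean + dispersion × mean², so dispersion 0 gives back the Poisson case. The Python below only illustrates that relationship; the function name and the example values are made up for these notes.

```python
def nb_variance(mean, dispersion):
    """Variance of a negative binomial count with the given mean and dispersion."""
    # dispersion == 0 reduces to the Poisson case, where variance equals the mean
    return mean + dispersion * mean ** 2


# compare Poisson (dispersion 0) with an assumed dispersion of 0.1
for mu in (10, 100, 1000):
    print(mu, nb_variance(mu, 0.0), nb_variance(mu, 0.1))
```

the gap between the two columns grows with the mean, which is why a Poisson model would understate the variability of highly expressed genes.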

normalization + filtering

shows how to import count data into R and filter out low-count genes
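sketch (mine): one common way to filter low-count genes is to keep only genes whose counts per million (CPM) exceed a cutoff in a minimum number of samples. The Python below illustrates the idea on toy data; the cutoffs `min_cpm=1.0` and `min_samples=2`, the function names, and the counts are placeholders, not values taken from the paper.

```python
def cpm(counts):
    """Convert a {gene: [counts per sample]} table to counts per million."""
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [1e6 * c / lib_sizes[j] for j, c in enumerate(row)]
        for gene, row in counts.items()
    }


def filter_low_counts(counts, min_cpm=1.0, min_samples=2):
    """Keep genes with CPM above min_cpm in at least min_samples samples."""
    cpms = cpm(counts)
    return {
        gene: row
        for gene, row in counts.items()
        if sum(v > min_cpm for v in cpms[gene]) >= min_samples
    }


# toy count table: geneB is detected in only one sample
toy_counts = {
    "geneA": [500, 600, 550, 700],
    "geneB": [0, 1, 0, 0],
    "geneC": [300000, 280000, 310000, 290000],
}
kept = filter_low_counts(toy_counts)
print(sorted(kept))
```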
both methods normalize for library size:
- edgeR uses TMM (trimmed mean of M-values)
- DESeq uses a median-of-ratios method
emphasizes that proper normalization is critical before any modeling
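sketch (mine): the median-of-ratios idea is to build a per-gene reference (the geometric mean across samples) and take, for each sample, the median ratio of its counts to that reference. The Python below is a plain illustration on toy data, not the DESeq implementation; the function name and counts are made up.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for a list of per-gene count rows."""
    # genes with any zero count have no finite log geometric mean, so skip them
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in usable]
    n_samples = len(counts[0])
    return [
        math.exp(median(math.log(row[j]) - lgm
                        for row, lgm in zip(usable, log_geo_means)))
        for j in range(n_samples)
    ]


# toy data: sample 2 was sequenced twice as deeply as sample 1
toy_counts = [[10, 20], [100, 200], [50, 100], [0, 5]]
sf = size_factors(toy_counts)
normalized = [[c / s for c, s in zip(row, sf)] for row in toy_counts]
print(sf)
```

dividing each column by its size factor puts the samples on a common scale; using the median means a handful of strongly changed genes should not dominate the factor.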

dispersion estimation

dispersion reflects biological variability and is estimated differently by the two tools
- edgeR uses empirical Bayes to shrink dispersions toward a common trend
- DESeq fits a dispersion-mean trend and, by default, uses the larger of the fitted value and the per-gene estimate
choice of method matters most with few replicates, where per-gene dispersion estimates are noisy and the amount of shrinkage affects both sensitivity and FDR
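sketch (mine): to see why borrowing information across genes helps, here is a deliberately simplified stand-in in Python. It uses a method-of-moments dispersion estimate and a fixed-weight pull toward the average dispersion. This is neither edgeR's nor DESeq's actual estimator; `prior_weight`, the function names, and the counts are assumptions for illustration.

```python
from statistics import mean, variance


def moment_dispersion(row):
    """Method-of-moments NB dispersion from replicate counts of one gene."""
    m = mean(row)
    if m == 0:
        return 0.0
    v = variance(row)  # sample variance, needs at least two replicates
    # from var = m + disp * m^2; clip at zero when counts look under-dispersed
    return max((v - m) / m ** 2, 0.0)


def shrink_toward_common(per_gene, prior_weight=0.5):
    """Pull each per-gene estimate toward the average dispersion."""
    common = mean(per_gene)
    return [prior_weight * common + (1 - prior_weight) * d for d in per_gene]


# toy replicate counts for two genes (already normalized, by assumption)
raw = [moment_dispersion(row) for row in ([10, 10, 10], [5, 15])]
shrunk = shrink_toward_common(raw)
print(raw, shrunk)
```

with only two or three replicates the raw estimates swing widely (here from 0 to 0.4), and the shrunken values sit closer together.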

differential testing

edgeR supports exact tests and GLM-based designs (good for complex experiments)
DESeq uses tests based on the NB distribution (nbinomTest for two-group comparisons, GLM-based tests for more complex designs); shrinkage of fold changes came later with DESeq2
both packages adjust p-values for multiple testing with the Benjamini-Hochberg procedure to control FDR
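sketch (mine): the Benjamini-Hochberg adjustment multiplies each p-value by n / rank and then enforces monotonicity from the largest p-value downward. The Python below is a plain reimplementation for illustration (the function name and p-values are made up); in R the packages handle this for you.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-value decreases
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


toy_pvalues = [0.01, 0.04, 0.03, 0.20]
adj = bh_adjust(toy_pvalues)
print(adj)
```

genes with an adjusted value below the chosen FDR cutoff are then called differentially expressed.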

practical guidance

includes actual R code examples for each step (loading data, fitting models, extracting DE genes)
recommends visual checks (MA plots, p-value histograms) before trusting results
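sketch (mine): both checks are easy to compute by hand. An MA plot puts the log2 fold change (M) against the average log2 expression (A), and a p-value histogram is just binned p-values. The Python below computes the plotted quantities only, without drawing; the pseudocount, bin count, function names, and inputs are assumptions.

```python
import math


def ma_values(mean_a, mean_b, pseudocount=0.5):
    """Return (M, A) for one gene from its mean normalized counts in two groups."""
    la = math.log2(mean_a + pseudocount)
    lb = math.log2(mean_b + pseudocount)
    return lb - la, 0.5 * (la + lb)


def pvalue_histogram(pvalues, n_bins=10):
    """Count p-values in equal-width bins over [0, 1]."""
    bins = [0] * n_bins
    for p in pvalues:
        # p == 1.0 would index past the end, so clamp it into the last bin
        bins[min(int(p * n_bins), n_bins - 1)] += 1
    return bins


m, a = ma_values(100, 400, pseudocount=0)
hist = pvalue_histogram([0.01, 0.02, 0.5, 1.0])
print(m, a, hist)
```

a roughly flat histogram with a spike near zero is the usual healthy shape; an MA cloud that is not centered on M = 0 hints at a normalization problem.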
suggests exporting results for downstream analysis like GO/pathway enrichment
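sketch (mine): exporting amounts to filtering the results table by FDR and writing a delimited file that enrichment tools can read. The Python below writes tab-separated text to a string; the column names, the 0.05 cutoff, the function name, and the rows are all hypothetical.

```python
import csv
import io


def export_de_genes(results, fdr_cutoff=0.05):
    """Return a TSV string of (gene, log2FC, FDR) rows passing the cutoff."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene", "log2_fold_change", "fdr"])
    # most significant genes first
    for gene, lfc, fdr in sorted(results, key=lambda r: r[2]):
        if fdr < fdr_cutoff:
            writer.writerow([gene, lfc, fdr])
    return buffer.getvalue()


toy_results = [("geneA", 2.1, 0.001), ("geneB", -0.2, 0.6), ("geneC", -1.5, 0.03)]
tsv = export_de_genes(toy_results)
print(tsv)
```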

comments

paper's strength is its clarity: it walks through a complete real-world pipeline
useful if you’re trying to decide between edgeR and DESeq or want a reproducible script-based workflow
also highlights some pitfalls, like poor normalization or skipping low-count filtering