Week 4: Normalization - bcb420-2025/Izumi_Ando GitHub Wiki

A scaling normalization method for differential expression analysis of RNA-seq data

this paper introduces the TMM normalization method to adjust for sample composition differences, improving the accuracy of differential expression analysis

Citation

Robinson, M. D., & Oshlack, A. (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology, 11, R25. https://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-3-r25

Notes

normalization challenges

simple total count scaling can be misleading when samples have different RNA compositions
a few highly expressed genes in one sample use up the “sequencing real estate” available to other genes, so genes that are actually unchanged can appear down-regulated
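
a minimal sketch of the composition problem described above, using made-up counts (the gene names and numbers are hypothetical, not from the paper). both samples are sequenced to the same depth and g1 to g4 are assumed to be truly unchanged, but g5 takes half of the reads in sample b:

```python
# Hypothetical counts: both libraries contain 2000 reads in total.
# g1..g4 are assumed to be unchanged per cell; g5 is expressed only in sample B
# and consumes half of that library's reads.
sample_a = {"g1": 200, "g2": 400, "g3": 600, "g4": 800, "g5": 0}
sample_b = {"g1": 100, "g2": 200, "g3": 300, "g4": 400, "g5": 1000}


def counts_per_million(counts):
    """Total count scaling: divide each count by the library size."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)

# Ratio B/A for the genes that are assumed to be unchanged.
ratios = {g: cpm_b[g] / cpm_a[g] for g in ("g1", "g2", "g3", "g4")}
for gene, ratio in ratios.items():
    print(gene, round(ratio, 3))
```

under these assumptions every unchanged gene comes out at half its level in sample b after total count scaling, even though only g5 differs.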

the tmm method

introduces the trimmed mean of M-values (TMM) to calculate a scaling factor for each sample relative to a reference sample
trims genes with extreme log-fold changes (M-values) and extreme absolute expression (A-values) before averaging, giving a robust normalization factor
applied to a liver vs kidney dataset, it corrects biased log-fold changes and aligns housekeeping gene expression
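
a simplified sketch of the TMM calculation, not edgeR's actual implementation. the function name `tmm_factor` and the toy counts are hypothetical; the 30% trim on M, 5% trim on A, and inverse-variance weights follow the defaults described in the paper, and details such as choosing the reference sample and handling ties are left out:

```python
import math


def tmm_factor(sample, reference, m_trim=0.30, a_trim=0.05):
    """Simplified TMM scaling factor of `sample` relative to `reference`.

    Assumes enough genes survive trimming; no safeguards for tiny inputs.
    """
    n_s, n_r = sum(sample), sum(reference)
    genes = []  # (M, A, weight) for genes with nonzero counts in both samples
    for y_s, y_r in zip(sample, reference):
        if y_s == 0 or y_r == 0:
            continue
        p_s, p_r = y_s / n_s, y_r / n_r
        m = math.log2(p_s / p_r)        # log-fold change
        a = 0.5 * math.log2(p_s * p_r)  # average log expression
        # Approximate variance of M; its inverse is used as the weight.
        var = (n_s - y_s) / (n_s * y_s) + (n_r - y_r) / (n_r * y_r)
        genes.append((m, a, 1.0 / var))

    def kept(values, trim):
        """Indices that remain after cutting `trim` of the genes from each tail."""
        order = sorted(range(len(values)), key=values.__getitem__)
        cut = int(len(values) * trim)
        return set(order[cut:len(values) - cut])

    keep = kept([g[0] for g in genes], m_trim) & kept([g[1] for g in genes], a_trim)
    log2_factor = (sum(genes[i][0] * genes[i][2] for i in keep)
                   / sum(genes[i][2] for i in keep))
    return 2 ** log2_factor


# Hypothetical data: ten unchanged genes plus one gene that dominates `sample`.
reference = [100 * i for i in range(1, 11)] + [10]
sample = [50 * i for i in range(1, 11)] + [2760]

factor = tmm_factor(sample, reference)
effective_library_size = sum(sample) * factor
print(round(factor, 3), round(effective_library_size, 1))
```

in this toy case the dominant gene is trimmed away as an extreme M-value, so the factor reflects only the unchanged genes; dividing counts by the effective library size (library size times factor) would then bring those genes back in line with the reference.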

significance and impact

reduces false positives in differential expression by accounting for composition bias
laid a foundation for subsequent RNA-seq normalization methods and is now widely adopted

Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions

this paper reviews various normalization methods by focusing on their underlying assumptions, highlighting how assumption violations can skew differential gene expression (DGE) results

Citation

Evans, C., Hardin, J., & Stoebel, D. M. (2018). Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions. Briefings in Bioinformatics, 19(5), 776–792. https://doi.org/10.1093/bib/bbx008

Notes

focus on methodological assumptions

compares common normalization strategies (total count, RPKM, TMM, DESeq normalization, etc.)
key assumptions include constant total RNA output across samples and balanced differential expression (symmetric up/down regulation)
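
as a small illustration of why the assumptions matter even for simple methods, here is a sketch of the standard RPKM formula (the function name and example numbers are hypothetical). RPKM corrects for gene length, but between samples it still divides by the total read count, so it relies on the same total-output assumption as plain total count scaling:

```python
def rpkm(count, gene_length_bp, total_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    length_kb = gene_length_bp / 1_000
    reads_millions = total_reads / 1_000_000
    return count / (length_kb * reads_millions)


# Hypothetical gene: 500 reads, 2000 bp long, in a library of 10 million reads.
example = rpkm(500, 2_000, 10_000_000)
print(example)
```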

evaluation of methods

uses both theoretical discussion and analysis of simulated/real data
shows that methods can perform poorly when assumptions (e.g., symmetric expression) are violated
highlights issues like global shifts in expression that can mislead standard normalization approaches
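
a toy sketch of how an assumption violation can mislead normalization, using a median-of-ratios style size factor (the approach used by DESeq). everything here is hypothetical: counts are taken to be exactly depth times true abundance, with no sampling noise, and sample b is assumed to be sequenced twice as deeply as sample a:

```python
import math
from statistics import median


def median_of_ratios(samples):
    """Size factors in the style of median-of-ratios normalization.

    `samples` is a list of equal-length count lists containing no zeros.
    """
    n_genes = len(samples[0])
    # Per-gene geometric mean across samples acts as a pseudo-reference.
    geo_means = [
        math.exp(sum(math.log(s[g]) for s in samples) / len(samples))
        for g in range(n_genes)
    ]
    return [median(s[g] / geo_means[g] for g in range(n_genes)) for s in samples]


DEPTH_RATIO = 2.0   # hypothetical: sample B has twice the sequencing depth
base = [100] * 9    # sample A counts for nine genes


def estimated_depth_ratio(fold_changes):
    """Estimated B/A size-factor ratio for the given true fold changes."""
    sample_b = [c * DEPTH_RATIO * fc for c, fc in zip(base, fold_changes)]
    s_a, s_b = median_of_ratios([base, sample_b])
    return s_b / s_a


symmetric = [4, 4, 0.25, 0.25, 1, 1, 1, 1, 1]  # balanced up/down, most unchanged
asymmetric = [4, 4, 4, 4, 4, 4, 1, 1, 1]       # most genes shift upward

est_symmetric = estimated_depth_ratio(symmetric)
est_asymmetric = estimated_depth_ratio(asymmetric)
print(round(est_symmetric, 3), round(est_asymmetric, 3))
```

with balanced changes the estimate matches the assumed depth ratio; when most genes shift in the same direction, the estimate absorbs the shift, so after normalization the changed genes would look unchanged and the unchanged genes would look down-regulated.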

practical recommendations

recommends using diagnostic plots (e.g., MA plots) to assess normalization performance
stresses the importance of matching the normalization method to the experimental design and biological context
advises caution when interpreting DGE results if key assumptions are not met
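
a rough sketch of the numbers behind an MA plot, as one way to check normalization. the helper names, the `scale` parameter, and the counts are hypothetical; the (A, M) pairs could be passed to any plotting library, and here only the median M is inspected, which should sit near zero if most genes are unchanged and the normalization worked:

```python
import math
from statistics import median


def ma_values(sample, reference, scale=1.0):
    """Return (A, M) pairs for genes with nonzero counts in both samples.

    `scale` is a hypothetical normalization factor applied to the sample's
    library size; 1.0 corresponds to plain total count scaling.
    """
    n_s = sum(sample) * scale
    n_r = sum(reference)
    points = []
    for y_s, y_r in zip(sample, reference):
        if y_s > 0 and y_r > 0:
            p_s, p_r = y_s / n_s, y_r / n_r
            points.append((0.5 * math.log2(p_s * p_r), math.log2(p_s / p_r)))
    return points


def median_m(points):
    """Median log-fold change across the plotted genes."""
    return median(m for _, m in points)


# Hypothetical data: ten unchanged genes plus one gene that dominates `sample`.
reference = [100 * i for i in range(1, 11)] + [10]
sample = [50 * i for i in range(1, 11)] + [2760]

before = median_m(ma_values(sample, reference))             # total count scaling
after = median_m(ma_values(sample, reference, scale=0.5))   # assumed factor of 0.5
print(round(before, 3), round(after, 3))
```

in this toy case the cloud of unchanged genes sits one log2 unit below zero under total count scaling and moves onto the M = 0 line once the assumed scaling factor is applied.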