Week 4: Normalization - bcb420-2025/Izumi_Ando GitHub Wiki
A scaling normalization method for differential expression analysis of RNA-seq data
this paper introduces the trimmed mean of M-values (TMM) normalization method to correct for differences in RNA composition between samples, improving the accuracy of differential expression analysis
Citation
Robinson, M. D., & Oshlack, A. (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology, 11, R25. https://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-3-r25
Notes
normalization challenges
- simple total-count (library size) scaling can be misleading when samples differ in RNA composition
- a few genes that are highly expressed in one sample take up a larger share of its “sequencing real estate”, leaving fewer reads for the remaining genes and biasing comparisons
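a toy illustration of the composition problem, using made-up counts rather than data from the paper: both libraries have 1000 reads, genes 0-3 are assumed to be equally expressed in both samples, and gene 4 is strongly induced in sample B. `cpm` is a hypothetical helper that implements plain total-count scaling.

```python
import numpy as np

def cpm(counts):
    """Counts per million: plain total-count (library size) scaling."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum() * 1e6

# Hypothetical toy data, 1000 reads per library. Genes 0-3 are assumed to be
# equally expressed in both samples; gene 4 is strongly induced in sample B
# and takes most of its reads, so genes 0-3 receive fewer counts there.
sample_a = np.array([200, 200, 200, 200, 200])
sample_b = np.array([100, 100, 100, 100, 600])

log_fc = np.log2(cpm(sample_b) / cpm(sample_a))
# log_fc[:4] is -1 for every unchanged gene; log_fc[4] is log2(3), about 1.58
```

all four unchanged genes come out at a log2 fold change of -1 purely because gene 4 used up more of sample B's reads.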
the tmm method
- introduces trimmed mean of M-values (TMM) to calculate scaling factors
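- trims genes with extreme log-fold changes (M-values) and extreme average expression (A-values), then takes a weighted mean of the remaining M-values as a robust normalization factor
- applied to a liver vs kidney dataset, it removes the bias in log-fold changes seen with total-count scaling and centers the log-fold changes of housekeeping genes near zero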
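a minimal NumPy sketch of the TMM factor for one sample against a reference, following the description above (M = log-ratio, A = average log expression, trim both, then take a precision-weighted mean of the remaining M-values). the per-tail trimming fractions (30% for M, 5% for A) and the inverse-variance weights are my assumptions; I believe they mirror the defaults of edgeR's `calcNormFactors` (`logratioTrim = 0.3`, `sumTrim = 0.05`), but this has not been checked against edgeR, and ties are broken by position rather than averaged, so results may differ slightly. `tmm_factor` is a hypothetical name.

```python
import numpy as np

def tmm_factor(obs, ref, m_trim=0.3, a_trim=0.05):
    """Sketch of the TMM scaling factor for sample `obs` relative to `ref`.

    Assumed (not taken from the notes): per-tail trimming fractions of 30% for
    M and 5% for A, inverse-variance weights, and ties broken by position.
    """
    obs = np.asarray(obs, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n_obs, n_ref = obs.sum(), ref.sum()

    # Log ratios are only finite for genes seen in both samples.
    ok = (obs > 0) & (ref > 0)
    o, r = obs[ok], ref[ok]

    # M: log2 fold change of library-size-scaled counts.
    # A: average log2 expression of the two scaled counts.
    m = np.log2((o / n_obs) / (r / n_ref))
    a = 0.5 * np.log2((o / n_obs) * (r / n_ref))

    # Trim both tails of M and of A using 0-based ordinal ranks.
    n = m.size
    rank_m = np.argsort(np.argsort(m))
    rank_a = np.argsort(np.argsort(a))
    lo_m = int(np.floor(n * m_trim))
    lo_a = int(np.floor(n * a_trim))
    keep = ((rank_m >= lo_m) & (rank_m < n - lo_m)
            & (rank_a >= lo_a) & (rank_a < n - lo_a))
    if not keep.any():
        raise ValueError("no genes left after trimming")

    # Weight each kept M-value by the inverse of its approximate variance.
    var = (n_obs - o) / (n_obs * o) + (n_ref - r) / (n_ref * r)
    w = 1.0 / var[keep]
    return float(2.0 ** (np.sum(w * m[keep]) / np.sum(w)))

# Hypothetical toy data: gene 9 is strongly induced in `obs`; the rest are unchanged.
ref = np.array([100] * 10)
obs = np.array([100] * 9 + [1000])
factor = tmm_factor(obs, ref)             # about 0.526 (= 1000 / 1900)
effective_lib_size = obs.sum() * factor   # about 1000, matching ref
```

because trimming removes the induced gene, the factor is driven only by the unchanged genes; scaling the 1900-read library down to an effective size of about 1000 means those nine genes are no longer called down-regulated. a pure depth difference (e.g. `obs = 3 * ref`) gives a factor of 1.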
significance and impact
- reduces false positives in differential expression by accounting for composition bias
- laid a foundation for subsequent RNA-seq normalization methods; TMM itself is now widely adopted
Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions
this paper reviews common normalization methods through their underlying assumptions, showing how violating those assumptions can skew differential gene expression (DGE) results
Citation
Evans, C., Hardin, J., & Stoebel, D. M. (2018). Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions. Briefings in Bioinformatics, 19(5), 776–792. https://doi.org/10.1093/bib/bbx008
Notes
focus on methodological assumptions
- compares common normalization strategies (total count, RPKM, TMM, DESeq's median-of-ratios, etc.)
- key assumptions include constant total RNA output across samples and balanced differential expression (roughly as many genes up-regulated as down-regulated)
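for comparison with TMM, a minimal sketch of DESeq-style median-of-ratios size factors, which rely on the assumption that most genes are not differentially expressed: each sample's factor is the median, over genes, of its count divided by that gene's geometric mean across all samples. genes with a zero in any sample are skipped. this is a simplified illustration, not DESeq2's `estimateSizeFactors`, and the function name is hypothetical.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """DESeq-style size factors for a genes x samples count matrix (sketch)."""
    counts = np.asarray(counts, dtype=float)
    # Genes with a zero in any sample have a zero geometric mean; skip them.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Each gene's log geometric mean across samples acts as a pseudo-reference.
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Size factor = median over genes of (count / geometric mean), per sample.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Hypothetical toy data: rows are genes, columns are samples A and B.
# Gene 4 is strongly induced in B; genes 0-3 are assumed unchanged.
counts = np.array([[200, 100]] * 4 + [[200, 600]])
size_factors = median_of_ratios_size_factors(counts)  # about [1.414, 0.707]
normalized = counts / size_factors  # genes 0-3 become about 141.4 in both samples
```

because the median ignores the single induced gene, dividing by these factors makes genes 0-3 equal across samples, whereas total-count scaling (both libraries have 1000 reads) would leave them looking halved in B.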
evaluation of methods
- uses both theoretical discussion and analysis of simulated/real data
- shows that methods can perform poorly when their assumptions (e.g., symmetric differential expression) are violated
- highlights global shifts in expression (e.g., most genes increasing in one condition) as a case that can mislead standard normalization approaches
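a toy illustration of why a global shift is hard for between-sample normalization, with made-up numbers: in sample B, 8 of 10 genes truly double while 2 stay the same, and both libraries are sequenced to the same depth, so counts only reflect proportions. median-centering the log-fold changes is used here as a simplified stand-in for methods such as TMM or median-of-ratios that assume most genes are unchanged; `sequence` is a hypothetical helper returning expected counts.

```python
import numpy as np

# Hypothetical true expression (arbitrary units per cell): in sample B,
# genes 0-7 double while genes 8-9 stay the same (a global upward shift).
true_a = np.array([100.0] * 10)
true_b = np.array([200.0] * 8 + [100.0] * 2)
true_lfc = np.log2(true_b / true_a)  # 1 for genes 0-7, 0 for genes 8-9

def sequence(expr, depth=1_000_000):
    """Expected read counts: sequencing captures proportions, not totals."""
    return depth * expr / expr.sum()

counts_a, counts_b = sequence(true_a), sequence(true_b)
raw_lfc = np.log2(counts_b / counts_a)

# Median-centering: a simplified stand-in for normalizations that assume
# most genes are not differentially expressed.
normalized_lfc = raw_lfc - np.median(raw_lfc)
```

`normalized_lfc` comes out near 0 for the eight doubled genes and near -1 for the two unchanged ones: the relative difference between the two groups is recovered, but the change is attributed to the wrong genes.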
practical recommendations
- recommends using diagnostic plots (e.g., MA plots) to assess normalization performance
- stresses the importance of matching the normalization method to the experimental design and biological context
- advises caution when interpreting DGE results if key assumptions are not met
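a minimal sketch of the numbers behind an MA plot, plus a crude numeric check in place of eyeballing: M is the log2 ratio of normalized counts, A is their mean log2 value, and after good normalization the median M within bins of A should sit near zero. the helper names, the binning check, and dropping zero-count genes (instead of adding a pseudocount) are my own choices for illustration, not something prescribed by Evans et al.

```python
import numpy as np

def ma_values(x, y, size_x=1.0, size_y=1.0):
    """M (log2 fold change y vs x) and A (mean log2 expression) for an MA plot.

    size_x / size_y are normalization (size) factors; genes with a zero
    count in either sample are dropped rather than given a pseudocount.
    """
    x = np.asarray(x, dtype=float) / size_x
    y = np.asarray(y, dtype=float) / size_y
    ok = (x > 0) & (y > 0)
    m = np.log2(y[ok] / x[ok])
    a = 0.5 * np.log2(y[ok] * x[ok])
    return m, a

def binned_median_m(m, a, n_bins=5):
    """Median M within quantile bins of A; values near 0 in every bin suggest
    the bulk of genes is centered after normalization."""
    edges = np.quantile(a, np.linspace(0, 1, n_bins + 1))
    # Digitizing against the inner edges assigns each gene to bin 0..n_bins-1.
    bins = np.digitize(a, edges[1:-1])
    return np.array([np.median(m[bins == b]) for b in range(n_bins)])

# Hypothetical toy data: y is x sequenced twice as deeply.
x = np.arange(1, 101)
y = 2 * x
m_raw, a_raw = ma_values(x, y)              # no normalization: M = 1 everywhere
m_norm, a_norm = ma_values(x, y, 1.0, 2.0)  # size factor 2 for y: M = 0
```

`m` and `a` can be passed straight to a scatter plot (e.g. matplotlib's `plt.scatter(a, m)`); the binned medians are only a rough summary and will not catch every problem a plot would reveal.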