Diversity and Statistical Analysis - iffatAGheyas/bioinformatics-tutorial-wiki GitHub Wiki
6.2.8 Diversity & Statistical Analysis
Once you have taxonomic profiles or ASV/OTU count tables, the next step is to quantify diversity within and between samples, and to identify taxa that differ between conditions.
A. Alpha Diversity (Within‐Sample)
Metrics
- Shannon index: accounts for both richness and evenness
- Simpson index: emphasizes dominance (less sensitive to rare taxa)
QIIME 2
# Assuming you have an exported feature-table.biom and rooted phylogeny (for Faith’s PD):
qiime diversity alpha \
--i-table feature-table.qza \
--p-metric shannon \
--o-alpha-diversity shannon_vector.qza
qiime diversity alpha \
--i-table feature-table.qza \
--p-metric simpson \
--o-alpha-diversity simpson_vector.qza
# View results
qiime metadata tabulate \
--m-input-file shannon_vector.qza \
--o-visualization shannon_vector.qzv
phyloseq (R)
library(phyloseq)
# import your OTU/ASV table + taxonomy + sample metadata
ps <- import_biom("feature-table.biom")
ps <- merge_phyloseq(ps, sample_data(read.csv("metadata.csv", row.names=1)))
# Compute and plot
plot_richness(
ps,
measures = c("Shannon","Simpson"),
x = "Condition"
) + theme_bw()
B. Beta Diversity (Between‐Sample)
Distance metrics
-
Bray–Curtis: abundance‐based
-
UniFrac (unweighted/weighted): phylogeny‐aware
Ordination
-
PCoA (Principal Coordinates Analysis)
-
NMDS (Non‐metric multidimensional scaling)
QIIME 2
# First compute core metrics (including multiple distances + ordinations)
qiime diversity core-metrics-phylogenetic \
--i-table feature-table.qza \
--i-phylogeny rooted-tree.qza \
--p-sampling-depth 10000 \
--m-metadata-file sample-metadata.tsv \
--output-dir core-metrics-results
# Visualize PCoA
qiime diversity pcoa-visualization \
--i-pcoa core-metrics-results/bray_curtis_pcoa_results.qza \
--m-metadata-file sample-metadata.tsv \
--o-visualization core-metrics-results/bray_curtis_pcoa.qzv
phyloseq (R)
library(phyloseq)
# Assuming `ps` is your phyloseq object with a phylogenetic tree
ps <- prune_samples(sample_sums(ps) > 0, ps)
# Calculate distances
bc_dist <- distance(ps, method="bray")
wu_dist <- UniFrac(ps, weighted=FALSE)
wwu_dist <- UniFrac(ps, weighted=TRUE)
# PCoA
ordu_bc <- ordinate(ps, method="PCoA", distance=bc_dist)
plot_ordination(ps, ordu_bc, color="Condition") + ggtitle("Bray–Curtis PCoA")
# NMDS
ordu_nmds <- ordinate(ps, method="NMDS", distance=bc_dist)
plot_ordination(ps, ordu_nmds, color="Condition") + ggtitle("NMDS (Bray–Curtis)")
C. Differential Abundance
1. ANCOM (QIIME 2 plugin)
# Convert to compositional data
qiime composition add-pseudocount \
--i-table feature-table.qza \
--o-composition-table comp-table.qza
# Run ANCOM
qiime composition ancom \
--i-table comp-table.qza \
--m-metadata-file sample-metadata.tsv \
--m-metadata-column Condition \
--o-visualization ancom-Condition.qzv
2. DESeq2 on Raw Counts (R)
library(DESeq2)
library(phyloseq)
# Convert phyloseq to DESeq2
dds <- phyloseq_to_deseq2(ps, ~ Condition)
# Estimate size factors & dispersions
dds <- DESeq(dds)
# Extract results for a contrast
res <- results(dds, contrast=c("Condition","Treatment","Control"))
res_sig <- res[which(res$padj < 0.05), ]
head(res_sig[order(res_sig$log2FoldChange, decreasing=TRUE), ], 10)
# Plot MA‐plot
plotMA(res, ylim=c(-5,5))
Tips & Best Practices
-
Rarefy your feature table (or use compositional methods) only if necessary—many methods handle count differences directly.
-
Always inspect ordination plots colored by biological and technical covariates to detect batch effects.
-
For ANCOM, be mindful of sparsity and compositional biases—validate key taxa with independent methods.