Step 7.2: Differential abundance analyses - shenjean/diversity GitHub Wiki

Filtering data

https://docs.qiime2.org/2024.10/tutorials/filtering/

Identifier-based filtering

Make a tab-separated (.tsv) file (e.g. samples-to-keep.tsv) with the header SampleID that lists the sample IDs you want to keep. Make sure the sample IDs match the IDs in the metadata file.

For example,

SampleID
BCB9TL
BCB11H
BCB11T
BCB16SL
BCB16TL
BCB16HL
BCB17SL
BCB17TL
BCB17HL

Then pass this file as metadata to filter-samples. Only the samples listed in it are kept:

qiime feature-table filter-samples \
  --i-table bacteriatable.qza \
  --m-metadata-file samples-to-keep.tsv \
  --o-filtered-table id-filtered-table.qza
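To catch mismatched IDs before running filter-samples, a short awk check can list every ID in the keep file that is absent from the metadata. This is a minimal sketch. It assumes both files are tab-separated, with sample IDs in the first column and a single header row. The helper name missing_ids and the demo files (toy IDs S1 to S3) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report sample IDs in a keep list that do not appear in the metadata.
# Assumes tab-separated files, sample IDs in column 1, one header row each.
set -euo pipefail

# missing_ids is a hypothetical helper, not part of QIIME 2.
missing_ids() {
  local keep_file="$1" metadata_file="$2"
  # First file (metadata): remember every ID.
  # Second file (keep list): print IDs that were never seen in the metadata.
  awk -F'\t' '
    NR == FNR { if (FNR > 1) known[$1] = 1; next }
    FNR > 1 && !($1 in known) { print $1 }
  ' "$metadata_file" "$keep_file"
}

# Toy files for demonstration only.
printf 'SampleID\tSpecies\tTissue\nS1\tHalodule wrightii\tLeaf\nS2\tHalodule wrightii\tLeaf\n' > demo-metadata.tsv
printf 'SampleID\nS1\nS2\nS3\n' > demo-keep.tsv

missing_ids demo-keep.tsv demo-metadata.tsv   # prints S3
```

An empty result means every ID in the keep list is present in the metadata.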

Metadata-based filtering

Alternatively, you can filter by metadata column values with --p-where, which takes an SQLite WHERE clause:

qiime feature-table filter-samples \
  --i-table bacteriatable.qza \
  --m-metadata-file metadata.txt \
  --p-where "[Species]='Halodule wrightii' AND [Tissue]='Leaf'" \
  --o-filtered-table metadata-filtered-table.qza
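Before filtering, you can preview which samples a --p-where clause will keep. The awk function below is a hypothetical helper (preview_where). It applies the same two exact-match conditions as the example clause. It does not parse SQL. It assumes a tab-separated metadata file with sample IDs in column 1 and header columns named Species and Tissue. The demo file contains toy data only.

```shell
#!/usr/bin/env bash
# Sketch: list sample IDs whose Species and Tissue values match exactly,
# mirroring "[Species]='Halodule wrightii' AND [Tissue]='Leaf'".
# This is NOT an SQL parser; it only checks these two equality conditions.
set -euo pipefail

# preview_where is a hypothetical helper, not part of QIIME 2.
preview_where() {
  local metadata_file="$1" species="$2" tissue="$3"
  awk -F'\t' -v sp="$species" -v ts="$tissue" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }  # map header names to positions
    $1 ~ /^#/ { next }                                        # skip directive rows such as #q2:types
    $(col["Species"]) == sp && $(col["Tissue"]) == ts { print $1 }
  ' "$metadata_file"
}

# Toy metadata for demonstration only.
printf 'SampleID\tSpecies\tTissue\nS1\tHalodule wrightii\tLeaf\nS2\tHalodule wrightii\tRoot\nS3\tSpecies B\tLeaf\n' > demo-metadata.tsv

preview_where demo-metadata.tsv 'Halodule wrightii' 'Leaf'   # prints S1
```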

ANCOM

This paper compares different tools used for microbiome differential abundance analysis. ANCOM is a conservative method. It works on log ratios of abundances and therefore cannot handle counts of zero. For this reason, a pseudocount (a small constant, 1 by default in qiime composition add-pseudocount) must first be added to every count in the table.

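The commands below collapse ASVs to the genus level (--p-level 6, which is genus in the usual seven-rank taxonomy running from domain to species) and then add a pseudocount: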

qiime taxa collapse --i-table bacteriatable.qza --i-taxonomy taxonomy.qza --p-level 6 --o-collapsed-table genustable.qza
qiime composition add-pseudocount --i-table genustable.qza --o-composition-table comp-genus-table.qza
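To show conceptually what add-pseudocount does, the awk sketch below adds a constant to every count in a small TSV table (features as rows, samples as columns, one header row). It is an illustration only. The helper name add_pseudocount and the toy counts are hypothetical. In practice, use the qiime composition add-pseudocount command above, which operates on .qza artifacts.

```shell
#!/usr/bin/env bash
# Sketch: add a pseudocount (default 1) to every count cell of a TSV table.
# Assumes one header row, feature IDs in column 1, numeric counts elsewhere.
set -euo pipefail

# add_pseudocount is a hypothetical helper, not part of QIIME 2.
add_pseudocount() {
  local table="$1" pc="${2:-1}"
  awk -F'\t' -v OFS='\t' -v pc="$pc" '
    NR == 1 { print; next }                          # header unchanged
    { for (i = 2; i <= NF; i++) $i = $i + pc; print } # shift every count by pc
  ' "$table"
}

# Toy genus-level counts containing zeros.
printf '#OTU ID\tS1\tS2\nGenusA\t0\t5\nGenusB\t3\t0\n' > demo-counts.tsv

add_pseudocount demo-counts.tsv
# #OTU ID  S1  S2
# GenusA   1   6
# GenusB   4   1
```

After this step no cell is zero, so every log ratio ANCOM computes is defined.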

qiime composition ancom \
  --i-table comp-genus-table.qza \
  --m-metadata-file metadata.txt \
  --m-metadata-column Species \
  --o-visualization ancom-species.qzv

qiime composition ancom \
  --i-table comp-genus-table.qza \
  --m-metadata-file metadata.txt \
  --m-metadata-column Tissue \
  --o-visualization ancom-tissue.qzv
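The two ANCOM commands above differ only in the metadata column. A small bash loop can generate both. This is a sketch that assumes bash and the same file names as above. The function name run_ancom_all and the DRY_RUN variable are hypothetical conveniences, not QIIME 2 options. By default, the loop only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: run the same ANCOM command once per metadata column.
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
set -euo pipefail

# run_ancom_all is a hypothetical helper, not part of QIIME 2.
run_ancom_all() {
  local column out
  local -a cmd
  for column in Species Tissue; do
    # Lower-case the column name for the output file, e.g. ancom-species.qzv
    out="ancom-$(printf '%s' "$column" | tr '[:upper:]' '[:lower:]').qzv"
    cmd=(qiime composition ancom
      --i-table comp-genus-table.qza
      --m-metadata-file metadata.txt
      --m-metadata-column "$column"
      --o-visualization "$out")
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}

run_ancom_all
```

Adding another column only requires extending the list after "for column in".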

New method: ANCOM-BC

https://docs.qiime2.org/2024.10/plugins/available/composition/ancombc/

Analyses with external non-QIIME2 software packages

Converting counts to relative frequencies

This is useful for downstream applications like MaAsLin2

qiime feature-table relative-frequency --i-table bacteriatable.qza --o-relative-frequency-table bacteria.relabund.qza
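Conceptually, converting to relative frequencies divides each count by its sample's total, so every sample sums to 1. The awk sketch below shows that arithmetic on a small TSV with features as rows, samples as columns, and one header row. The helper name to_relative_frequency and the toy counts are hypothetical. The sketch writes 0 for any sample whose total is 0. That is a choice made here, not a description of QIIME 2's behavior.

```shell
#!/usr/bin/env bash
# Sketch: convert a count table to per-sample relative frequencies.
# Assumes one header row, feature IDs in column 1, samples in columns 2..NF.
set -euo pipefail

# to_relative_frequency is a hypothetical helper, not part of QIIME 2.
to_relative_frequency() {
  local table="$1"
  # Pass 1 sums each sample column; pass 2 divides each count by its column total.
  awk -F'\t' -v OFS='\t' '
    NR == FNR { if (FNR > 1) for (i = 2; i <= NF; i++) total[i] += $i; next }
    FNR == 1 { print; next }
    { for (i = 2; i <= NF; i++) $i = (total[i] > 0 ? $i / total[i] : 0); print }
  ' "$table" "$table"
}

# Toy counts: S1 totals 4, S2 totals 8.
printf '#OTU ID\tS1\tS2\nGenusA\t1\t6\nGenusB\t3\t2\n' > demo-counts.tsv

to_relative_frequency demo-counts.tsv
# #OTU ID  S1    S2
# GenusA   0.25  0.75
# GenusB   0.75  0.25
```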

Exporting count tables

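Export the table to BIOM format, then convert the BIOM file to TSV. The resulting TSV begins with a "# Constructed from biom file" comment line, followed by a header row that starts with "#OTU ID" and has features as rows and samples as columns.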

qiime tools export --input-path bacteria.relabund.qza --output-path bacteria-relabund-export
biom convert -i bacteria-relabund-export/feature-table.biom -o maaslin-relabund.tsv --to-tsv

MaAsLin2

MaAsLin2 is a comprehensive R package for efficiently determining multivariable associations between phenotypes, environments, exposures, covariates and microbial meta’omic features. MaAsLin2 relies on general linear models to accommodate most modern epidemiological study designs, including cross-sectional and longitudinal designs, and offers a variety of data exploration, normalization, and transformation methods.

library(Maaslin2)
library(WGCNA)

setwd("/Volumes/GoogleDrive/My Drive/xxx/")

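# Import relative abundances. biom convert writes a "# Constructed from biom file"
# line first, so skip it; check.names=FALSE keeps sample and taxon names unchanged.
input_data=read.delim("maaslin-relabund.tsv",header=T,sep="\t",row.names=1,skip=1,check.names=FALSE)
input_meta=read.delim("maaslin-metadata.tsv",header=T,sep="\t",row.names=1,check.names=FALSE)

# The exported table has features as rows and samples as columns.
# WGCNA expects samples as rows and genes (here, taxa) as columns, so transpose.
input_data=as.data.frame(t(input_data))

# The goodSamplesGenes function in library WGCNA checks the data for
# missing entries, entries with weights below a threshold, and zero-variance genes.
# It returns the samples and genes that pass criteria on the maximum number of missing or low-weight values.
# If necessary, the filtering is iterated.
gsg = goodSamplesGenes(input_data, verbose = 3)

# Check if all genes and samples are good
gsg$allOK

# Remove "bad" samples and genes (taxa) if necessary
input_filtered <- input_data[gsg$goodSamples, gsg$goodGenes, drop=FALSE]
gsg_filtered = goodSamplesGenes(input_filtered, verbose = 3)

# Run MaAsLin2; fixed_effects must name a column in maaslin-metadata.tsv
genus_fit=Maaslin2(input_filtered,input_meta,"maaslin_ASV_output",fixed_effects="Diagnosis")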
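MaAsLin2 writes its results as TSV files in the output directory (here maaslin_ASV_output). Among them, significant_results.tsv lists the associations below the default q-value cutoff of 0.25. If a stricter cutoff is wanted, a sketch like the one below keeps only rows whose qval column falls below a chosen threshold. It looks up the column by header name rather than by position, because the exact column layout is treated as an assumption here. The helper name filter_by_qval, the 0.05 threshold, and the toy file (which has fewer columns than a real results file) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: keep MaAsLin2 result rows with qval below a threshold (default 0.05).
# Assumes a tab-separated file whose header contains a column named "qval".
set -euo pipefail

# filter_by_qval is a hypothetical helper, not part of MaAsLin2.
filter_by_qval() {
  local results_file="$1" threshold="${2:-0.05}"
  awk -F'\t' -v thr="$threshold" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "qval") q = i   # locate qval by name
      if (!q) { print "no qval column in header" > "/dev/stderr"; exit 1 }
      print
      next
    }
    $q != "NA" && $q + 0 < thr + 0 { print }               # skip NA, keep small q
  ' "$results_file"
}

# Toy results for demonstration only.
printf 'feature\tmetadata\tvalue\tcoef\tpval\tqval\nGenusA\tSpecies\tB\t1.2\t0.001\t0.01\nGenusB\tSpecies\tB\t-0.4\t0.03\t0.12\nGenusC\tSpecies\tB\t0.3\t0.2\tNA\n' > demo-significant_results.tsv

filter_by_qval demo-significant_results.tsv 0.05   # header plus the GenusA row
```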