Step 7.2: Differential abundance analyses - shenjean/diversity GitHub Wiki

Filtering data

https://docs.qiime2.org/2024.10/tutorials/filtering/

Identifier-based filtering

Make a tab-separated file (e.g. samples-to-keep.tsv) that lists the sample IDs you want to keep, one per line, under the header SampleID. Make sure the sample IDs match the IDs in the metadata file.

For example,

SampleID
BCB9TL
BCB11H
BCB11T
BCB16SL
BCB16TL
BCB16HL
BCB17SL
BCB17TL
BCB17HL

Then filter the table to those samples:

qiime feature-table filter-samples \
  --i-table bacteriatable.qza \
  --m-metadata-file samples-to-keep.tsv \
  --o-filtered-table id-filtered-table.qza
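
The ID check mentioned above can be sketched in the shell. This is a minimal sketch, not part of QIIME 2: the function name missing_ids is hypothetical, and it assumes both files are tab-separated, have one header line, and keep sample IDs in the first column.

```shell
# List IDs from the keep-list that do not appear in the metadata file.
# Assumption: both files are tab-separated, with a header line and
# sample IDs in the first column.
missing_ids() {
  keep_file=$1
  metadata_file=$2
  awk -F'\t' '
    NR == FNR { if (FNR > 1) known[$1] = 1; next }      # first file: collect metadata IDs
    FNR > 1 && $1 != "" && !($1 in known) { print $1 }  # second file: report unknown IDs
  ' "$metadata_file" "$keep_file"
}
```

Calling `missing_ids samples-to-keep.tsv metadata.txt` prints one unmatched ID per line; no output means every ID in the keep-list was found in the metadata file.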

Metadata-based filtering

Or you can filter by metadata column values:

qiime feature-table filter-samples \
  --i-table bacteriatable.qza \
  --m-metadata-file metadata.txt \
  --p-where "[Species]='Halodule wrightii' AND [Tissue]='Leaf'" \
  --o-filtered-table metadata-filtered-table.qza
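
To preview which samples a two-column condition like the one above would select, the metadata file can be scanned directly. This is a rough sketch with a hypothetical helper name (preview_matches); it uses exact string matching on a tab-separated file with one header line, which only approximates the SQLite WHERE clause that QIIME 2 evaluates.

```shell
# Print sample IDs whose metadata row has the given values in two columns.
# Arguments: metadata file, column 1 name, value 1, column 2 name, value 2.
# Assumption: tab-separated file, one header line, sample IDs in column 1.
preview_matches() {
  awk -F'\t' -v c1="$2" -v v1="$3" -v c2="$4" -v v2="$5" '
    NR == 1 {                          # locate both columns by header name
      for (i = 1; i <= NF; i++) {
        if ($i == c1) a = i
        if ($i == c2) b = i
      }
      next
    }
    a && b && $a == v1 && $b == v2 { print $1 }
  ' "$1"
}
```

For the command above, the call would be `preview_matches metadata.txt Species 'Halodule wrightii' Tissue Leaf`. If either column name is not found in the header, nothing is printed.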

ANCOMBC2

This paper provides a comparison of different tools used for microbiome differential abundance. ANCOM is a conservative method.

This example command collapses ASVs to the genus level (level 6), if needed. To test genera instead of ASVs, pass genustable.qza as --i-table to ancombc2.

qiime taxa collapse \
  --i-table bacteriatable.qza \
  --i-taxonomy taxonomy.qza \
  --p-level 6 \
  --o-collapsed-table genustable.qza

qiime composition ancombc2 \
  --i-table table.qza \
  --m-metadata-file metadata.txt \
  --p-fixed-effects-formula Type \
  --p-reference-levels 'Type::Field' \
  --o-ancombc2-output ancombc2-results.qza

qiime composition ancombc2-visualizer \
  --i-data ancombc2-results.qza --i-taxonomy taxonomy.qza \
  --o-visualization ancombc2-results.qzv

Analyses with external non-QIIME2 software packages

Converting counts to relative frequencies

This is useful for downstream applications like MaAsLin2, which can take relative abundances as input.

qiime feature-table relative-frequency --i-table bacteriatable.qza --o-relative-frequency-table bacteria.relabund.qza

Exporting count tables

Export the table (here, the relative-frequency table from the previous step) to BIOM format, then convert the BIOM file to TSV.

qiime tools export --input-path bacteria.relabund.qza --output-path bacteria-relabund-export
biom convert -i bacteria-relabund-export/feature-table.biom -o maaslin-relabund.tsv --to-tsv
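
As a quick sanity check on the exported file, each sample column of a relative-frequency table should sum to about 1. The sketch below uses a hypothetical helper name (column_sums) and assumes the default biom convert layout: a header row beginning with "#OTU ID", one row per feature, and any other line starting with "#" treated as a comment.

```shell
# Print the column sum for every sample in a biom-convert TSV.
# Assumption: the header row starts with "#OTU ID"; other "#" lines are comments.
column_sums() {
  awk -F'\t' '
    /^#OTU ID/ { for (i = 2; i <= NF; i++) name[i] = $i; n = NF; next }
    /^#/       { next }                                   # skip comment lines
               { for (i = 2; i <= NF; i++) sum[i] += $i } # accumulate per sample
    END        { for (i = 2; i <= n; i++) printf "%s\t%.4f\n", name[i], sum[i] }
  ' "$1"
}
```

Running `column_sums maaslin-relabund.tsv` prints one line per sample. Sums far from 1 suggest that a count table was exported instead of the relative-frequency table.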

MaAsLin2

MaAsLin2 is a comprehensive R package for efficiently determining multivariable associations between phenotypes, environments, exposures, covariates, and microbial meta’omic features. It relies on general linear models to accommodate most modern epidemiological study designs, including cross-sectional and longitudinal designs, and offers a variety of data exploration, normalization, and transformation methods.

library(Maaslin2)
library(WGCNA)

setwd("/Volumes/GoogleDrive/My Drive/xxx/")

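# Import relative abundances exported by biom convert.
# skip = 1 drops the leading "# Constructed from biom file" line so that the
# "#OTU ID" row is read as the header; check.names = FALSE keeps sample IDs unchanged.
input_data = read.delim("maaslin-relabund.tsv", header = TRUE, sep = "\t", row.names = 1, skip = 1, check.names = FALSE)
# Transpose so that rows are samples and columns are features, as goodSamplesGenes expects
input_data = as.data.frame(t(input_data))
input_meta = read.delim("maaslin-metadata.tsv", header = TRUE, sep = "\t", row.names = 1)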

# The goodSamplesGenes function in the WGCNA library checks the data for
# missing entries, entries with weights below a threshold, and zero-variance genes
# (here, features). It expects samples in rows and genes in columns.
# It returns a list of samples and genes that pass criteria on the maximum number
# of missing or low-weight values. If necessary, the filtering is iterated.
gsg = goodSamplesGenes(input_data, verbose = 3)

# Check if all genes and samples are good
gsg$allOK

# Filter out "bad" genes if necessary, then re-check the filtered table
input_filtered = input_data[, gsg$goodGenes]
gsg_filtered = goodSamplesGenes(input_filtered, verbose = 3)

# Run MaAsLin2; "Diagnosis" must be a column in the metadata file
genus_fit = Maaslin2(input_data = input_filtered, input_metadata = input_meta, output = "maaslin_ASV_output", fixed_effects = "Diagnosis")