Step 7.2: Differential abundance analyses - shenjean/diversity GitHub Wiki
Filtering data
https://docs.qiime2.org/2024.10/tutorials/filtering/
Identifier-based filtering
Make a tab-separated (TSV) file (e.g. samples-to-keep.tsv) listing the sample IDs that you want to keep, with SampleID as the header. Make sure the sample IDs match the IDs in the metadata file.
For example,
SampleID
BCB9TL
BCB11H
BCB11T
BCB16SL
BCB16TL
BCB16HL
BCB17SL
BCB17TL
BCB17HL
qiime feature-table filter-samples \
--i-table bacteriatable.qza \
--m-metadata-file samples-to-keep.tsv \
--o-filtered-table id-filtered-table.qza
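Before running the filter, it can help to confirm that every ID in the keep list appears in the metadata file. The following is a minimal sketch, assuming both files are tab-separated with the sample ID in the first column; the demo files and the ID BCB99X are hypothetical and only there to make the example self-contained.

```shell
# Hypothetical demo files; in practice, point the awk command at your own
# metadata file and samples-to-keep.tsv instead.
workdir=$(mktemp -d)
printf 'SampleID\tSpecies\nBCB9TL\tA\nBCB11H\tB\nBCB11T\tB\n' > "$workdir/metadata.txt"
printf 'SampleID\nBCB9TL\nBCB11H\nBCB99X\n' > "$workdir/samples-to-keep.tsv"

# First pass (metadata): remember every sample ID, skipping the header.
# Second pass (keep list): print IDs that were never seen in the metadata.
missing=$(awk -F'\t' '
  NR == FNR { if (FNR > 1) { seen[$1] = 1 }; next }
  FNR > 1 && !($1 in seen) { print $1 }
' "$workdir/metadata.txt" "$workdir/samples-to-keep.tsv")

if [ -n "$missing" ]; then
  echo "IDs not found in metadata: $missing"
else
  echo "All IDs found in metadata"
fi
```

Any ID reported as missing is worth correcting before running filter-samples.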
Metadata-based filtering
Or you can filter by metadata column values:
qiime feature-table filter-samples \
--i-table bacteriatable.qza \
--m-metadata-file metadata.txt \
--p-where "[Species]='Halodule wrightii' AND [Tissue]='Leaf'" \
--o-filtered-table metadata-filtered-table.qza
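To preview which samples a condition like the one above would select, the same test can be applied directly to the metadata file. This is only a sketch with awk, not QIIME 2's own query engine. It assumes a tab-separated metadata file with Species and Tissue columns, and the demo rows are hypothetical.

```shell
# Hypothetical demo metadata; replace with your own metadata.txt.
workdir=$(mktemp -d)
{
  printf 'SampleID\tSpecies\tTissue\n'
  printf 'S1\tHalodule wrightii\tLeaf\n'
  printf 'S2\tHalodule wrightii\tRoot\n'
  printf 'S3\tThalassia testudinum\tLeaf\n'
} > "$workdir/metadata.txt"

# Map column names to positions from the header row, then print the sample ID
# of every row where both conditions hold (the AND in the --p-where clause).
matched=$(awk -F'\t' '
  NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
  $(col["Species"]) == "Halodule wrightii" && $(col["Tissue"]) == "Leaf" { print $1 }
' "$workdir/metadata.txt")

echo "Samples matching the condition: $matched"
```

If the preview lists fewer samples than expected, check the metadata values for spelling or capitalization differences.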
ANCOMBC2
This paper provides a comparison of different tools used for microbiome differential abundance. ANCOM is a conservative method.
If needed, first collapse ASVs to the genus level (level 6), then pass the collapsed table (genustable.qza) as --i-table to the ancombc2 command:
qiime taxa collapse --i-table bacteriatable.qza --i-taxonomy taxonomy.qza --p-level 6 --o-collapsed-table genustable.qza
qiime composition ancombc2 \
--i-table table.qza \
--m-metadata-file metadata.txt \
--p-fixed-effects-formula Type \
--p-reference-levels 'Type::Field' \
--o-ancombc2-output ancombc2-results.qza
qiime composition ancombc2-visualizer \
--i-data ancombc2-results.qza --i-taxonomy taxonomy.qza \
--o-visualization ancombc2-results.qzv
Analyses with external non-QIIME2 software packages
Converting counts to relative frequencies
This is useful for downstream applications like MaAsLin2
qiime feature-table relative-frequency --i-table bacteriatable.qza --o-relative-frequency-table bacteria.relabund.qza
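As a rough illustration of the conversion, each count is divided by the total count of its sample, so every sample column sums to 1. The sketch below reproduces that arithmetic with awk on a small, hypothetical feature-by-sample table. It is not the QIIME 2 implementation, and the file names are assumptions.

```shell
# Hypothetical count table: features as rows, samples as columns.
workdir=$(mktemp -d)
printf 'FeatureID\tS1\tS2\nf1\t30\t5\nf2\t70\t15\n' > "$workdir/counts.tsv"

# The file is read twice: the first pass sums each sample column,
# the second pass divides every count by its column total.
awk -F'\t' -v OFS='\t' '
  NR == FNR { if (FNR > 1) { for (i = 2; i <= NF; i++) total[i] += $i }; next }
  FNR == 1  { print; next }
  { for (i = 2; i <= NF; i++) $i = sprintf("%.2f", $i / total[i]); print }
' "$workdir/counts.tsv" "$workdir/counts.tsv" > "$workdir/relabund.tsv"

cat "$workdir/relabund.tsv"
```

Here sample S1 has 100 reads in total, so feature f1 (30 reads) becomes 0.30.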
Exporting count tables
Export the relative-frequency table to BIOM format, then convert the BIOM file to TSV:
qiime tools export --input-path bacteria.relabund.qza --output-path bacteria-relabund-export
biom convert -i bacteria-relabund-export/feature-table.biom -o maaslin-relabund.tsv --to-tsv
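The TSV written by biom convert typically starts with a "# Constructed from biom file" comment line, followed by a header row whose first cell is "#OTU ID". R's read.delim does not skip comment lines by default, so it may help to clean the header first. The sketch below uses a hypothetical stand-in file, and the names maaslin-relabund.clean.tsv and FeatureID are assumptions for illustration.

```shell
# Hypothetical stand-in for the output of `biom convert --to-tsv`.
workdir=$(mktemp -d)
printf '# Constructed from biom file\n#OTU ID\tS1\tS2\nf1\t0.30\t0.25\n' \
  > "$workdir/maaslin-relabund.tsv"

# Drop the leading comment line, then rename the "#OTU ID" header cell so the
# first line is a plain column header.
tail -n +2 "$workdir/maaslin-relabund.tsv" \
  | sed '1s/^#OTU ID/FeatureID/' > "$workdir/maaslin-relabund.clean.tsv"

head -n 1 "$workdir/maaslin-relabund.clean.tsv"
```

Check the first two lines of your own export (for example with head -n 2) before applying this, since the layout may differ between biom versions.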
MaAsLin2
MaAsLin2 is a comprehensive R package for efficiently determining multivariable associations between phenotypes, environments, exposures, covariates, and microbial meta’omic features. It relies on general linear models to accommodate most modern epidemiological study designs, including cross-sectional and longitudinal, and offers a variety of data exploration, normalization, and transformation methods.
library(Maaslin2)
library(WGCNA)
setwd("/Volumes/GoogleDrive/My Drive/xxx/")
# Import relative abundances
input_data=read.delim("maaslin-relabund.tsv",header=T,sep="\t",row.names=1)
input_meta=read.delim("maaslin-metadata.tsv",header=T,sep="\t",row.names=1)
# The goodSamplesGenes function in the WGCNA library checks the data for
# missing entries, entries with weights below a threshold, and zero-variance genes.
# It returns a list of samples and genes that pass the criteria on the maximum number of missing or low-weight values.
# If necessary, the filtering is iterated.
gsg = goodSamplesGenes(input_data, verbose = 3)
# Check if all genes and samples are good
gsg$allOK
# Filter out "bad" genes if necessary
input_filtered<- input_data[,gsg$goodGenes==TRUE]
gsg_filtered = goodSamplesGenes(input_filtered, verbose = 3)
# Run Maaslin
genus_fit=Maaslin2(input_filtered,input_meta,"maaslin_ASV_output",fixed_effects="Diagnosis")