Step 7.2: Differential abundance analyses
Filtering data
https://docs.qiime2.org/2024.10/tutorials/filtering/
Identifier-based filtering
Make a tab-separated file (e.g. samples-to-keep.tsv) that lists the sample IDs you want to keep, one per line, under the header SampleID. Make sure the sample IDs match the IDs in the metadata file.
For example,
SampleID
BCB9TL
BCB11H
BCB11T
BCB16SL
BCB16TL
BCB16HL
BCB17SL
BCB17TL
BCB17HL
qiime feature-table filter-samples \
--i-table bacteriatable.qza \
--m-metadata-file samples-to-keep.tsv \
--o-filtered-table id-filtered-table.qza
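Before filtering, it can help to confirm that every ID in the keep list also appears in the metadata file. The following is a minimal sketch using standard Unix tools. The file names and sample IDs (S1, S2, ...) are made up for illustration, and it assumes the sample ID is the first column of both files.

```shell
# Toy files with made-up IDs so the sketch runs on its own.
# In practice, use your real metadata file and samples-to-keep.tsv.
printf 'SampleID\tTissue\nS1\tLeaf\nS2\tRoot\nS3\tLeaf\n' > toy-metadata.tsv
printf 'SampleID\nS1\nS3\nS9\n' > toy-samples-to-keep.tsv

# Collect the IDs (first column, header skipped) from each file, sorted for comm.
tail -n +2 toy-metadata.tsv | cut -f1 | sort > toy-metadata-ids.txt
tail -n +2 toy-samples-to-keep.tsv | cut -f1 | sort > toy-keep-ids.txt

# comm -23 prints lines found only in the first file:
# IDs in the keep list that are missing from the metadata.
comm -23 toy-keep-ids.txt toy-metadata-ids.txt > toy-missing-ids.txt
cat toy-missing-ids.txt
```

An empty toy-missing-ids.txt means every ID in the keep list was found in the metadata. In this toy example, S9 is reported as missing.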
Metadata-based filtering
Or you can filter by metadata column values:
qiime feature-table filter-samples \
--i-table bacteriatable.qza \
--m-metadata-file metadata.txt \
--p-where "[Species]='Halodule wrightii' AND [Tissue]='Leaf'" \
--o-filtered-table metadata-filtered-table.qza
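To preview which samples a condition like the one above would select, you can apply the same test to the metadata file with awk. This is only a rough sketch: it mimics the WHERE clause outside QIIME 2 and does not use QIIME 2's own evaluation. The file name, sample IDs, and the "Other species" value are invented for illustration.

```shell
# Toy metadata with made-up sample IDs; the column names follow the example above.
printf 'SampleID\tSpecies\tTissue\n' > toy-metadata-where.tsv
printf 'S1\tHalodule wrightii\tLeaf\n' >> toy-metadata-where.tsv
printf 'S2\tHalodule wrightii\tRoot\n' >> toy-metadata-where.tsv
printf 'S3\tOther species\tLeaf\n' >> toy-metadata-where.tsv

# Map column names to positions from the header line, then print the IDs
# of rows where both conditions hold (the equivalent of AND).
awk -F'\t' '
  NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
  $(col["Species"]) == "Halodule wrightii" && $(col["Tissue"]) == "Leaf" { print $1 }
' toy-metadata-where.tsv > toy-matching-ids.txt
cat toy-matching-ids.txt
```

Here only S1 satisfies both conditions, so it is the only sample that a filter with this condition would be expected to retain.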
ANCOM
This paper provides a comparison of different tools used for microbiome differential abundance analysis. ANCOM is a conservative method. It cannot tolerate frequencies of zero, so zeros in the ASV count data must first be imputed by adding a pseudocount.
This example collapses ASVs to the genus level (taxonomy level 6), adds a pseudocount, and then runs ANCOM separately on the Species and Tissue metadata columns:
qiime taxa collapse \
--i-table bacteriatable.qza \
--i-taxonomy taxonomy.qza \
--p-level 6 \
--o-collapsed-table genustable.qza
qiime composition add-pseudocount \
--i-table genustable.qza \
--o-composition-table comp-genus-table.qza
qiime composition ancom \
--i-table comp-genus-table.qza \
--m-metadata-file metadata.txt \
--m-metadata-column Species \
--o-visualization ancom-species.qzv
qiime composition ancom \
--i-table comp-genus-table.qza \
--m-metadata-file metadata.txt \
--m-metadata-column Tissue \
--o-visualization ancom-tissue.qzv
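The reason ANCOM needs the pseudocount step is that it compares log-ratios of feature abundances, and the logarithm of zero is undefined. The sketch below illustrates this with made-up counts for two features in one sample. It assumes a pseudocount of 1; the file names and numbers are hypothetical.

```shell
# Made-up counts: feature A was not observed in this sample, feature B was seen 9 times.
printf 'feature\tcount\nA\t0\nB\t9\n' > toy-counts.tsv

# Without a pseudocount, log(0 / 9) is negative infinity.
# After adding 1 to every count, the log-ratio log(1 / 10) is finite.
awk -F'\t' '
  NR > 1 { c[$1] = $2 + 1 }                            # add the assumed pseudocount of 1
  END    { printf "%.4f\n", log(c["A"] / c["B"]) }     # natural log of the ratio
' toy-counts.tsv > toy-logratio.txt
cat toy-logratio.txt
```

The result is about -2.3026, a usable number where the raw counts would have produced negative infinity.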
New method: ANCOM-BC
https://docs.qiime2.org/2024.10/plugins/available/composition/ancombc/
Analyses with external non-QIIME2 software packages
Converting counts to relative frequencies
This is useful for downstream applications like MaAsLin2.
qiime feature-table relative-frequency --i-table bacteriatable.qza --o-relative-frequency-table bacteria.relabund.qza
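Conceptually, a relative frequency is each count divided by the total count of its sample. The sketch below reproduces that arithmetic on a small made-up table with features in rows and samples in columns. It is an illustration of the calculation, not a replacement for the QIIME 2 command, and all file names and values are hypothetical.

```shell
# Made-up count table: features in rows, samples in columns.
printf 'feature\tS1\tS2\nF1\t2\t1\nF2\t6\t3\nF3\t2\t0\n' > toy-table.tsv

# Read the file twice: the first pass sums each sample column,
# the second pass divides every count by its column total.
awk -F'\t' -v OFS='\t' '
  NR == FNR { if (FNR > 1) for (i = 2; i <= NF; i++) total[i] += $i; next }
  FNR == 1  { print; next }                                  # keep the header line
  { for (i = 2; i <= NF; i++) $i = $i / total[i]; print }
' toy-table.tsv toy-table.tsv > toy-relabund.tsv
cat toy-relabund.tsv
```

Each sample column of the output sums to 1. For example, F2 becomes 0.6 in S1 (6 of 10 reads) and 0.75 in S2 (3 of 4 reads).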
Exporting count tables
Export the table to BIOM format, then convert the BIOM file to TSV. The example below exports the relative-frequency table created above.
qiime tools export --input-path bacteria.relabund.qza --output-path bacteria-relabund-export
biom convert -i bacteria-relabund-export/feature-table.biom -o maaslin-relabund.tsv --to-tsv
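The TSV written by biom convert typically starts with a "# Constructed from biom file" comment line, followed by a header line whose first field is "#OTU ID". Some downstream tools handle these lines poorly. The sketch below, run on a made-up file that imitates this layout, writes a cleaned copy with a plain header. The file names and the replacement column name "feature" are assumptions for illustration.

```shell
# Made-up file imitating the assumed layout of a biom convert TSV.
printf '# Constructed from biom file\n' > toy-biom.tsv
printf '#OTU ID\tS1\tS2\n' >> toy-biom.tsv
printf 'F1\t0.2\t0.25\nF2\t0.8\t0.75\n' >> toy-biom.tsv

# Drop the first comment line, then rename the "#OTU ID" header field.
tail -n +2 toy-biom.tsv | sed '1s/^#OTU ID/feature/' > toy-clean.tsv
cat toy-clean.tsv
```

A file cleaned this way can be read with an ordinary header row and no skipped lines. The original export is left untouched.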
MaAsLin2
MaAsLin2 is a comprehensive R package for efficiently determining multivariable associations between phenotypes, environments, exposures, covariates, and microbial meta’omic features. MaAsLin2 relies on general linear models to accommodate most modern epidemiological study designs, including cross-sectional and longitudinal designs, and offers a variety of data exploration, normalization, and transformation methods.
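library(Maaslin2)
library(WGCNA)
setwd("/Volumes/GoogleDrive/My Drive/xxx/")
# Import relative abundances and metadata.
# biom convert writes a "# Constructed from biom file" comment as the first line,
# so skip = 1 makes the "#OTU ID" line the header.
input_data = read.delim("maaslin-relabund.tsv", header = TRUE, sep = "\t", row.names = 1, skip = 1, check.names = FALSE)
input_meta = read.delim("maaslin-metadata.tsv", header = TRUE, sep = "\t", row.names = 1)
# The exported table has features in rows and samples in columns.
# goodSamplesGenes expects samples in rows and genes (here, features) in columns, so transpose.
input_data = as.data.frame(t(input_data))
# The goodSamplesGenes function in library WGCNA checks the data for
# missing entries, entries with weights below a threshold, and zero-variance genes.
# It returns a list of samples and genes that pass criteria on the maximum number
# of missing or low-weight values. If necessary, the filtering is iterated.
gsg = goodSamplesGenes(input_data, verbose = 3)
# Check whether all genes and samples are good
gsg$allOK
# Filter out "bad" genes if necessary, then re-check the filtered table
input_filtered = input_data[, gsg$goodGenes, drop = FALSE]
gsg_filtered = goodSamplesGenes(input_filtered, verbose = 3)
# Run MaAsLin2. "Diagnosis" must be a column in input_meta;
# replace it with the variable of interest in your metadata (e.g. Species or Tissue).
genus_fit = Maaslin2(input_filtered, input_meta, "maaslin_ASV_output", fixed_effects = "Diagnosis")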