Microbiome Helper 2 Amplicon basic statistics and visualisation - LangilleLab/microbiome_helper GitHub Wiki

Authors: Robyn Wright. This part was initially based on the Amplicon SOP V2, but has been used in previous workshops designed by Diana Haider, Robert Beiko, Monica Alvaro Fuss, Juan Santana and Robyn Wright.

Please note: We think that everything here should work, but we are still testing/developing this so use with caution :)

Introduction

Here we will explore examples of downstream analyses used to draw biological insights from these data. The analyses described here use plugins that are part of QIIME 2 (Quantitative Insights into Microbial Ecology), version 2025.4, the latest release at the time of writing. As introduced in Module 2, this widely used microbiome bioinformatics platform is built around user-developed software packages called plugins, which operate on QIIME 2 artifact files (with the .qza extension).

Most of the time, people use either R or Python for further analyses (especially if you want to make some prettier plots :)), but some useful functions are built into QIIME 2. These are especially useful if it is your first time running a microbiome analysis: they are relatively quick and easy to run, and if you go on to do your own custom analyses (or follow one of our other workflows), they give you something to compare against so that you can check your results aren't looking really wonky.

Documentation for these plugins, including tutorials and additional resources, can be found in the QIIME 2 user documentation. QIIME 2 also produces interpretable visualizations (.qzv files), which can be viewed by opening them in QIIME 2 View.

Requirements

It is assumed that you have already run through the Microbiome Helper 2 Amplicon workflow and have all output files from that analysis.

1. Generate rarefaction curves

A key quality-control step is to plot rarefaction curves for all of your samples to determine whether they were sequenced deeply enough: if a sample's curve levels off, further sequencing would be unlikely to detect many more features. The command below generates these plots. Replace X with the maximum sequencing depth to rarefy to, based on the per-sample read counts in your filtered feature table.

qiime diversity alpha-rarefaction \
  --i-table denoised_output/table_final.qza \
  --p-max-depth X \
  --p-steps 20 \
  --i-phylogeny asvs-tree.qza \
  --o-visualization rarefaction_curves.qzv
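To pick X, you need the per-sample read counts of your filtered feature table. The easiest way to see these is qiime feature-table summarize, which produces an interactive .qzv. If you prefer the command line, one option is to export the table, convert it to TSV and sum each sample's column. The helper below is a minimal sketch of that last step: the function name sample_depths is our own (hypothetical, not part of QIIME 2), and it assumes the TSV layout written by biom convert --to-tsv.

```shell
# Hypothetical helper (not part of QIIME 2): per-sample read depths from a
# TSV feature table. Assumed layout, as written by `biom convert --to-tsv`:
#   an optional "# Constructed from biom file" comment line,
#   a "#OTU ID" header row (feature ID column, then one column per sample),
#   then one row per feature with counts (integers or floats such as 10.0).
# Output: "sample<TAB>depth" lines, sorted from shallowest to deepest.
sample_depths() {
  awk -F'\t' '
    /^#OTU ID/ { for (i = 2; i <= NF; i++) name[i] = $i; n = NF; next }
    /^#/       { next }                        # skip any other comment lines
    { for (i = 2; i <= NF; i++) depth[i] += $i }
    END { for (i = 2; i <= n; i++) printf "%s\t%d\n", name[i], depth[i] }
  ' "$1" | sort -k2,2n
}

# Usage (directory and file names are examples):
#   qiime tools export --input-path denoised_output/table_final.qza --output-path exported_table
#   biom convert -i exported_table/feature-table.biom -o exported_table/feature-table.tsv --to-tsv
#   sample_depths exported_table/feature-table.tsv
```

The deepest sample (last line) is an upper bound for the rarefaction curves; the shallowest samples show how low a sampling depth you would need to keep everything.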

2. Calculating diversity metrics and generating ordination plots

Common alpha- and beta-diversity metrics can be calculated with a single command in QIIME 2. Ordination plots (such as PCoA plots for weighted UniFrac distances) are generated automatically as well. This command also rarefies all samples to the same sequencing depth before calculating these metrics. X is a placeholder for the lowest reasonable sample depth: samples with fewer reads than this cut-off will be excluded.

qiime diversity core-metrics-phylogenetic \
  --i-table denoised_output/table_final.qza \
  --i-phylogeny asvs-tree.qza \
  --p-sampling-depth X  \
  --m-metadata-file METADATA_FILE.txt \
  --p-n-jobs-or-threads 4 \
  --output-dir diversity
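Choosing X is a trade-off: a higher depth keeps more reads per sample, but every sample below it is dropped. The sketch below (hypothetical function samples_retained, not a QIIME 2 command) reads per-sample depths as sample<TAB>depth lines on standard input and reports which samples a candidate X would exclude, mirroring the rule that samples with fewer reads than X are removed.

```shell
# Hypothetical helper (not part of QIIME 2): given "sample<TAB>depth" lines on
# stdin and a candidate sampling depth X as $1, list the samples that
# rarefying to X would drop and summarise what would be kept.
samples_retained() {
  awk -F'\t' -v x="$1" '
    $2 + 0 >= x + 0 { kept++; next }              # depth >= X: sample is kept
    { dropped++; print "dropped\t" $1 "\t" $2 }   # depth < X: sample is excluded
    END {
      printf "kept %d of %d samples; %d reads per kept sample (%d in total)\n",
             kept, kept + dropped, x, kept * x
    }
  '
}

# Usage (toy depths): compare a few candidate values of X
#   printf 'S1\t30\nS2\t3\nS3\t10\n' | samples_retained 10
```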

If you are using the tutorial data that we have provided, you will need to run the following commands to make a new metadata file that has the column names QIIME 2 expects, in the right place. Use the resulting arctic_study_metadata.txt as METADATA_FILE.txt in the commands in this section:

cut -f2- arctic_study_metadata_pacbio.txt > arctic_study_metadata.txt
sed -i 's/sample_rename/sampleid/g' arctic_study_metadata.txt

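The first command removes the first column of the file, and the second renames the header of the new first column from sample_rename to sampleid. The g flag makes sed replace every occurrence of sample_rename in the file, which is fine as long as it only appears in the header. On macOS, the built-in sed needs an empty argument after -i (sed -i '' ...).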
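Before using the new file, it can help to confirm that the header came out as expected. qiime metadata tabulate is the thorough way to validate a metadata file; the sketch below is only a quick shell check, and the function name check_id_header is hypothetical. QIIME 2 also accepts other ID column names (for example id or sample-id), so this check deliberately covers only the sampleid name produced above.

```shell
# Hypothetical helper: confirm that the first column header of a metadata
# file is "sampleid" (the name produced by the sed command above).
check_id_header() {
  first=$(head -n 1 "$1" | cut -f1 | tr -d '\r')   # tolerate Windows line endings
  if [ "$first" = "sampleid" ]; then
    echo "ok: first column is sampleid"
  else
    echo "unexpected first column header: '$first'" >&2
    return 1
  fi
}

# Usage:
#   check_id_header arctic_study_metadata.txt
```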

For alpha diversity, you can produce boxplots comparing the groups defined by the categorical columns in your metadata file; this visualization also reports Kruskal-Wallis tests between groups. For example, to compare Shannon diversity between groups, you can use this command:

qiime diversity alpha-group-significance \
  --i-alpha-diversity diversity/shannon_vector.qza \
  --m-metadata-file METADATA_FILE.txt \
  --o-visualization diversity/shannon_compare_groups.qzv

You will need to change the input and output file names in this command for the other alpha-diversity metrics. You can list the available metrics by running ls diversity/*_vector.qza.
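Rather than editing the command by hand for each metric, you could loop over the vector files. This is a sketch rather than part of the workflow: the function name alpha_group_all and the DRY_RUN variable are our own. By default it only prints the qiime commands it would run, so you can check them first; set DRY_RUN=0 to actually run them.

```shell
# Hypothetical loop: run alpha-group-significance for every *_vector.qza
# written by core-metrics-phylogenetic into diversity/.
# $1 = metadata file. DRY_RUN=1 (default) prints commands; DRY_RUN=0 runs them.
alpha_group_all() {
  metadata=$1
  for vec in diversity/*_vector.qza; do
    [ -e "$vec" ] || continue                 # glob matched nothing
    metric=$(basename "$vec" _vector.qza)     # e.g. shannon, faith_pd
    set -- qiime diversity alpha-group-significance \
      --i-alpha-diversity "$vec" \
      --m-metadata-file "$metadata" \
      --o-visualization "diversity/${metric}_compare_groups.qzv"
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi
  done
}

# Usage:
#   alpha_group_all METADATA_FILE.txt             # preview the commands
#   DRY_RUN=0 alpha_group_all METADATA_FILE.txt   # run them
```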

Note that you can also export this or any other diversity metric file (ending in .qza) with qiime tools export and analyze it with a different program.
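For example, qiime tools export --input-path diversity/shannon_vector.qza --output-path shannon_export writes the Shannon values to a TSV file. The sketch below averages such an exported table per metadata group. It assumes the exported file has one header row followed by sample<TAB>value lines, and that the first metadata column holds the sample IDs; the function name group_means is hypothetical. It gives descriptive means only, not a statistical test.

```shell
# Hypothetical helper: mean alpha-diversity value per metadata group.
#   $1 = exported alpha-diversity TSV (header row, then "sample<TAB>value")
#   $2 = metadata TSV (header row; first column = sample ID)
#   $3 = name of the metadata column that defines the groups
# Output: "group<TAB>n_samples<TAB>mean", sorted by group name.
group_means() {
  awk -F'\t' -v col="$3" '
    FNR == 1 && NR == FNR { next }            # alpha file: skip header row
    NR == FNR { alpha[$1] = $2; next }        # alpha file: sample ID -> value
    FNR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }  # metadata header
    /^#q2:types/ { next }                     # skip optional QIIME 2 types row
    c && ($1 in alpha) { sum[$c] += alpha[$1]; n[$c]++ }
    END { for (g in sum) printf "%s\t%d\t%.3f\n", g, n[g], sum[g] / n[g] }
  ' "$1" "$2" | sort
}

# Usage (paths are examples):
#   group_means shannon_export/alpha-diversity.tsv METADATA_FILE.txt METADATA_CATEGORY
```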

If you want to run a PERMANOVA test for differences in beta diversity between groups of interest, you can run the following command. Change the distance matrix to match the beta-diversity metric you are using, and replace METADATA_CATEGORY with your metadata column of interest.

qiime diversity beta-group-significance \
  --i-distance-matrix diversity/bray_curtis_distance_matrix.qza \
  --m-metadata-file METADATA_FILE.txt \
  --m-metadata-column METADATA_CATEGORY \
  --o-visualization diversity/bray_curtis_compare_groups.qzv
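To make the PERMANOVA statistic less of a black box, here is a sketch of the pseudo-F statistic computed from a square distance matrix (our own illustration, not QIIME 2's implementation). With N samples in a groups, where group g has n_g samples and d_ij is the distance between samples i and j: SS_T = (1/N) * sum over all pairs i<j of d_ij^2, SS_W = sum over groups of (1/n_g) * sum over within-group pairs of d_ij^2, and F = [(SS_T - SS_W) / (a - 1)] / [SS_W / (N - a)]. The sketch does not compute a p-value; PERMANOVA gets that by recomputing F many times with the group labels shuffled, which beta-group-significance does for you. Assumptions: the distance matrix is a TSV whose first row is an empty cell followed by sample IDs (the layout we expect from exporting a distance matrix with qiime tools export), the groups file is a hypothetical headerless sample<TAB>group file, and the function name pseudo_f is our own.

```shell
# Sketch (not QIIME 2's implementation): PERMANOVA pseudo-F statistic.
#   $1 groups file: "sample<TAB>group", no header (hypothetical format)
#   $2 distance matrix TSV: first row = empty cell + sample IDs,
#      then one row per sample (ID followed by its distances)
pseudo_f() {
  awk -F'\t' '
    NR == FNR { grp[$1] = $2; next }                          # sample -> group
    FNR == 1  { for (j = 2; j <= NF; j++) id[j] = $j; next }  # column sample IDs
    { rows[++nr] = $1; for (j = 2; j <= NF; j++) d[$1, id[j]] = $j }
    END {
      for (i = 1; i <= nr; i++)                 # keep samples that have a group
        if (rows[i] in grp) { keep[++N] = rows[i]; ng[grp[rows[i]]]++ }
      for (g in ng) a++                         # number of groups
      for (i = 1; i <= N; i++)
        for (j = i + 1; j <= N; j++) {
          x = d[keep[i], keep[j]]; x = x * x    # squared distance
          sst += x                              # divided by N below
          if (grp[keep[i]] == grp[keep[j]]) ssw_g[grp[keep[i]]] += x
        }
      for (g in ssw_g) ssw += ssw_g[g] / ng[g]  # within-group sum of squares
      if (a < 2 || N <= a || ssw == 0) {
        print "pseudo_f: need >= 2 groups and non-zero within-group distances" | "cat 1>&2"
        exit 1
      }
      sst /= N                                  # total sum of squares
      printf "%.4f\n", ((sst - ssw) / (a - 1)) / (ssw / (N - a))
    }
  ' "$1" "$2"
}
```

Larger values of F mean that between-group distances are large relative to within-group distances.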

3. Generate stacked bar chart of taxa relative abundances

Another useful output is an interactive stacked bar chart of taxonomic relative abundances across samples, which can be generated with this command:

qiime taxa barplot \
  --i-table denoised_output/table_final.qza \
  --i-taxonomy taxa/classification.qza \
  --m-metadata-file METADATA_FILE.txt \
  --o-visualization taxa/taxa_barplot.qzv

4. Identifying differentially abundant features with ANCOM

ANCOM is one method to test for differences in the relative abundance of features between sample groups. It is a compositional approach that makes no assumptions about feature distributions. However, because it works with log-ratios of feature abundances, which are undefined when a count is zero, all features need non-zero abundances, so a pseudocount must first be added (1 is a typical choice):

qiime composition add-pseudocount \
  --i-table denoised_output/table_final.qza \
  --p-pseudocount 1 \
  --o-composition-table denoised_output/table_final_pseudocount.qza
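To see why the pseudocount is needed, consider the log-ratio of two feature counts, the kind of quantity ANCOM's tests are built on. The sketch below (hypothetical function log_ratio, natural log) shows that the ratio is undefined when a count is zero and becomes finite once a pseudocount is added. Keep in mind that the pseudocount value is an arbitrary choice, and it affects the log-ratios of low-count features most.

```shell
# Sketch: natural-log ratio log((a + pc) / (b + pc)) of two feature counts.
#   $1 = count of feature A, $2 = count of feature B, $3 = pseudocount (default 0)
log_ratio() {
  awk -v a="$1" -v b="$2" -v pc="${3:-0}" 'BEGIN {
    if (a + pc <= 0 || b + pc <= 0) { print "undefined (zero count)"; exit }
    printf "%.4f\n", log((a + pc) / (b + pc))
  }'
}

# Usage:
#   log_ratio 0 9     # zero count: the log-ratio is undefined
#   log_ratio 0 9 1   # with a pseudocount of 1: log(1/10)
```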

Then ANCOM can be run with this command; note that METADATA_CATEGORY is a placeholder for the name of your column of interest in the metadata file.

qiime composition ancom \
  --i-table denoised_output/table_final_pseudocount.qza \
  --m-metadata-file METADATA_FILE.txt \
  --m-metadata-column METADATA_CATEGORY \
  --output-dir ancom_output

5. Phylogenetic Robust Aitchison PCA

New approaches to calculating beta diversity are still being developed. For example, Phylogenetic Robust Aitchison PCA (RPCA), which aims to account for both the compositional and phylogenetic nature of microbiome data, can be run using the gemelli toolbox. If you want to explore this option, you can find instructions for installing gemelli here (you can install it directly into your QIIME 2 environment), and follow the tutorial for running it here. Installing and trying out new tools and methods is an important part of bioinformatics!
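RPCA handles zeros differently from the pseudocount approach used for ANCOM. gemelli builds on the robust centered log-ratio (rclr) transform, which takes logs of only the non-zero values, centers them on their mean, and treats zeros as missing rather than adding a pseudocount. The sketch below applies that idea to a single sample's counts. It is an illustration under our own simplifications (one sample, no phylogeny, no matrix completion), not gemelli's code, and the function name rclr_row is hypothetical.

```shell
# Sketch of the robust centered log-ratio (rclr) idea for one sample:
# log of each non-zero count minus the mean log of the non-zero counts;
# zeros are reported as missing (NA) instead of receiving a pseudocount.
#   $@ = non-negative counts for one sample, in feature order
rclr_row() {
  printf '%s\n' "$@" | awk '
    { v[NR] = $1 + 0; if (v[NR] > 0) { s += log(v[NR]); k++ } }
    END {
      if (k == 0) { print "rclr_row: no non-zero counts"; exit 1 }
      m = s / k                                   # mean log of non-zero counts
      for (i = 1; i <= NR; i++)
        printf "%s%s", (v[i] > 0 ? sprintf("%.4f", log(v[i]) - m) : "NA"), (i < NR ? "\t" : "\n")
    }'
}

# Usage:
#   rclr_row 1 0 100   # tab-separated values, with NA for the zero count
```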