Qiime2 Pipeline - meyermicrobiolab/Meyer_Lab_Resources GitHub Wiki

Commands for this pipeline is based off this tutorial for power users and this overview workflow. I highly suggest going through those webpages if you want a little more background on the steps provided below. QIIME2 uses a unique type of .zip file called an Artifact for analysis. All artifacts have a .qza files extension. We can visualize these artifacts using different function calls that perform analysis on our artifacts to make a visualization with a .qzv file extension. We can view our .qzv files through the link. There is a bash script at the end that combines all of the steps if you want to do it that way. You can also go step by step on the command line.

We can break down the Qiime2 pipeline into four main steps

Load the Data
Denoising the Reads
Optional: Merge the tables
Assigning Taxonomy
Make the Robust Aitchison PCA
Port into R

Step 1: Load Data

The first step is to load our data into QIIME2 to make a QIIME artifact. This is the object that we will be performing our analysis on. You can find more information about loading data here. Because our .fastq files are in phred33V2 format we will need to make a "manifest" file for our reads. The manifest is a .tsv (tab separated) in the format below for all of the samples in your folder. Keep in mind the path to your file must be the absolute path. You will also have to do this for each folder of files that you have.

sample-id     forward-absolute-filepath     reverse-absolute-filepath
mySample      $PWD/mySample_R1.fastq.gz     $PWD/mySample_R2.fastq.gz

Ive written this script that you can run in your folders of files to make the manifest.

for file in *.gz
do
echo ${file%%_*} >> list.txt
echo $PWD/$file >> list.txt
done

echo -e "sample-id\tforward-absolute-filepath\treverse-absolute-filepath" > manifest.tsv
awk '(NR+1) % 4' list.txt | paste - - - >> manifest.tsv
rm list.txt

Once you have your manifest files we are ready to import our .fastq files into Qiime with this command.

qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path manifest.tsv \
--output-path imported-seqs.qza \
--input-format PairedEndFastqManifestPhred33V2

You can check that the import worked using

qiime tools peek imported-seqs.qza

Step 2: Denoising the Reads

Most of the DADA2 steps seem to be done with qiime dada2 denoise-paired command. From the documentation 'This method denoises paired-end sequences, dereplicates them, and filters chimeras'. Our output will be 3 artifacts. The table.qza is your FeatureTable[Frequency], rep-seq.qza is a FeatureTable[Sequence]. These are individual OTU tables for each of our sample folders. This should be done for each of your folders of files.

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs imported-seqs.qza \
  --p-trunc-len-f 150 \
  --p-trunc-len-r 150 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

We can view our table (or ANY table of FeatureTable[Frequency]) by using

qiime feature-table summarize --i-table table.qza [optional --m-sample-metadata-file meta-file.txt] --o-visualization table.qzv

and then uploading our .qzv file to this page.

Step 3: Merge Tables

This step is optional and if you only performed the analysis on one collection of files you should skip this step. We need to merge our tables together so that we can perform the downstream analysis. In QIIME2 when we merge our Feature-Tables we are not allowed to have duplicate rows so we will have to use the built in --p-overlap-method. We can use either 'average', 'error_on_overlapping_feature', 'error_on_overlapping_sample', or 'sum'.

qiime feature-table merge \
  --i-tables table1.qza table2.qza table3.qza (etc.) \
  --p-overlap-method 'average' \
  --o-merged-table all-tables.qza

For merging the sequences we don't have to worry about overlapping.

qiime feature-table merge-seqs \
  --i-data rep-seqs1.qza rep-seqs.qza rep-seqs3.qza (etc.) \
  --o-merged-table all-seqs.qza

Now we are ready to assign taxonomy.

Step 4: Assigning Taxonomy

There are 2 main ways to assign taxonomy in QIIME2, with a reference database, or with a pre-trained classifier. There is already a SILVAv138 classifier available which is ready to use. This is the only step that I was unable to do through the command line. I had to make a separate bash script.

qiime feature-classifier classify-sklearn \
  --i-classifier silva-138-99-nb-classifier.qza \
  --i-reads all-seqs.qza \
  --o-classification taxonomy.qza

qiime metadata tabulate \
  --m-input-file taxonomy.qza \
  --o-visualization taxonomy.qzv

Step 5: Robust Aitchison PCA

Now we are ready to make our Robust Aitchison PCA by using the DEICODE plugin. This plugin takes a FeatureTable[Frequency], and metadata (optional) and returns a beta diversity ordination file that we can then make a bi-plot or a sample plot with emperor plugin.

qiime deicode rpca \
--i-table all-tables.qza \
--p-min-feature-count 10 \
--p-min-sample-count 500 \
--o-biplot ordination.qza \
--o-distance-matrix distance.qza

qiime emperor biplot \
    --i-biplot ordination.qza \
    --m-sample-metadata-file sample-metadata.tsv \
    --m-feature-metadata-file taxonomy.qza \
    --o-visualization biplot.qzv \
    --p-number-of-features 8

qiime emperor plot \
     --i-pcoa ordination.qza \
     --m-metadata-file metadata_tank.txt \ 
     --p-ignore-pcoa-features  True \
     --o-visualization sample-plot.qzvqiime emperor plot

Step 6: Port Into R

Porting data into R is made pretty simple with the plugin, qiime2R, which takes our .qza files and unzips them for importing into R while maintaining their metadata (the whole point of Qiime2). To install the plugin make sure that you have devtools installed into R. Then we can install using

if (!requireNamespace("devtools", quietly = TRUE)){install.packages("devtools")}
devtools::install_github("jbisanz/qiime2R") # current version is 0.99.20

the plugin reads qiime artifacts with read_qza, unzips them, and stores their metadata and data in a names list with these categories

artifact$data - the raw data ex OTU table as matrix or tree in phylo format
artifact$uuid - the unique identifer of the artifact
artifact$type - the semantic type of the object (ex FeatureData[Sequence])
artifact$format - the format of the qiime artifact
artifact$provenance - information tracking how the object was created
artifact$contents - a table of all the files contained within the artifact and their file size
artifact$version - the reported version for the artifact, a warning error may be thrown if a new version.

You can access this information by accessing the table as you would any list in R. We can do this with any .qza file including our ordination.qza files. This means we can also make our sample-plot without ever making a phyloseq object!

library(qiime2R)
#Reading in the FeatureTable[Frequency]
tables<-read_qza("all-tables.qza")
tables<-tables$data

## Reading in Metadata I do regularly
meta <- read.csv("metadata_tank.txt",sep="\t")

# Read in taxonomy
taxonomy<-read_qza("taxonomy.qza")
taxonomy<-parse_taxonomy(Q_taxonomy$data)

# port directly into a phyloseq object
ps<-qza_to_phyloseq(
  features="all-tables.qza",
  taxonomy = "taxonomy.qza",
  metadata = "metadata_tank.txt"
)

# grab the biplot information, the data here is a list of 4 the Eigvals, Proportions Explained, Species, and Vectors
# just use whatever you need
ordination <- read_qza("ordination.qza")
ordination <- ordination$data
eigvals <- ordination$Eigvals
propExp<-ordination$ProportionExplained
species <- ordination$Species
vectors <- ordination$Vectors

Now you can continue to use your data as a phyloseq object in R.

Bash Script

#!/bin/bash
#SBATCH --job-name=qiime_%j
#SBATCH --output=qiime_%j.log
#SBATCH --error=qimme_%j.err
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH [email protected]
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=50gb
#SBATCH --time=4-00:00:00
#SBATCH --account=juliemeyer
#SBATCH --qos=juliemeyer-b

module load qiime2

# STEP 1 & 2: MAKE MANIFEST & IMPORT DATA & DENOISE READS  ############################################
# This will make the manifest, import, and denoise your data in the same step.



for directory in */
do
	cd $directory

	for file in *.gz
	do
		echo (maybe remove this  ill check later)echo ${file%%_*} >> list.txt
		echo $PWD/$file >> list.txt
		done

	echo -e "sample-id\tforward-absolute-filepath\treverse-absolute-filepath" > manifest.tsv
        awk '(NR+1) % 4' list.txt | paste - - - >> manifest.tsv
        rm list.txt

	qiime tools import \
		--type 'SampleData[PairedEndSequencesWithQuality]' \
		--input-path manifest.tsv \
		--output-path imported-seqs.qza \
		--input-format PairedEndFastqManifestPhred33V2

	qiime dada2 denoise-paired \
		--i-demultiplexed-seqs imported-seqs.qza \
		--p-trunc-len-f 150 \
		--p-trunc-len-r 150 \
		--o-table table.qza \
		--o-representative-sequences rep-seqs.qza \
		--o-denoising-stats denoising-stats.qza
	cd ..
done



# STEP 3 : MERGE THE TABLES ######################################
# change the -i-tables/data to correct filenames and filepaths


qiime feature-table merge \
	--i-tables cutadapt_NS1968R_Tank/table.qza cutadapt_NS2051_Tank/table.qza cutadapt_NS1995_Tank/table.qza (etc.) \
	--p-overlap-method 'average' \
	--o-merged-table all-tables.qza

qiime feature-table merge-seqs \
	--i-data cutadapt_NS1968R_Tank/rep-seqs.qza cutadapt_NS2051_Tank/rep-seqs.qza cutadapt_NS1995_Tank/rep-seqs.qza (etc.) \
	--o-merged-data all-seqs.qza


# STEP 4: ASSIGN THE TAXONOMY ####################################
# make sure to download the classifier from https://docs.qiime2.org/2021.4/data-resources/

qiime feature-classifier classify-sklearn \
	--i-classifier silva-138-99-nb-classifier.qza \
	--i-reads all-seqs.qza \
	--o-classification taxonomy.qza

qiime metadata tabulate \
	--m-input-file taxonomy.qza \
	--o-visualization taxonomy.qzv

# STEP 5: Robust Aitchison PCA ####################################

qiime deicode rpca \
	--i-table all-tables.qza \
	--p-min-feature-count 10 \
	--p-min-sample-count 500 \
	--o-biplot ordination.qza \
	--o-distance-matrix distance.qza

qiime emperor biplot \
     --i-biplot ordination.qza \
     --m-sample-metadata-file metadata.txt \
     --m-feature-metadata-file taxonomy.qza \
     --o-visualization biplot.qzv \
     --p-number-of-features 8

#Sample PCA plot that only plots one value for the community
qiime emperor plot \
     --i-pcoa ordination.qza \
     --m-metadata-file metadata.txt \ 
     --p-ignore-pcoa-features  True \
     --o-visualization sample-plot.qzv