IonTorrent Amplicon SOP v1 (qiime2) - LangilleLab/microbiome_helper GitHub Wiki

This standard operating procedure (SOP) is based on QIIME2 and is meant for users who want to quickly run their IonTorrent amplicon data through the Microbiome Helper virtual box image, as well as for internal use. Note that this workflow simply adapts our current Illumina amplicon workflows by altering the first steps to be compatible with IonTorrent data. To complete processing, you will therefore need to familiarize yourself with the Illumina workflow of your choice (listed in the right menu bar).

Also note that, because IonTorrent is much less widely used than Illumina for amplicon data, this SOP has not been extensively tested and could still contain minor problems (please contact us if you find any).

If you use this workflow make sure to keep track of the commands you use locally as this page will be updated over time (see "revisions" above for earlier versions).

Requirements

This workflow assumes that you have QIIME2 installed in a conda environment (the version matching the Illumina workflow you will continue with).

This workflow also assumes that the input is raw IonTorrent data in demultiplexed FASTQ format located within a folder called raw_data. The filenames can be almost anything you wish (unlike most QIIME2 import formats) since you are going to use a "manifest file" to list each file.

1. First steps

Steps 1.1 to 1.4

As per the Illumina amplicon SOP.

1.5 Import FASTQs as QIIME 2 artifact

Note that the client datasets we have seen had reads already trimmed of adapters, barcodes, and primers by the IonTorrent software/machine. If this is the case for your data, proceed with the import below. If not, you will have to use the strategy at the beginning of our PacBio SOP (Steps 1.5-1.7) to resolve the mixed orientation problem, which is also common to IonTorrent, and then trim the reads outside QIIME2 before importing.

The trimmed reads are imported into the QIIME 2 "artifact" file format (with the extension QZA). The slight difference here compared to standard Illumina file importing is that you need to use a "manifest" file - consult the QIIME2 documentation about preparing it, but essentially it is just a tab-delimited text file containing the sample names + absolute path to each file.

mkdir reads_qza

qiime tools import \
    --type SampleData[SequencesWithQuality] \
    --input-path IonTorrentManifest.tsv \
    --output-path reads_qza/reads_trimmed.qza \
    --input-format SingleEndFastqManifestPhred33V2
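The manifest passed to --input-path above has to exist before the import is run. The following is a minimal sketch of one way to generate it, not part of the original workflow. It assumes that the files in raw_data end in .fastq.gz and that the sample name is the filename with that suffix removed; the function name make_manifest is hypothetical. The header line (sample-id, absolute-filepath) is the one used by the V2 manifest formats.

```shell
# Sketch: build a QIIME 2 single-end manifest (V2 layout) from a folder of FASTQs.
# Assumptions (not from the SOP): files end in .fastq.gz and the sample ID is
# the filename with that suffix removed; adjust both to your naming scheme.
make_manifest() {
    local in_dir="$1" out_tsv="$2"
    local abs_dir fq

    # The manifest needs absolute paths, so resolve the folder first.
    abs_dir="$(cd "$in_dir" && pwd)" || return 1

    # Header line expected by the V2 manifest formats.
    printf 'sample-id\tabsolute-filepath\n' > "$out_tsv"

    for fq in "$abs_dir"/*.fastq.gz; do
        [ -e "$fq" ] || continue   # skip the literal pattern if nothing matched
        printf '%s\t%s\n' "$(basename "$fq" .fastq.gz)" "$fq" >> "$out_tsv"
    done
}

# Only run when the folder used in this SOP is present.
if [ -d raw_data ]; then
    make_manifest raw_data IonTorrentManifest.tsv
fi
```

Check the resulting file by eye before importing, since sample names derived from filenames may need to be shortened or cleaned up.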

1.6 Trim primers with cutadapt

As mentioned above, skip this step if your data was received already trimmed. Otherwise, complete it as per the Illumina amplicon SOP.

1.7 Summarize trimmed FASTQs

As per the Illumina amplicon SOP.

2. Denoising the reads into amplicon sequence variants

At this stage, the two main pipelines you can use are based on either deblur or DADA2. Below we describe the commands for running DADA2, which we have found to perform better with IonTorrent data in the small amount of testing we have done (and according to forum posts describing others' experiences).

2.1 Filter out low-quality reads

This command will filter out low-quality reads based on the default options.

qiime quality-filter q-score \
   --i-demux reads_qza/reads_trimmed.qza \
   --o-filter-stats filt_stats.qza \
   --o-filtered-sequences reads_qza/reads_trimmed_filt.qza

Note that you may encounter a more significant amount of read loss here for IonTorrent data due to its much poorer quality overall compared to Illumina data. You can visualize the state of your filtered reads using the command below:

qiime demux summarize \
   --i-data reads_qza/reads_trimmed_filt.qza \
   --o-visualization reads_qza/reads_trimmed_filt_summary.qzv

2.2 Running DADA2

Run the DADA2 workflow to correct reads and get amplicon sequence variants (ASVs). Note that we are using the --p-trim-left 15 parameter as recommended for IonTorrent data due to the lower quality of the beginning bases of those reads. You will probably want to increase the number of threads used below to the maximum your system has available.

qiime dada2 denoise-single \
   --i-demultiplexed-seqs reads_qza/reads_trimmed_filt.qza \
   --p-trunc-len 0 \
   --p-trim-left 15 \
   --p-max-ee 3 \
   --p-n-threads 4 \
   --output-dir dada2_output
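The thread count in the command above is hard-coded to 4. As a minimal sketch, assuming a Linux or macOS shell, the number of available cores can be detected and stored in a variable. The name NCORES is chosen here only because the taxonomy command later in this SOP refers to $NCORES.

```shell
# Sketch: detect the number of available CPU cores and store it in NCORES.
if command -v nproc >/dev/null 2>&1; then
    NCORES="$(nproc)"
else
    # Fallback for systems without nproc (e.g. macOS); use 1 if that fails too.
    NCORES="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
fi
export NCORES
echo "Using $NCORES threads"
```

You could then pass --p-n-threads $NCORES instead of 4. The DADA2 plugin also documents a value of 0 as meaning "use all available cores", which avoids the need for the variable in this step.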

2.3 Summarizing DADA2 output

Once a denoising pipeline has been run you can summarize the output table with the below command, which will create a visualization artifact for you to view.

qiime feature-table summarize \
   --i-table dada2_output/table.qza \
   --o-visualization dada2_output/dada2_table_summary.qzv

You should also take a look at the read count table to see how many reads were retained at each step of the DADA2 pipeline:

qiime tools export --input-path dada2_output/denoising_stats.qza --output-path dada2_output
mv dada2_output/stats.tsv dada2_output/dada2_stats.tsv
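To make the read loss easier to scan, the exported table can be reduced to one retention percentage per sample. This is a minimal sketch that assumes the table has a header row containing columns named "input" and "non-chimeric" and that any other metadata rows start with "#"; the function name dada2_retention is hypothetical.

```shell
# Sketch: print the percentage of input reads that survive DADA2 per sample.
# Assumption: the exported table has a header row with columns named
# "input" and "non-chimeric", and comment rows starting with "#".
dada2_retention() {
    awk -F'\t' '
        NR == 1 {
            # Locate the two columns by name rather than by position.
            for (i = 1; i <= NF; i++) {
                if ($i == "input") in_col = i
                if ($i == "non-chimeric") out_col = i
            }
            if (!in_col || !out_col) {
                print "expected columns not found" > "/dev/stderr"
                exit 1
            }
            next
        }
        /^#/ { next }   # skip the "#q2:types" row
        $in_col > 0 { printf "%s\t%.1f\n", $1, 100 * $out_col / $in_col }
    ' "$1"
}

# Only run when the file produced by the commands above is present.
if [ -f dada2_output/dada2_stats.tsv ]; then
    dada2_retention dada2_output/dada2_stats.tsv
fi
```

Samples with a very low percentage are worth checking against the quality plots from Step 2.1 before continuing.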

3. Assign taxonomy to ASVs

As IonTorrent sequencing does not produce reads in a consistent forward or reverse direction, unlike Illumina, the resulting ASV files contain sequences in both the 5'-3' and 3'-5' directions. These "mixed orientation" files break most pipelines because many tools cannot handle both orientations at the same time (i.e. some tools can search the other orientation, but assume that all reads are reversed if any are). The easiest way to resolve this problem for taxonomic assignment, without resorting to complex scripting, is to use the VSEARCH classifier instead of the scikit-learn method used in our Illumina SOP. Consult the documentation for more details on the parameters and on preparing the required reference files (which are much more straightforward than the scikit-learn files). The reference files listed below (16S only) use our internal files and paths, so you will have to change them to match your system. The $NCORES variable must also be set to the number of threads you want to use:

qiime feature-classifier classify-consensus-vsearch \
   --i-query dada2_output/representative_sequences.qza \
   --i-reference-reads /home/shared/rRNA_db/ForVSEARCH/silva_132_99_16S.qza \
   --i-reference-taxonomy /home/shared/rRNA_db/ForVSEARCH/silva_132_99_16S_majority_taxonomy_7_levels.qza \
   --p-threads $NCORES \
   --output-dir taxa

As with all QZA files, you can export the output file to take a look at the classifications and their consensus scores:

qiime tools export --input-path taxa/classification.qza --output-path taxa
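A quick sanity check on the exported table is to count how many ASVs received no classification. This is a minimal sketch, assuming the export writes a tab-delimited taxa/taxonomy.tsv with one header row, the taxon string in the second column, and unclassified features labelled exactly "Unassigned"; the function name count_unassigned is hypothetical.

```shell
# Sketch: count unassigned ASVs in an exported taxonomy table.
# Assumptions: tab-delimited file, one header row, taxon string in column 2,
# and unclassified features labelled exactly "Unassigned".
count_unassigned() {
    awk -F'\t' '
        NR == 1 { next }              # skip the header row
        $2 == "Unassigned" { n++ }
        { total++ }
        END { printf "%d of %d ASVs unassigned\n", n, total }
    ' "$1"
}

# Only run when the exported file is present.
if [ -f taxa/taxonomy.tsv ]; then
    count_unassigned taxa/taxonomy.tsv
fi
```

A large unassigned fraction may indicate that the reference files do not match your amplicon region, or that untrimmed primers or adapters remain in the reads.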

Steps 4 and Onward

Continue on with the standard Illumina amplicon SOP, keeping in mind the following notes:

Remember to modify the expected filenames and folders from "deblur..." to "dada2..." throughout.
There is no "bleed-through" phenomenon in IonTorrent sequencing (addressed in Step 4.1), so you may have to adjust the level at which you filter out rare sequences. On the other hand, IonTorrent quality is poorer, so the two effects may balance each other out and you could still proceed with the same 0.1% filter level.
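If you decide to try a different filter level, the arithmetic can be sketched as below. This assumes that the percentage is applied to the mean sample depth, which you would read off the feature-table summary from Step 2; check the Illumina SOP for how the cut-off is actually defined there. The function name min_frequency and the depth of 25000 are hypothetical.

```shell
# Sketch: convert a percentage cut-off into a minimum ASV frequency.
# Assumption: the cut-off is relative to the mean sample depth; the default
# of 0.1 is the filter level mentioned in the notes above.
min_frequency() {
    local mean_depth="$1" percent="${2:-0.1}"
    # Round to the nearest whole read count.
    awk -v d="$mean_depth" -v p="$percent" \
        'BEGIN { printf "%d\n", d * p / 100 + 0.5 }'
}

min_frequency 25000   # hypothetical mean depth of 25,000 reads
```

The resulting number is the kind of value you would pass to a minimum-frequency filter such as the --p-min-frequency option of qiime feature-table filter-features.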