Step 6: Taxonomy assignment
In QIIME2, taxonomy is assigned to each representative sequence using a pre-trained Naive Bayes classifier. A list of available classifiers and reference sequences is at https://resources.qiime2.org
V4 515F/806R region
Pre-trained classifiers for the V4 515F/806R region are available for download on the data resources page. Download the files corresponding to the 515F/806R region, and make sure they are all saved to your working folder (e.g. BocaCiegaBay).
- For CIRCE users, download the older classifier trained on the 515F/806R region of the Silva 132 99% OTUs, listed at https://docs.qiime2.org/2019.10/data-resources/
wget https://data.qiime2.org/2019.10/common/silva-132-99-515-806-nb-classifier.qza
- For other users, download the Greengenes2 2022.10 classifier trained on the 515F/806R region:
wget https://data.qiime2.org/classifiers/sklearn-1.4.2/greengenes2/2022.10.backbone.v4.nb.sklearn-1.4.2.qza
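Before moving on, it can help to confirm that the downloads actually landed in the working folder. The sketch below uses a hypothetical helper, `require_files` (not a QIIME2 command), that reports any file that is missing or empty. It only checks presence and size, not whether the artifact is intact.

```shell
#!/bin/bash
# Hypothetical helper (not part of QIIME2): report files that are missing or empty.
require_files() {
  local missing=0
  local f
  for f in "$@"; do
    # -s is true only if the file exists and has a size greater than zero
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (run inside your working folder, e.g. BocaCiegaBay):
# require_files silva-132-99-515-806-nb-classifier.qza && echo "classifier found"
```

The function returns a non-zero status if any file fails the check, so it can be placed at the top of a job script to stop early.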
Extracting other 16S rRNA gene regions
For other regions or to train your own classifier, download the NR99 full-length sequences (silva-138-99-seqs.qza) and taxonomy (silva-138-99-tax.qza) in the Silva (16S/18S rRNA) section of the data resources page.
wget https://data.qiime2.org/2024.5/common/silva-138-99-seqs.qza
wget https://data.qiime2.org/2024.5/common/silva-138-99-tax.qza
- Now, we're going to extract our target regions using the primer sequences used for library construction and the amplicon length.
- It has been shown that taxonomic classification accuracy improves when a Naive Bayes classifier is trained on only the region of the target sequences that was sequenced (see [Werner et al., 2012](https://www.ncbi.nlm.nih.gov/pubmed/21716311)).
- The pipeline below follows the RESCRIPt tutorial
This command will probably take several hours to run. For CIRCE users, with an older version of QIIME2, the best way is to submit and run a job on the server. First, make sure you are logged in to CIRCE and are in the QIIME2 environment. Then, create the following text file with Notepad on your own computer, and transfer it to the server. Alternatively, you can use the `nano` text editor while logged in to CIRCE. Please change the folder path `/home/j/jeanlim/BocaCiegaBay` below to your own folder path on CIRCE. You can name this file `qiime2.sh`.
#!/bin/bash
#SBATCH --time=48:00:00
#SBATCH --job-name=qiime2_extract
#SBATCH --output=qiime2_extract.log
cd /home/j/jeanlim/BocaCiegaBay
qiime feature-classifier extract-reads \
--i-sequences silva-138-99-seqs.qza \
--p-f-primer GTGCCAGCMGCCGCGGTAA \
--p-r-primer GGACTACHVGGGTWTCTAAT \
--p-n-jobs 1 \
--o-reads silva-138-v4.qza
Once the file is created, submit the job to the server by typing `sbatch qiime2.sh` while logged in to the CIRCE server. You can check the status of your queue using the `squeue` command, for example `squeue -u jeanlim`. Change the username to your own username. An example `squeue` output is shown below:
(qiime2-2019.10) [jeanlim@itn2 ~]$ squeue -u jeanlim
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
22526001 circe qiime2_e jeanlim R 6:04 1 svc-3024-9-32
- The first column shows the JOBID, which is useful for cancelling a job. To cancel a job, type `scancel` followed by the JOBID, for example `scancel 22526001`.
- The fifth column (ST) shows the job status, where PD=pending and R=running.
- When the job is completed, you will see the output file `silva-138-v4.qza` in your folder. If you don't see the output file but the job is no longer running, you can view the log file `qiime2_extract.log` to check for errors.
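If you want to read the job status from a script rather than by eye, the ST value can be pulled out of the `squeue` table with `awk`. This is a minimal sketch: `job_state` is a hypothetical helper, and it assumes the default `squeue` column layout shown in the example output, where ST is the fifth whitespace-separated column.

```shell
# Hypothetical helper: print the ST (state) column for one job from squeue-style text.
# Assumes the default squeue layout from the example output (ST is column 5).
job_state() {
  awk -v id="$1" 'NR > 1 && $1 == id { print $5 }'
}

# Demo using the sample output from this page (no SLURM needed); prints R
printf '%s\n' \
  'JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)' \
  '22526001 circe qiime2_e jeanlim R 6:04 1 svc-3024-9-32' |
  job_state 22526001
```

On CIRCE the same function would be fed live output, for example `squeue -u jeanlim | job_state 22526001`. It prints nothing once the job has left the queue.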
For other users, with a newer version of QIIME2
- We’ll set `--p-read-orientation 'forward'`, as the SILVA database is curated to be in the same “forward” orientation. This allows us to process the data more quickly, without having to account for mixed-orientation sequences during the primer search.
qiime feature-classifier extract-reads \
--i-sequences silva-138-99-seqs.qza \
--p-f-primer GTGCCAGCMGCCGCGGTAA \
--p-r-primer GGACTACHVGGGTWTCTAAT \
--p-n-jobs 8 \
--p-read-orientation 'forward' \
--o-reads silva-138-v4.qza
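The primers passed to `--p-f-primer` and `--p-r-primer` contain IUPAC degenerate base codes (M, H, V, W), each of which stands for more than one nucleotide. As a rough illustration only, the sketch below expands those codes into a regular expression so you can see which concrete sequences a primer covers. `iupac_to_regex` is a hypothetical helper, and this is not how `extract-reads` performs its own primer search.

```shell
# Hypothetical helper: expand IUPAC degenerate codes in a primer into regex classes.
# The replacements only ever insert A, C, G, T and brackets, so later
# substitutions cannot re-expand earlier ones.
iupac_to_regex() {
  printf '%s\n' "$1" | sed \
    -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
    -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
    -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

iupac_to_regex GTGCCAGCMGCCGCGGTAA    # prints GTGCCAGC[AC]GCCGCGGTAA
iupac_to_regex GGACTACHVGGGTWTCTAAT   # prints GGACTAC[ACT][ACG]GGGT[AT]TCTAAT
```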
Moving forward, you can edit or create copies of `qiime2.sh` to run any of the subsequent QIIME2 commands below. For each run, replace the old QIIME command with the new one. For example, you can replace the `qiime feature-classifier extract-reads` command with the `qiime rescript dereplicate` command below.
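Instead of editing the job script by hand for every step, the script can be generated. The sketch below is one way to do that, assuming the same `#SBATCH` settings as the `qiime2.sh` example on this page. `make_job_script` is a hypothetical helper that prints a job script to standard output; the job name `qiime2_derep` is only an example.

```shell
# Hypothetical helper: print a SLURM job script that wraps one command.
# Usage: make_job_script WORKDIR JOBNAME COMMAND [ARGS...]
# Limitation: arguments are joined with spaces, so values containing spaces are not preserved.
make_job_script() {
  local workdir="$1" jobname="$2"
  shift 2
  printf '%s\n' \
    '#!/bin/bash' \
    '#SBATCH --time=48:00:00' \
    "#SBATCH --job-name=${jobname}" \
    "#SBATCH --output=${jobname}.log" \
    "cd \"${workdir}\"" \
    "$*"
}

# Demo: print a job script for the dereplication step.
# To use it, redirect the output to a file (> derep.sh) and run: sbatch derep.sh
make_job_script /home/j/jeanlim/BocaCiegaBay qiime2_derep \
  qiime rescript dereplicate \
  --i-sequences silva-138-v4.qza --i-taxa silva-138-99-tax.qza --p-mode uniq \
  --o-dereplicated-sequences silva-138-v4-uniq.qza \
  --o-dereplicated-taxa silva-138-tax-v4-uniq.qza
```

Giving each step its own job name also gives each step its own log file, which makes it easier to find errors later.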
Dereplicate extracted sequences
qiime rescript dereplicate \
--i-sequences silva-138-v4.qza --i-taxa silva-138-99-tax.qza --p-mode 'uniq' \
--o-dereplicated-sequences silva-138-v4-uniq.qza \
--o-dereplicated-taxa silva-138-tax-v4-uniq.qza
Training the classifier
Now, let's train the classifier based on the region of interest. This is the basic QIIME2 command:
qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads silva-138-v4-uniq.qza --i-reference-taxonomy silva-138-tax-v4-uniq.qza \
--o-classifier q2-v4classifier.qza
The `rescript evaluate-fit-classifier` command can be used as a substitute for QIIME2's `fit-classifier-naive-bayes` command, with the added functionality of evaluating the accuracy of the classifier. However, this takes a LONG time to run (>35 hours and still running with 128 GB RAM and 12 processors).
qiime rescript evaluate-fit-classifier \
--i-sequences silva-138-v4-uniq.qza --i-taxonomy silva-138-tax-v4-uniq.qza \
--o-classifier rescript-v4-classifier.qza \
--o-observed-taxonomy rescript-v4-predicted-taxonomy.qza \
--o-evaluation rescript-v4-classifier-evaluation.qzv
Assign taxonomy to representative sequences
Using a pre-trained or self-trained classifier, assign taxonomy to your representative sequences. This should take 15-30 mins to run. Be patient.
For CIRCE users,
qiime feature-classifier classify-sklearn \
--i-classifier silva-132-99-515-806-nb-classifier.qza \
--i-reads pe.repseqs.qza \
--o-classification taxonomy.qza
For other users,
qiime feature-classifier classify-sklearn \
--i-classifier 2022.10.backbone.v4.nb.sklearn-1.4.2.qza \
--i-reads pe.repseqs.qza \
--o-classification taxonomy.qza
Another alternative is to perform taxonomic classification using BLAST:
wget https://data.qiime2.org/2024.5/common/silva-138-99-seqs.qza
wget https://data.qiime2.org/2024.5/common/silva-138-99-tax.qza
qiime feature-classifier classify-consensus-blast --i-query pe.repseqs.qza \
--i-reference-reads silva-138-99-seqs.qza --i-reference-taxonomy silva-138-99-tax.qza \
--o-classification taxonomy.qza
Get list of taxonomic names and confidence for each feature
qiime metadata tabulate \
--m-input-file taxonomy.qza \
--o-visualization taxonomy.qzv
View the taxonomic composition of each sample with interactive bar plots.
qiime taxa barplot \
--i-table pe.dada2.qza \
--i-taxonomy taxonomy.qza \
--m-metadata-file metadata.txt \
--o-visualization taxa-bar-plots.qzv
Filtering the count table by taxonomy
The command below keeps ASVs with phylum-level annotations and removes ASVs annotated as “archaea”, “chloroplast”, “eukaryota”, and “mitochondria”.
If you used `silva-132-99-515-806-nb-classifier.qza` to classify your sequences:
qiime taxa filter-table \
--i-table pe.dada2.qza \
--i-taxonomy taxonomy.qza \
--p-include D_1__ \
--p-exclude mitochondria,chloroplast,eukaryota,archaea \
--o-filtered-table bacteriatable.qza
If you used other classifiers to classify your sequences:
qiime taxa filter-table \
--i-table pe.dada2.qza \
--i-taxonomy taxonomy.qza \
--p-include p__ \
--p-exclude mitochondria,chloroplast,eukaryota,archaea \
--o-filtered-table bacteriatable.qza
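To get a feel for what the include and exclude terms do, the same logic can be previewed on a plain taxonomy table. This is only an approximation, assuming that a feature is kept when its taxon string contains the include term and none of the exclude terms, ignoring case. `preview_taxa_filter` is a hypothetical helper, and the four sample rows are made up for illustration.

```shell
# Hypothetical preview of the include/exclude logic (not a QIIME2 command).
# Reads a TSV on stdin with a header row and the taxon string in column 2.
# Usage: preview_taxa_filter INCLUDE_TERM EXCLUDE_TERMS_COMMA_SEPARATED
preview_taxa_filter() {
  awk -F '\t' -v inc="$1" -v exc="$2" '
    BEGIN { n = split(tolower(exc), terms, ",") }
    NR == 1 { next }                                # skip the header row
    {
      taxon = tolower($2)
      if (index(taxon, tolower(inc)) == 0) next     # must contain the include term
      for (i = 1; i <= n; i++)
        if (index(taxon, terms[i]) > 0) next        # drop if any exclude term matches
      print $1                                      # feature ID that would be kept
    }'
}

# Demo with made-up rows; only asv1 passes both conditions
printf 'Feature ID\tTaxon\tConfidence\n'\
'asv1\td__Bacteria; p__Proteobacteria\t0.99\n'\
'asv2\td__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Chloroplast\t0.98\n'\
'asv3\td__Bacteria\t0.80\n'\
'asv4\td__Archaea; p__Crenarchaeota\t0.95\n' |
  preview_taxa_filter p__ mitochondria,chloroplast,eukaryota,archaea
```

In the demo, asv2 is dropped as a chloroplast, asv3 has no phylum-level annotation, and asv4 is archaeal.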
Check the taxonomy bar plots again
qiime taxa barplot \
--i-table bacteriatable.qza \
--i-taxonomy taxonomy.qza \
--m-metadata-file metadata.txt \
--o-visualization bacteria-taxa-bar-plots.qzv
Export taxonomy file
qiime tools export --input-path taxonomy.qza --output-path taxonomy_export
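The export writes a tab-separated file, `taxonomy_export/taxonomy.tsv`, with the feature ID in the first column and the taxon string in the second. A quick way to summarize it is to count features per label at a chosen rank. The sketch below assumes that layout and semicolon-separated ranks; `count_by_rank` is a hypothetical helper, and the demo rows are made up.

```shell
# Hypothetical helper: count features per label at a given rank depth
# (1 = domain, 2 = phylum, ...). Reads the exported TSV on stdin.
count_by_rank() {
  awk -F '\t' -v depth="$1" '
    NR == 1 { next }                                  # skip the header row
    {
      n = split($2, ranks, /; */)                     # handles ";" and "; " separators
      label = (depth <= n) ? ranks[depth] : "unassigned_at_rank"
      counts[label]++
    }
    END { for (label in counts) print counts[label] "\t" label }
  ' | sort -k1,1nr -k2,2
}

# Demo with made-up rows: phylum-level counts, most frequent first
printf 'Feature ID\tTaxon\tConfidence\n'\
'asv1\td__Bacteria; p__Proteobacteria\t0.99\n'\
'asv2\td__Bacteria; p__Proteobacteria; c__Gammaproteobacteria\t0.97\n'\
'asv3\td__Bacteria; p__Firmicutes\t0.91\n'\
'asv4\td__Bacteria\t0.80\n' |
  count_by_rank 2
```

With the real export, the call would be `count_by_rank 2 < taxonomy_export/taxonomy.tsv`.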
Summarize ASV abundance of bacteria table
qiime feature-table summarize --i-table bacteriatable.qza --o-visualization bacteriatable.qzv
Collapse features by their taxonomy
This example command collapses ASVs to the genus level
qiime taxa collapse --i-table bacteriatable.qza --i-taxonomy taxonomy.qza --p-level 6 --o-collapsed-table genustable.qza
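Conceptually, collapsing at level 6 truncates each taxon string to its first six ranks and sums the counts of all features that end up with the same truncated label. The sketch below illustrates that idea on a tiny made-up table. `collapse_counts` is a hypothetical helper, and it simplifies what QIIME2 does (for example, it does not pad taxa that have fewer ranks than the requested level).

```shell
# Hypothetical illustration of collapsing: input lines are "taxon<TAB>count".
# Each taxon is truncated to the first LEVEL ranks, then counts are summed per label.
collapse_counts() {
  awk -F '\t' -v level="$1" '
    {
      n = split($1, ranks, /; */)
      key = ranks[1]
      for (i = 2; i <= level && i <= n; i++) key = key ";" ranks[i]
      sums[key] += $2
    }
    END { for (key in sums) print key "\t" sums[key] }
  ' | sort
}

# Demo with made-up taxa: two species of genus g__E merge into one row with count 15
printf '%s\t%s\n' \
  'd__Bacteria; p__A; c__B; o__C; f__D; g__E; s__E_one' 10 \
  'd__Bacteria; p__A; c__B; o__C; f__D; g__E; s__E_two' 5 \
  'd__Bacteria; p__A; c__B; o__C; f__D; g__F; s__F_one' 7 |
  collapse_counts 6
```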
Summarize feature abundance of the genus table
qiime feature-table summarize --i-table genustable.qza --o-visualization genustable.qzv