Creating QIIME 2 Taxonomic Classifiers - LangilleLab/microbiome_helper GitHub Wiki

We use the commands below when creating new QIIME 2 taxonomic classifiers. These commands are based on this QIIME 2 tutorial and are listed here for convenience.

This file represents the current commands used to create custom classifiers. This was only done for the ITS classifiers, because the default QIIME 2 classifier works with both 16S and 18S data. To see the previous commands used to generate primer-specific classifiers please see here.

First, the appropriate reference files need to be downloaded. These correspond to the UNITE (ver10_99_04.04.2024) ITS database files (with and without all eukaryotes).

Downloading files

All of the files need to first be downloaded from the UNITE website. I have chosen to use the RefS sequences:

This then gives us two files: sh_qiime_release_04.04.2024.tgz and sh_qiime_release_all_04.04.2024.tgz.

Note that these require you to fill in some information about your research purposes.

These archives then need to be extracted, each into its own folder:

mkdir fungi
mv sh_qiime_release_04.04.2024.tgz fungi/
cd fungi
tar -xvf sh_qiime_release_04.04.2024.tgz

cd ..

mkdir all
mv sh_qiime_release_all_04.04.2024.tgz all/
cd all
tar -xvf sh_qiime_release_all_04.04.2024.tgz

cd ..
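
Before importing, it can be worth confirming that each extracted FASTA file and its taxonomy table describe the same number of references. The functions below are a minimal sketch of such a check, not part of the original workflow. The function names are hypothetical, and the check assumes one taxonomy row per reference sequence and no header line in the taxonomy file.

```shell
# Count FASTA records, i.e. lines that start with ">".
count_fasta_records() {
    grep -c '^>' "$1"
}

# Count non-blank lines in a headerless taxonomy table.
count_taxonomy_rows() {
    grep -c -v '^[[:space:]]*$' "$1"
}

# Print both counts and succeed only if they are equal.
# Arguments: <fasta file> <taxonomy table>
check_reference_pair() {
    n_seqs=$(count_fasta_records "$1")
    n_rows=$(count_taxonomy_rows "$2")
    echo "sequences: $n_seqs, taxonomy rows: $n_rows"
    [ "$n_seqs" -eq "$n_rows" ]
}
```

Run from the directory that contains fungi/ and all/, a call could look like `check_reference_pair fungi/developer/sh_refs_qiime_ver10_99_04.04.2024_dev.fasta fungi/developer/sh_taxonomy_qiime_ver10_99_04.04.2024_dev.txt`, and likewise for the files under all/developer/.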

Importing files

All of the database files (FASTAs and taxonomy tables) need to be imported as QIIME 2 artifacts.

mkdir imported_files

qiime tools import --type 'FeatureData[Sequence]' \
    --input-path fungi/developer/sh_refs_qiime_ver10_99_04.04.2024_dev.fasta \
    --output-path imported_files/fungi_ITS_sh_refs_qiime_ver10_99_04.04.2024_dev.qza
    
qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat \
    --input-path fungi/developer/sh_taxonomy_qiime_ver10_99_04.04.2024_dev.txt \
    --output-path imported_files/fungi_ITS_sh_taxonomy_sh_taxonomy_qiime_ver10_99_04.04.2024_dev.qza


qiime tools import --type 'FeatureData[Sequence]' \
    --input-path all/developer/sh_refs_qiime_ver10_99_all_04.04.2024_dev.fasta \
    --output-path imported_files/alleuk_ITS_sh_refs_qiime_ver10_99_all_04.04.2024_dev.qza
    
qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat \
    --input-path all/developer/sh_taxonomy_qiime_ver10_99_all_04.04.2024_dev.txt \
    --output-path imported_files/alleuk_ITS_sh_taxonomy_qiime_ver10_99_all_04.04.2024_dev.qza
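
A quick way to catch a failed import is to confirm that every expected artifact exists and is non-empty, and then to let QIIME 2 check the artifacts itself. The sketch below is an optional addition with hypothetical function names. The second function relies on `qiime tools validate`, so it assumes a QIIME 2 environment is active when it is called.

```shell
# Report every listed file that is missing or empty.
# Succeeds only if all files are present and non-empty.
missing_artifacts() {
    n_missing=0
    for artifact in "$@"; do
        if [ ! -s "$artifact" ]; then
            echo "missing or empty: $artifact"
            n_missing=$((n_missing + 1))
        fi
    done
    [ "$n_missing" -eq 0 ]
}

# Ask QIIME 2 to validate each artifact; stop at the first failure.
# Only call this inside an activated QIIME 2 environment.
validate_artifacts() {
    for artifact in "$@"; do
        qiime tools validate "$artifact" || return 1
    done
}
```

For the four files created above, `missing_artifacts imported_files/*.qza && validate_artifacts imported_files/*.qza` would run both checks in turn.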

Train Naive Bayes Classifiers

Now that the data is imported, we can generate the classifiers themselves with the commands below. The commands as listed do not end in &; append one to each command if you want it to run in the background. Note that the ITS classifiers are based on the entire ITS region and that two different classifiers are created from the UNITE database: one for all eukaryotes (classifier_alleuk_ITS_sh_refs_qiime_ver10_99_all_04.04.2024_dev.qza) and one for fungi only (classifier_fungi_ITS_sh_taxonomy_sh_taxonomy_qiime_ver10_99_04.04.2024_dev.qza).

mkdir taxa_classifiers

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads imported_files/fungi_ITS_sh_refs_qiime_ver10_99_04.04.2024_dev.qza \
  --i-reference-taxonomy imported_files/fungi_ITS_sh_taxonomy_sh_taxonomy_qiime_ver10_99_04.04.2024_dev.qza \
  --o-classifier taxa_classifiers/classifier_fungi_ITS_sh_taxonomy_sh_taxonomy_qiime_ver10_99_04.04.2024_dev.qza

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads imported_files/alleuk_ITS_sh_refs_qiime_ver10_99_all_04.04.2024_dev.qza \
  --i-reference-taxonomy imported_files/alleuk_ITS_sh_taxonomy_qiime_ver10_99_all_04.04.2024_dev.qza \
  --o-classifier taxa_classifiers/classifier_alleuk_ITS_sh_refs_qiime_ver10_99_all_04.04.2024_dev.qza
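
If you run the training commands in the background, it helps to keep a log and to be able to wait for the job to finish. The helper below is a minimal sketch of one way to do that. `run_in_background` and `last_bg_pid` are hypothetical names, not part of QIIME 2, and the log file names in the example are arbitrary.

```shell
# Start a command in the background with its stdout and stderr sent to a log file.
# The PID of the background job is stored in the variable last_bg_pid.
# Arguments: <log file> <command> [arguments...]
run_in_background() {
    log_file=$1
    shift
    nohup "$@" > "$log_file" 2>&1 &
    last_bg_pid=$!
    echo "started PID $last_bg_pid, logging to $log_file"
}

# Example (not run here), using the fungi-only training command from above:
#   run_in_background fit_fungi.log qiime feature-classifier fit-classifier-naive-bayes \
#     --i-reference-reads imported_files/fungi_ITS_sh_refs_qiime_ver10_99_04.04.2024_dev.qza \
#     --i-reference-taxonomy imported_files/fungi_ITS_sh_taxonomy_sh_taxonomy_qiime_ver10_99_04.04.2024_dev.qza \
#     --o-classifier taxa_classifiers/classifier_fungi_ITS_sh_taxonomy_sh_taxonomy_qiime_ver10_99_04.04.2024_dev.qza
#   wait "$last_bg_pid" && echo "training finished"
```

Call the function directly rather than inside `$(...)`, since `wait` can only wait for jobs started by the current shell.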

The taxonomic classifiers are now prepared. It's important that you now run sanity checks on these classifiers to ensure they were created correctly. This is best done by comparing the taxonomic assignments on test input sequences based on these classifiers to the assignments based on an independent approach. I've written a quick pipeline for running these sanity checks specifically for these amplicon regions, which you can see here.
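
As an illustration of the comparison described above (and not the pipeline linked there), the sketch below counts how many test sequences receive identical taxon strings from two approaches. It assumes each approach has produced a tab-separated table with the feature ID in the first column and the taxon in the second. A header row starting with "Feature ID", as found in taxonomy tables exported from QIIME 2, is skipped. The function name is hypothetical.

```shell
# Compare two tab-separated assignment tables (feature ID, taxon, ...).
# Prints how many IDs present in both tables have identical taxon strings.
# Arguments: <table from classifier> <table from independent approach>
compare_assignments() {
    awk -F'\t' '
        $1 == "Feature ID" { next }           # skip a QIIME 2 style header row
        NR == FNR { first[$1] = $2; next }    # first table: remember taxon per ID
        $1 in first {                         # second table: only IDs seen in both
            shared++
            if (first[$1] == $2) same++
        }
        END { printf "%d of %d shared IDs agree\n", same, shared }
    ' "$1" "$2"
}
```

Exact string matching is deliberately strict: differences in rank prefixes or classification depth between the two approaches count as disagreements, so a low agreement count is a prompt to inspect the tables rather than proof that the classifier is wrong.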