The dada2 ITS Pipeline for Paired-End Reads - Yanmei-Zhang/DADA2_SchillingLab GitHub Wiki
Welcome to the dada2_SchillingLab wiki! This page gives a step-by-step illustration of the dada2 ITS2 pipeline run in RStudio through the interactive HPC resource. It is helpful if you are new to this pipeline and want a graphical view of each step. It assumes that you have set up the work environment as shown in the README.md. You can log into an interactive HPC resource such as a NICE session and open the terminal there. You can also run the pipeline on your own computer once you have all the software and packages set up, but it will run slowly on a large dataset.
The example shown here was sequenced on a MiSeq 2x300 run at the University of Minnesota Genomics Center (UMGC) using their dual-index protocol. The full-length variable ITS2 fragments can be assembled by merging the forward and reverse reads.
Part 1: Load the software from the HPC server
# Load the primer trimming tool that will remove the primers
module load cutadapt
# Check the cutadapt path
which cutadapt
## /panfs/roc/msisoft/cutadapt/1.18/bin/cutadapt
# Load RStudio and start it
module load rstudio/2022.02.3 # use the latest version of RStudio; this version loads R/4.1.0
rstudio
If you are running it on your own computer, you need to install cutadapt.
- cutadapt can be installed from here.
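For a local setup, a minimal sketch of one common install route (assuming Python 3 and pip are available; see the cutadapt installation docs for conda and other options):

```shell
# Install cutadapt into your user site with pip (one of several supported routes)
python3 -m pip install --user cutadapt
# Confirm it is on your PATH and check the version
cutadapt --version
```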
Part 2: dada2 ITS2 workflow processed by R/4.1.0
Now you are in the RStudio graphical interface. Navigate to the working folder by clicking on the Files tab (in this example: "~/dada2_tutorial"). Then click dada2_pipeline_its2.R to open the R script.
0. Get ready
# Load dada2 and other required packages
library(dada2)
packageVersion("dada2")
## [1] ‘1.20.0’
library(ShortRead)
packageVersion("ShortRead")
## [1] ‘1.50.0’
library(Biostrings)
packageVersion("Biostrings")
## [1] ‘2.60.2’
library(ggplot2)
The example data is located in the tutorial folder you downloaded in the setup step. You can also download it here.
# Define data path
data.path <- "~/dada2_tutorial/example/ITS2" # Change it to YOUR data folder
list.files(data.path)
## [1] "Q23_ITS_S185_R1_001.fastq.gz" "Q23_ITS_S185_R2_001.fastq.gz"
## [3] "Q24_ITS_S197_R1_001.fastq.gz" "Q24_ITS_S197_R2_001.fastq.gz"
## [5] "Q25_ITS_S209_R1_001.fastq.gz" "Q25_ITS_S209_R2_001.fastq.gz"
## [7] "Q26_ITS_S221_R1_001.fastq.gz" "Q26_ITS_S221_R2_001.fastq.gz"
## [9] "Q27_ITS_S233_R1_001.fastq.gz" "Q27_ITS_S233_R2_001.fastq.gz"
## [11] "Q28_ITS_S245_R1_001.fastq.gz" "Q28_ITS_S245_R2_001.fastq.gz"
# Generate matched lists of the forward and reverse read files (the sample names are parsed out later)
fnFs <- sort(list.files(data.path, pattern = "_R1_001.fastq.gz", full.names = TRUE))
fnRs <- sort(list.files(data.path, pattern = "_R2_001.fastq.gz", full.names = TRUE))
# Set up a work path in YOUR directory where you want the processed data; you can reuse the data path, but it is nice to keep the raw data folder untouched and store processed data in a separate folder.
work.path <- "~/dada2_tutorial/dada2_ITS2" # Change it to YOUR work path
# Set up names of sub directories to stay organized
preprocess.fp <- file.path(work.path, "01_preprocess")
filtN.fp <- file.path(preprocess.fp, "filtN")
cut.fp <- file.path(preprocess.fp, "cutadapt")
filter.fp <- file.path(work.path, "02_filter")
table.fp <- file.path(work.path, "03_tabletax")
1. Pre-processing data for dada2 - remove sequences with Ns, remove primers with cutadapt
1.1 Remove Sequences with Ns
The presence of ambiguous bases (Ns) in the sequencing reads makes accurate mapping of short primer sequences difficult. To solve this problem, the sequences with ambiguous bases (Ns) will be removed.
# “Pre-filter” the sequences just to remove those with Ns, but perform no other filtering.
fnFs.filtN <- file.path(preprocess.fp, "filtN", basename(fnFs)) # Put N-filtered files in filtN/ subdirectory
fnRs.filtN <- file.path(preprocess.fp, "filtN", basename(fnRs))
trimN <- filterAndTrim(fnFs, fnFs.filtN, fnRs, fnRs.filtN, maxN = 0, multithread = TRUE)
head(trimN)
## reads.in reads.out
## Q23_ITS_S185_R1_001.fastq.gz 45666 45666
## Q24_ITS_S197_R1_001.fastq.gz 41782 41781
## Q25_ITS_S209_R1_001.fastq.gz 56977 56974
## Q26_ITS_S221_R1_001.fastq.gz 51472 51467
## Q27_ITS_S233_R1_001.fastq.gz 69681 69679
## Q28_ITS_S245_R1_001.fastq.gz 58241 58238
1.2 Remove Primers
The primer sequences are removed by running cutadapt. First, assign the primers you used to "FWD" and "REV" below; change them if you used different primers.
# Identify primers
FWD <- "TCGATGAAGAACGCAGCG" #5.8SR # Change it to YOUR sequencing primer
REV <- "TCCTCCGCTTATTGATATGC" #ITS4 # Change it to YOUR sequencing primer
# A function to create a list of all orientations of primers
allOrients <- function(primer) {
# Create all orientations of the input sequence
require(Biostrings)
dna <- DNAString(primer) # Biostrings works with DNAString objects rather than character vectors
orients <- c(Forward = dna, Complement = complement(dna), Reverse = reverse(dna),
RevComp = reverseComplement(dna))
return(sapply(orients, toString)) # Convert back to character vector
}
# Save the primers orientations to pass along to cutadapt
FWD.orients <- allOrients(FWD)
REV.orients <- allOrients(REV)
FWD.orients
## Forward Complement Reverse
## "TCGATGAAGAACGCAGCG" "AGCTACTTCTTGCGTCGC" "GCGACGCAAGAAGTAGCT"
## RevComp
## "CGCTGCGTTCTTCATCGA"
REV.orients
## Forward Complement Reverse
## "TCCTCCGCTTATTGATATGC" "AGGAGGCGAATAACTATACG" "CGTATAGTTATTCGCCTCCT"
## RevComp
## "GCATATCAATAAGCGGAGGA"
# A function to count the number of times the primers appear in the forward and reverse read, while considering all possible primer orientations.
primerHits <- function(primer, fn) {
# Counts number of reads in which the primer is found
nhits <- vcountPattern(primer, sread(readFastq(fn)), fixed = FALSE)
return(sum(nhits > 0))
}
# Check the primers in the first sample before running cutadapt.
rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs.filtN[[1]]),
    FWD.ReverseReads = sapply(FWD.orients, primerHits, fn = fnRs.filtN[[1]]),
    REV.ForwardReads = sapply(REV.orients, primerHits, fn = fnFs.filtN[[1]]),
    REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs.filtN[[1]]))
## Forward Complement Reverse RevComp
## FWD.ForwardReads 40740 0 0 0
## FWD.ReverseReads 2 0 0 25
## REV.ForwardReads 2 0 0 40
## REV.ReverseReads 43881 0 0 0
As expected, the FWD primer is found in the forward reads in its forward orientation, and in some of the reverse reads in its reverse-complement orientation (due to read-through when the ITS region is short). Similarly, the REV primer is found in its expected orientations.
Next, we remove the primers by running cutadapt.
# Remove Primers
cutadapt <- "/panfs/roc/msisoft/cutadapt/1.18/bin/cutadapt" # CHANGE ME to the cutadapt path on your machine
system2(cutadapt, args = "--version") # Run shell commands from R
## 2.10
if(!dir.exists(cut.fp)) dir.create(cut.fp)
fnFs.cut <- file.path(cut.fp, basename(fnFs))
fnRs.cut <- file.path(cut.fp, basename(fnRs))
FWD.RC <- dada2:::rc(FWD)
REV.RC <- dada2:::rc(REV)
# Trim FWD and the reverse-complement of REV off of R1 (forward reads)
R1.flags <- paste("-g", FWD, "-a", REV.RC)
# Trim REV and the reverse-complement of FWD off of R2 (reverse reads)
R2.flags <- paste("-G", REV, "-A", FWD.RC)
# Run Cutadapt
for(i in seq_along(fnFs)) {
  system2(cutadapt, args = c(R1.flags, R2.flags, "-n", 2, # -n 2 required to remove FWD and REV from reads
                             "-o", fnFs.cut[i], "-p", fnRs.cut[i], # output files
                             "-m", 1, # specify the minimum length as some downstream tools such as plotQualityProfile
                                      # may have problems with zero-length sequences
                             fnFs.filtN[i], fnRs.filtN[i])) # input files
}
# Count the presence of primers in the first cutadapt-ed sample
rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs.cut[[1]]),
    FWD.ReverseReads = sapply(FWD.orients, primerHits, fn = fnRs.cut[[1]]),
    REV.ForwardReads = sapply(REV.orients, primerHits, fn = fnFs.cut[[1]]),
    REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs.cut[[1]]))
## Forward Complement Reverse RevComp
## FWD.ForwardReads 0 0 0 0
## FWD.ReverseReads 2 0 0 0
## REV.ForwardReads 2 0 0 0
## REV.ReverseReads 0 0 0 0
Success! Primers are no longer detected in the cutadapt-ed reads.
# Forward and reverse fastq filenames have the format SAMPLENAME_*_R1_001.fastq.gz and SAMPLENAME_*_R2_001.fastq.gz
cutFs <- sort(list.files(cut.fp, pattern = "_R1_001.fastq.gz", full.names = TRUE))
cutRs <- sort(list.files(cut.fp, pattern = "_R2_001.fastq.gz", full.names = TRUE))
# Extract sample names, assuming filenames begin with the sample name followed by "_"
get.sample.name <- function(fname) strsplit(basename(fname), "_")[[1]][1]
sample.names <- unname(sapply(cutFs, get.sample.name))
head(sample.names)
## [1] "Q23" "Q24" "Q25" "Q26" "Q27" "Q28"
sample.namesR <- unname(sapply(cutRs, get.sample.name))
if(!identical(sample.names, sample.namesR)) stop("Forward and reverse files do not match.")
2. Run dada2 pipeline
2.1 Inspect Read Quality Profiles
From the dada2 tutorial:
In gray-scale is a heat map of the frequency of each quality score at each base position. The median quality score at each position is shown by the green line, and the quartiles of the quality score distribution by the orange lines. The red line shows the scaled proportion of reads that extend to at least that position.
# Inspect read quality profiles
# If there are 10 or fewer samples, plot them all; otherwise plot 10 randomly selected samples
if(length(cutFs) <= 10) {
  fwd_qual_plots <- plotQualityProfile(cutFs)
  rev_qual_plots <- plotQualityProfile(cutRs)
} else {
  rand_samples <- sample(seq_along(cutFs), size = 10) # grab 10 random samples to plot
  fwd_qual_plots <- plotQualityProfile(cutFs[rand_samples])
  rev_qual_plots <- plotQualityProfile(cutRs[rand_samples])
}
if(!dir.exists(filter.fp)) dir.create(filter.fp)
ggsave(paste0(filter.fp, "/its2_fwd_qual_10.png"), fwd_qual_plots)
ggsave(paste0(filter.fp, "/its2_rev_qual_10.png"), rev_qual_plots)
fwd_qual_plots

rev_qual_plots

# Put filtered reads into separate sub-directories for big data workflow
subF.fp <- file.path(filter.fp, "filt_F")
subR.fp <- file.path(filter.fp, "filt_R")
dir.create(subF.fp)
dir.create(subR.fp)
2.2 Filter and Trim
Before choosing sequence variants, we want to trim reads where their quality scores begin to drop (the truncLen and truncQ values) and remove any low-quality reads left over after trimming (the maxEE value).
From dada2 tutorial:
If there is only one part of any amplicon bioinformatics workflow on which you spend time considering the parameters, it should be filtering! The parameters ... are not set in stone, and should be changed if they don’t work for your data. If too few reads are passing the filter, increase maxEE (perhaps especially on the reverse reads) and/or reduce truncQ. If quality drops sharply at the end of your reads, reduce truncLen. If your reads are high quality and you want to reduce computation time in the sample inference step, reduce maxEE.
Considerations for your own data:
THESE PARAMETERS ARE NOT OPTIMAL FOR ALL DATASETS. Make sure you determine the trimming and filtering parameters appropriate for your data.
Due to the expected length variation in ITS data, you should run this command without the truncLen parameter. See here for more information and appropriate parameters for ITS data.
# Filter and Trim
filtFs <- file.path(subF.fp, basename(cutFs))
filtRs <- file.path(subR.fp, basename(cutRs))
out <- filterAndTrim(cutFs, filtFs, cutRs, filtRs, maxN = 0, maxEE = c(2, 5),
    truncQ = 2, minLen = 50, rm.phix = TRUE, compress = TRUE, multithread = TRUE) # on Windows, set multithread = FALSE
head(out)
## reads.in reads.out
## Q23_ITS_S185_R1_001.fastq.gz 45666 31514
## Q24_ITS_S197_R1_001.fastq.gz 41781 28024
## Q25_ITS_S209_R1_001.fastq.gz 56974 40868
## Q26_ITS_S221_R1_001.fastq.gz 51467 34088
## Q27_ITS_S233_R1_001.fastq.gz 69679 47830
## Q28_ITS_S245_R1_001.fastq.gz 58238 39014
2.3 Infer Sequence Variants and Merge
In this part of the pipeline, dada2 will learn to distinguish sequencing errors from biological differences using a subset of our data as a training set. Once it has learned the error rates, we reduce the size of the dataset by collapsing all identical reads into "unique sequences" (dereplication). Then, using the dereplicated data and error rates, dada2 infers the sequence variants (ASVs) in our data. Finally, we merge the corresponding forward and reverse reads to create a list of fully denoised sequences and build a sequence table from the result.
Learn the Error Rates
# Set seed to ensure that randomized steps can be replicated
set.seed(100)
# Learn the Error Rates
errF <- learnErrors(filtFs, multithread = TRUE)
## 62630680 total bases in 221338 reads from 6 samples will be used for learning the error rates.
errR <- learnErrors(filtRs, multithread = TRUE)
## 62135669 total bases in 221338 reads from 6 samples will be used for learning the error rates.
errF_plot <- plotErrors(errF, nominalQ = TRUE)
errR_plot <- plotErrors(errR, nominalQ=TRUE)
ggsave(paste0(filter.fp, "/its2_errF_plot.png"), errF_plot)
ggsave(paste0(filter.fp, "/its2_errR_plot.png"), errR_plot)
We want to make sure that the machine learning algorithm is learning the error rates properly. In the plots below, the red line represents what we should expect the learned error rates to look like for each of the 16 possible base transitions (A->A, A->C, A->G, etc.) and the black line and grey dots represent what the observed error rates are. If the black line and the red lines are very far off from each other, it may be a good idea to increase the nbases parameter. This allows the machine learning algorithm to train on a larger portion of your data and may help improve the fit.
errF_plot
errR_plot
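If the observed (black) and expected (red) error rates diverge badly, retraining on more of the data can improve the fit. A sketch (nbases = 1e8 is the learnErrors default; this is not run in the tutorial):

```r
# Let learnErrors train on up to 1e9 bases instead of the default 1e8;
# slower, but the error model sees a larger share of your reads
errF <- learnErrors(filtFs, multithread = TRUE, nbases = 1e9)
errR <- learnErrors(filtRs, multithread = TRUE, nbases = 1e9)
```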

Dereplication, Sequence Inference, and Merging of paired-end Reads
In this part of the pipeline, dada2 will make decisions about assigning sequences to ASVs (called "sequence inference"). There is a major parameter option in the core function dada() that changes how samples are handled during sequence inference. The parameter pool can be set to pool = FALSE (default), pool = TRUE, or pool = "pseudo". For details on parameter choice, please see below, and further information and explanation in the dada2 tutorial.
Details
pool = FALSE: Sequence information is not shared between samples. Fast processing time, less sensitivity to rare taxa.
pool = "pseudo": Sequence information is shared in a separate "prior" step. Intermediate processing time, intermediate sensitivity to rare taxa.
pool = TRUE: Sequence information from all samples is pooled together. Slow processing time, most sensitivity to rare taxa.
Default: SAMPLES NOT POOLED
For simple communities or when you do not need high sensitivity for rare taxa
# Name the filtered files by sample so they can be indexed by name in the loop below
names(filtFs) <- sample.names
names(filtRs) <- sample.names
# Make lists to hold the loop output
mergers <- vector("list", length(sample.names))
names(mergers) <- sample.names
ddF <- vector("list", length(sample.names))
names(ddF) <- sample.names
ddR <- vector("list", length(sample.names))
names(ddR) <- sample.names
# For each sample, get a list of merged and denoised sequences
for(sam in sample.names) {
  cat("Processing:", sam, "\n")
  # Dereplicate forward reads
  derepF <- derepFastq(filtFs[[sam]])
  # Infer sequences for forward reads
  ddF[[sam]] <- dada(derepF, err = errF, multithread = TRUE)
  # Dereplicate reverse reads
  derepR <- derepFastq(filtRs[[sam]])
  # Infer sequences for reverse reads
  ddR[[sam]] <- dada(derepR, err = errR, multithread = TRUE)
  # Merge reads together
  mergers[[sam]] <- mergePairs(ddF[[sam]], derepF, ddR[[sam]], derepR)
}
rm(derepF); rm(derepR)
Alternative: SAMPLES POOLED
For complex communities, or when you want to preserve rare taxa. As an intermediate alternative, swap pool = TRUE for pool = "pseudo".
# Dereplicate identical reads
derepFs <- derepFastq(filtFs, verbose = TRUE)
derepRs <- derepFastq(filtRs, verbose = TRUE)
# Name the derep-class objects by the sample names
names(derepFs) <- sample.names
names(derepRs) <- sample.names
# Sample Inference
dadaFs <- dada(derepFs, err = errF, multithread = TRUE, pool = TRUE)
## 6 samples were pooled: 221338 reads in 41077 unique sequences.
dadaRs <- dada(derepRs, err = errR, multithread = TRUE, pool = TRUE)
## 6 samples were pooled: 221338 reads in 95169 unique sequences.
# Inspecting the returned dada-class object:
dadaFs[[1]]
## dada-class: object describing DADA2 denoising results
## 467 sequence variants were inferred from 6824
## input unique sequences.
## Key parameters: OMEGA_A = 1e-40, OMEGA_C = 1e-40, BAND_SIZE = 16
# Merge paired reads
mergers <- mergePairs(dadaFs, derepFs, dadaRs, derepRs, verbose=TRUE)
## 28058 paired-reads (in 334 unique pairings) successfully merged out of 31000 (in 529 pairings) input.
## 23470 paired-reads (in 416 unique pairings) successfully merged out of 26951 (in 616 pairings) input.
## 32289 paired-reads (in 335 unique pairings) successfully merged out of 40464 (in 574 pairings) input.
## 30334 paired-reads (in 433 unique pairings) successfully merged out of 33188 (in 664 pairings) input.
## 39917 paired-reads (in 325 unique pairings) successfully merged out of 47252 (in 534 pairings) input.
## 34275 paired-reads (in 380 unique pairings) successfully merged out of 38488 (in 614 pairings) input.
# Inspect the merger data.frame from the first sample
head(mergers[[1]])
# Construct Sequence Table
seqtab <- makeSequenceTable(mergers)
dim(seqtab)
## [1] 6 968
# Inspect distribution of sequence lengths
table(nchar(getSequences(seqtab)))
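ITS2 length varies legitimately across taxa, so unlike 16S we do not enforce a single expected length. Still, if the table above shows a tail of implausibly short or long merged sequences, they can optionally be dropped. A sketch (the 100:450 window is an illustration only; choose bounds from your own length distribution):

```r
# Optionally keep only merged sequences inside a plausible ITS2 length window;
# pick the bounds from your own table(nchar(...)) output, not from this example
seqtab.trim <- seqtab[, nchar(getSequences(seqtab)) %in% 100:450, drop = FALSE]
dim(seqtab.trim)
```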
2.4 Remove Chimeras and Summary of Reads
Although dada2 has corrected substitution and indel errors, there may still be chimeric sequences in our dataset (artifact sequences formed when two different parent sequences become fused together during PCR). To identify chimeras, we will search for rare sequence variants that can be reconstructed by combining left-hand and right-hand segments from two more-abundant "parent" sequences. After removing chimeras, we will use a taxonomy reference database and a classifier algorithm to assign names to our sequence variants.
# Remove Chimeras
seqtab.nochim <- removeBimeraDenovo(seqtab, method="consensus", multithread=TRUE, verbose=TRUE)
## Identified 257 bimeras out of 968 input sequences.
dim(seqtab.nochim)
## [1] 6 711
table(nchar(getSequences(seqtab.nochim)))
# Print the fraction of reads that were not chimeric.
sum(seqtab.nochim)/sum(seqtab)
## [1] 0.9159884
# Track reads through the pipeline
getN <- function(x) sum(getUniques(x))
track <- cbind(out, sapply(dadaFs, getN), sapply(dadaRs, getN), sapply(mergers, getN), rowSums(seqtab.nochim))
# If you ran the per-sample (non-pooled) loop above, use ddF and ddR in place of dadaFs and dadaRs
# If processing a single sample, remove the sapply calls: e.g. replace sapply(dadaFs, getN) with getN(dadaFs)
colnames(track) <- c("input", "filtered", "denoisedF", "denoisedR", "merged", "nonchim")
rownames(track) <- sample.names
head(track)
## input filtered denoisedF denoisedR merged nonchim
## Q23 45666 31514 31183 31172 28058 23580
## Q24 41781 28024 27484 27179 23470 22340
## Q25 56974 40868 40608 40614 32289 29435
## Q26 51467 34088 33598 33390 30334 28682
## Q27 69679 47830 47488 47452 39917 36936
## Q28 58238 39014 38690 38623 34275 31547
Looks good! We kept the majority of our raw reads, and there is no overly large drop associated with any single step.
From the dada2 tutorial: This is a great place to do a last sanity check. Outside of filtering (depending on how stringent you want to be) there should be no step in which a majority of reads are lost. If a majority of reads were removed as chimeric, you may need to revisit the removal of primers, as the ambiguous nucleotides in unremoved primers interfere with chimera identification. If a majority of reads failed to merge, the culprit could also be unremoved primers, but could also be biological length variation in the sequenced ITS region that sometimes extends beyond the total read length, resulting in no overlap.
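As a quick numeric check, the track table can be turned into per-step retention percentages. The sketch below is self-contained, reusing two rows of counts from the output above; on your own run, apply the same sweep() to your track matrix directly:

```r
# Two rows of read counts copied from the track table above
track.demo <- rbind(Q23 = c(45666, 31514, 31183, 31172, 28058, 23580),
                    Q24 = c(41781, 28024, 27484, 27179, 23470, 22340))
colnames(track.demo) <- c("input", "filtered", "denoisedF", "denoisedR", "merged", "nonchim")
# Percent of raw reads surviving each step; the final column is the overall yield
pct <- round(100 * sweep(track.demo, 1, track.demo[, "input"], "/"), 1)
pct[, "nonchim"]
```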
# Save file
if(!dir.exists(table.fp)) dir.create(table.fp)
write.table(t(seqtab.nochim), paste0(table.fp, "/dada2_its2_counts.txt"), sep="\t", quote=F, row.names=T)
write.table(track, paste0(table.fp, "/dada2_its2_track.txt"), sep="\t", quote=F, row.names = T)
2.5 Assign Taxonomy
Here is a link to download the UNITE database we need.
# Assign taxonomy
unite.ref <- "~/dada2_tutorial/db_files/sh_general_release_dynamic_16.10.2022.fasta" # Change it to location on YOUR device
taxa <- assignTaxonomy(seqtab.nochim, unite.ref, multithread = TRUE, tryRC = TRUE)
## UNITE fungal taxonomic reference detected.
taxa.print <- taxa # Removing sequence rownames for display only
rownames(taxa.print) <- NULL
head(taxa.print)
## Kingdom Phylum Class
## [1,] "k__Fungi" "p__Basidiomycota" "c__Agaricomycetes"
## [2,] "k__Fungi" "p__Ascomycota" "c__Sordariomycetes"
## [3,] "k__Fungi" "p__Basidiomycota" "c__Agaricomycetes"
## [4,] "k__Fungi" "p__Ascomycota" "c__Leotiomycetes"
## [5,] "k__Fungi" "p__Ascomycota" "c__Leotiomycetes"
## [6,] "k__Fungi" "p__Basidiomycota" "c__Agaricomycetes"
## Order Family Genus
## [1,] "o__Auriculariales" "f__Hyaloriaceae" "g__Stypella"
## [2,] "o__Sordariales" "f__Podosporaceae" "g__Podospora"
## [3,] "o__Auriculariales" "f__Hyaloriaceae" "g__Stypella"
## [4,] "o__Helotiales" "f__Mollisiaceae" "g__Phialocephala"
## [5,] "o__Helotiales" "f__Helotiaceae" "g__Ascocoryne"
## [6,] "o__Polyporales" "f__Meruliaceae" "g__Ceraceomyces"
## Species
## [1,] "s__grilletii"
## [2,] NA
## [3,] "s__subgelatinosa"
## [4,] "s__catenospora"
## [5,] "s__cylichnium"
## [6,] "s__tessulatus"
write.table(taxa, paste0(table.fp, "/dada2_its2_taxa.txt"), sep="\t", quote=F, row.names = T)
seqtable.taxa <- cbind('#seq' = rownames(taxa), t(seqtab.nochim), taxa)
write.table(seqtable.taxa, paste0(table.fp, "/dada2_its2_counts.taxon.species.txt"), sep="\t", quote=F, row.names=F)
save.image(file= paste0(work.path, "/dada2_its2.RData"))
Summary of output files:
dada2_its2_counts.txt - a tab-delimited sequence-by-sample (i.e. ASV) table
dada2_its2_taxa.txt - a tab-delimited file showing taxonomy information
dada2_its2_counts.taxon.species.txt - a tab-delimited file with sequence as rows, samples as columns and the last few columns showing the taxonomy of the ASV sequence
dada2_its2_track.txt - a tab-delimited file showing the tracking process of reads in each step
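In a later R session, the saved results can be loaded back without rerunning the pipeline. A sketch, assuming the table.fp and work.path used above:

```r
# Reload the ASV count table written above; because it was saved with row names,
# read.table assigns the first column (the sequences) as row names
counts <- read.table(paste0(table.fp, "/dada2_its2_counts.txt"),
                     sep = "\t", header = TRUE, check.names = FALSE)
# Or restore the entire workspace, including seqtab.nochim and taxa
load(paste0(work.path, "/dada2_its2.RData"))
```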