Step 4: Denoising with DADA2 - shenjean/diversity GitHub Wiki

1. Visualize the read qualities:

qiime demux summarize --i-data pe.qza --o-visualization pe.demux.qzv
  • To view the summary, download the qzv file and drag it onto QIIME 2 View (https://view.qiime2.org), then click on the "Interactive Quality Plot" tab. Example qzv file here
  • Hover your mouse over a read position on the interactive plot to see the quality score summary (parametric seven-number summary) for that position. From the plot and the summaries, you can identify the positions near the start and end of the reads where quality scores rise or drop. These positions will guide the trimming parameters for DADA2.
  • Options available for trimming paired-end reads using DADA2 include: --p-trim-left-f, --p-trim-left-r, --p-trunc-len-f, --p-trunc-len-r. Visit this QIIME2 wiki page for more info
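
As a rough illustration of how these four options fit together: the trim-left options remove bases from the 5' end of each read, while the trunc-len options cut each read at the given position on the 3' end, and a value of 0 disables truncation. The helper below is hypothetical (`dada2_cmd` is not part of QIIME 2); it only prints the command for a chosen set of values so that you can review it before running anything. File names are the ones used elsewhere on this page.

```shell
# Hypothetical helper: print (do not run) a denoise-paired command
# for a given set of trimming values read off the quality plot.
# usage: dada2_cmd TRIM_LEFT_F TRIM_LEFT_R TRUNC_LEN_F TRUNC_LEN_R
dada2_cmd() {
  printf 'qiime dada2 denoise-paired --i-demultiplexed-seqs pe.qza'
  printf ' --p-trim-left-f %d --p-trim-left-r %d' "$1" "$2"
  printf ' --p-trunc-len-f %d --p-trunc-len-r %d' "$3" "$4"
  printf ' --o-table pe.dada2.qza --o-representative-sequences pe.repseqs.qza'
  printf ' --o-denoising-stats pe.dada2-stats.qza\n'
}

# Example: no 5' trimming, forward reads untruncated, reverse reads cut at 232
dada2_cmd 0 0 0 232
```

Keep in mind that reads shorter than the chosen truncation length are discarded, so an overly long value can remove many reads.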

2. Make sure DADA2 is installed correctly

The DADA2 installation on the CIRCE server is incomplete: it is missing an R package (GenomeInfoDbData). If you are using it, run the following workaround to install the package and extend the library path so that R can find it.

  • In your home folder (e.g. /home/j/jeanlim), not the BocaCiegaBay folder, download the GenomeInfoDbData package compatible with R v3.5.1
wget https://bioconductor.org/packages/release/data/annotation/src/contrib/GenomeInfoDbData_1.2.12.tar.gz
  • Make a new folder called Rlib under your home folder (e.g. /home/j/jeanlim), not the BocaCiegaBay folder
mkdir Rlib
  • Then, start the R environment by typing R and pressing Enter. You will know you are in the R environment if you see the following:
R version 3.5.1 (2018-07-02) -- "Feather Spray"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-conda_cos6-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
  • While in R, install the GenomeInfoDbData R package into the Rlib folder that you just created
install.packages("GenomeInfoDbData_1.2.12.tar.gz",lib="/home/j/jeanlim/Rlib")
  • If successfully installed, you will see the following output:
inferring 'repos = NULL' from 'pkgs'
* installing *source* package ‘GenomeInfoDbData’ ...
** data
** inst
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (GenomeInfoDbData)
  • Exit R by typing q(). You will be prompted whether to save the R workspace, type 'n' (for no) and hit enter. You will be directed back to the command line
  • Now, check where your R library path is located by typing echo $R_LIBS_USER. You should see the following output: /apps/miniconda3/4.5.1/envs/qiime2-2019.10/lib/R/library/. This folder is not writable, so you need to add your newly created Rlib folder, which now contains GenomeInfoDbData, to the $R_LIBS_USER variable
export R_LIBS_USER=$R_LIBS_USER:/home/j/jeanlim/Rlib
  • Check that your folder has been correctly added to the R library path:
echo $R_LIBS_USER
  • You should see the following output:

/apps/miniconda3/4.5.1/envs/qiime2-2019.10/lib/R/library/:/home/j/jeanlim/Rlib

  • Now, check whether the GenomeInfoDbData package can be loaded correctly in R. Start R again by typing R and hitting enter. Then, load the DADA2 library:
library(dada2)
  • If this is successful, you should not see any error messages. You can quit R by typing q()
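
The library-path steps above can also be sketched as a small script. This is only a sketch under assumptions: `add_r_lib` is a hypothetical helper name, and `$HOME/Rlib` stands in for the Rlib folder created earlier. The helper appends the folder only if it is not already listed, so it is safe to run more than once, and the final check loads dada2 non-interactively with Rscript when it is available.

```shell
# Hypothetical helper: append a directory to R_LIBS_USER
# only if it is not already listed there.
add_r_lib() {
  case ":${R_LIBS_USER:-}:" in
    *":$1:"*) ;;  # already present, nothing to do
    *) R_LIBS_USER="${R_LIBS_USER:+$R_LIBS_USER:}$1" ;;
  esac
  export R_LIBS_USER
}

add_r_lib "$HOME/Rlib"
add_r_lib "$HOME/Rlib"   # second call leaves the variable unchanged
echo "$R_LIBS_USER"

# Non-interactive load test; skipped when Rscript is not on PATH.
if command -v Rscript >/dev/null 2>&1; then
  if Rscript -e 'library(dada2)'; then
    echo "dada2 loads"
  else
    echo "dada2 failed to load"
  fi
else
  echo "Rscript not found; activate the QIIME 2 environment first"
fi
```

Note that export only affects the current shell session, so the variable has to be set again in each new session (or in each job script) before running DADA2.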

3. Run DADA2

In the command line, run the following command:

qiime dada2 denoise-paired --i-demultiplexed-seqs pe.qza --p-n-threads 8 \
--p-trim-left-f 0 --p-trim-left-r 0 --p-trunc-len-r 232 --p-trunc-len-f 0 \
--o-table pe.dada2.qza --o-representative-sequences pe.repseqs.qza \
--o-denoising-stats pe.dada2-stats.qza 

Once done, three new qza files will be generated. pe.repseqs.qza contains the representative sequences identified by DADA2, pe.dada2.qza is a feature table containing the counts of each representative sequence in each sample, and pe.dada2-stats.qza holds the denoising statistics. You can generate interactive summaries of the sequences and the feature table:

qiime feature-table tabulate-seqs --i-data pe.repseqs.qza --o-visualization pe.repseqs.qzv
qiime feature-table summarize --i-table pe.dada2.qza --o-visualization pe.dada2.qzv \
--m-sample-metadata-file metadata.txt

From the feature-table summary, you can see the feature count (the number of sequences assigned to Amplicon Sequence Variants, or ASVs) for each sample by clicking on the Interactive Sample Detail tab. It is especially important to inspect the feature table visualization (pe.dada2.qzv) for these per-sample counts. They show which samples have enough sequences for downstream analysis, and they guide the choice of sampling depth (the number of sequences subsampled from each sample) for downstream alpha and beta diversity analyses.

4. Exporting data from qza files

qiime tools export --input-path pe.dada2.qza --output-path pe_dada2_export
biom convert -i pe_dada2_export/feature-table.biom -o pe_dada2_export/otu_table.txt --to-tsv
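
Once the table is in TSV form, the per-sample totals that inform the sampling depth can also be computed on the command line. The sketch below assumes the layout that biom convert writes with --to-tsv (a comment line, then a "#OTU ID" header row, then one row per feature); `sample_totals` is a hypothetical helper and the toy table is made up purely for illustration.

```shell
# Hypothetical helper: sum each sample column of a biom-convert TSV table
# and list samples from the smallest to the largest total.
sample_totals() {
  awk -F'\t' '
    /^#OTU ID/ { for (i = 2; i <= NF; i++) name[i] = $i; n = NF; next }
    /^#/       { next }   # skip other comment lines
    { for (i = 2; i <= n; i++) total[i] += $i }
    END { for (i = 2; i <= n; i++) printf "%s\t%d\n", name[i], total[i] }
  ' "$1" | sort -t "$(printf '\t')" -k2,2n
}

# Toy table (made-up numbers) in the same layout as otu_table.txt
toy=$(mktemp)
{
  printf '# Constructed from biom file\n'
  printf '#OTU ID\tS1\tS2\tS3\n'
  printf 'asv1\t10.0\t0.0\t5.0\n'
  printf 'asv2\t3.0\t2.0\t40.0\n'
} > "$toy"

sample_totals "$toy"
```

On real data you would run sample_totals pe_dada2_export/otu_table.txt; the samples at the top of the list are the ones most likely to be lost at a given sampling depth.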

If you want to download your ASV sequences in FASTA format, you can similarly use the export command. The command below will create a new folder repseqs_export and save the fasta file in the new folder.

qiime tools export --input-path pe.repseqs.qza --output-path repseqs_export
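
A quick sanity check on the exported FASTA file is to count the sequences and look at their length range. The sketch below is an illustration only: `fasta_stats` is a hypothetical helper, the toy file is made up, and the file name dna-sequences.fasta inside repseqs_export is an assumption, so list the folder and adjust the name if yours differs.

```shell
# Hypothetical helper: report the number of sequences in a FASTA file
# and the shortest and longest sequence lengths.
fasta_stats() {
  awk '
    function record(   len) {
      len = length(seq)
      if (min == 0 || len < min) min = len
      if (len > max) max = len
    }
    /^>/ { if (seq != "") record(); seq = ""; n++; next }  # new header line
    { seq = seq $0 }                                        # join wrapped lines
    END {
      if (seq != "") record()
      printf "sequences=%d min_len=%d max_len=%d\n", n, min, max
    }
  ' "$1"
}

# Toy FASTA file (made-up sequences) to show the output format
toy_fasta=$(mktemp)
printf '>a\nACGT\nAC\n>b\nACG\n' > "$toy_fasta"
fasta_stats "$toy_fasta"
```

On real data this would be run as fasta_stats repseqs_export/dna-sequences.fasta, and the sequence count should match the number of features in the feature table.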

5. Filtering data from feature table (Optional)

Depending on your dataset, you may want to filter out low-frequency features using total-frequency based filtering:

qiime feature-table filter-features --i-table pe.dada2.qza --p-min-frequency 10 --o-filtered-table feature-frequency-filtered-table.qza

You may also want to filter out features that appear in fewer than x samples, using contingency-based filtering. The example below keeps only features observed in at least 2 samples:

qiime feature-table filter-features --i-table feature-frequency-filtered-table.qza --p-min-samples 2 --o-filtered-table sample-contingency-filtered-table.qza
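
To make the two criteria concrete, the sketch below applies the same rules to a small TSV table with awk: a feature is kept only if its total count across samples is at least the minimum frequency and it is observed (count above zero) in at least the minimum number of samples. This is an illustration of the logic only, not a replacement for the QIIME 2 commands; `filter_features_tsv` is a hypothetical helper and the table is made up.

```shell
# Hypothetical helper mimicking total-frequency and contingency filtering
# on a biom-convert style TSV table.
# usage: filter_features_tsv TABLE MIN_FREQUENCY MIN_SAMPLES
filter_features_tsv() {
  awk -F'\t' -v min_freq="$2" -v min_samples="$3" '
    /^#/ { print; next }   # keep header lines unchanged
    {
      total = 0; present = 0
      for (i = 2; i <= NF; i++) {
        total += $i
        if (($i + 0) > 0) present++
      }
      if (total >= min_freq && present >= min_samples) print
    }
  ' "$1"
}

# Toy table (made-up numbers)
toy_table=$(mktemp)
{
  printf '# Constructed from biom file\n'
  printf '#OTU ID\tS1\tS2\tS3\n'
  printf 'asv1\t10.0\t0.0\t5.0\n'
  printf 'asv2\t3.0\t2.0\t40.0\n'
  printf 'asv3\t0.0\t0.0\t12.0\n'
  printf 'asv4\t1.0\t1.0\t0.0\n'
} > "$toy_table"

filter_features_tsv "$toy_table" 10 2
```

With a minimum frequency of 10 and a minimum of 2 samples, asv3 is dropped because it occurs in only one sample and asv4 is dropped because its total count is 2.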

You can also perform other types of filtering. Here is the tutorial: https://docs.qiime2.org/2024.5/tutorials/filtering/