Dadaist Qiime - quadram-institute-bioscience/gmh-sops GitHub Wiki

This SOP describes how nf-core/ampliseq and Dadaist2 have been used to analyse a metabarcoding experiment. The document lists primers and databases for both 16S and ITS: choose the wording appropriate to your experiment.

Methods

Pre-processing of the sequencing output

The quality profile of the raw reads (in FASTQ format) was assessed using fastp 0.20.0 (Chen 2018), which was also used to remove reads containing ambiguous bases. The qualified region to retain for DADA2 was determined using seqfu qual from SeqFu 1.17 (Telatin 2021) with default parameters.
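The two pre-processing steps can be illustrated with a minimal Python sketch: it discards reads containing ambiguous bases (N), which fastp controls through its --n_base_limit option, and then derives a candidate truncation length from the per-position mean quality. The threshold (mean Phred 25), the function names and the "longest prefix above threshold" rule are hypothetical choices for illustration; they are not the algorithm implemented by fastp or by seqfu qual.

```python
"""Sketch: remove reads with ambiguous bases and suggest a truncation length.

Hypothetical heuristic for illustration only; it is not the algorithm used by
fastp or by seqfu qual.
"""
from typing import Iterable, Iterator, List, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Read]:
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        yield header, seq, qual


def drop_ambiguous(reads: Iterable[Read], max_n: int = 0) -> List[Read]:
    """Keep reads with at most max_n N bases (0 discards any read with an N)."""
    return [r for r in reads if r[1].upper().count("N") <= max_n]


def mean_quality_by_position(reads: Iterable[Read], offset: int = 33) -> List[float]:
    """Mean Phred score per position, assuming Phred+33 encoded qualities."""
    totals: List[int] = []
    counts: List[int] = []
    for _header, _seq, qual in reads:
        for pos, ch in enumerate(qual):
            if pos == len(totals):  # first read reaching this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


def truncation_length(profile: List[float], min_mean_q: float = 25.0) -> int:
    """Length of the longest prefix whose mean quality stays >= min_mean_q."""
    for pos, q in enumerate(profile):
        if q < min_mean_q:
            return pos
    return len(profile)


if __name__ == "__main__":
    records = [
        "@r1", "ACGTACGT", "+", "IIIIII##",
        "@r2", "ACGNACGT", "+", "IIIIIIII",  # contains an N: removed
        "@r3", "ACGTACGT", "+", "IIIII5##",
    ]
    kept = drop_ambiguous(parse_fastq(records))
    profile = mean_quality_by_position(kept)
    print(len(kept), truncation_length(profile))  # 2 6
```

In practice forward and reverse reads would be profiled separately, since their quality usually decays at different rates.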

Identification and classification of Amplicon Sequence Variants (ASVs)

The nf-core/ampliseq 2.5 pipeline (Straub 2020; Ewels 2020) was run with the locus-specific primers set to forward CTTGGTCATTTAGAGGAAGTAA and reverse TTACTTCCTCTAAATGACCAAG for ITS, or forward CCTACGGGNGGCWGCAG and reverse GGACTACHVGGGTATCTAATCC for 16S rDNA. The workflow identifies Amplicon Sequence Variants (ASVs) with DADA2 (Callahan 2016), using its QIIME 2 2022.11.1 (Bolyen 2019) wrapper.
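Both primer pairs contain IUPAC degenerate codes (N, W, H, V), so primer-trimming tools such as Cutadapt must treat each code as a set of bases. The sketch below, with hypothetical helper names, reverse complements primers and compiles them into regular expressions. Running it on the ITS pair shows that the reverse primer listed above is the exact reverse complement of the forward primer, which may be worth checking against the primer pair actually used.

```python
"""Sketch: reverse complement and match degenerate (IUPAC) primers."""
import re

# Complement of every IUPAC nucleotide symbol, including degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Bases that each IUPAC symbol can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def reverse_complement(seq: str) -> str:
    """Reverse complement a primer or read, preserving degenerate codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer: str) -> re.Pattern:
    """Compile a primer into a regex where each degenerate code is a character class."""
    parts = [f"[{IUPAC[b]}]" if len(IUPAC[b]) > 1 else b for b in primer.upper()]
    return re.compile("".join(parts))


if __name__ == "__main__":
    its_fwd = "CTTGGTCATTTAGAGGAAGTAA"
    its_rev = "TTACTTCCTCTAAATGACCAAG"
    print(reverse_complement(its_fwd) == its_rev)  # True

    fwd_16s = primer_regex("CCTACGGGNGGCWGCAG")
    print(bool(fwd_16s.match("CCTACGGGAGGCAGCAGTGGG")))  # True
    print(reverse_complement("GGACTACHVGGGTATCTAATCC"))  # GGATTAGATACCCBDGTAGTCC
```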

Taxonomic assignment

Taxonomic assignment was performed with the DECIPHER R package (Wright 2016) against the SILVA database release 138 (Quast 2013) for 16S ribosomal sequences, and against the UNITE database release 8.2 (Nilsson 2019) for ITS1 sequences.
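DECIPHER's IdTaxa function reports a confidence value for every rank and, through its threshold argument (60 by default), truncates each classification at ranks that fall below that confidence. The sketch below reproduces only that truncation step in Python, assuming per-rank confidences are already available; the function names and the confidences in the example are made up, and the threshold used for this analysis is not stated in this SOP.

```python
"""Sketch: truncate a per-rank classification at a confidence threshold."""
from typing import List, Tuple

RankCall = Tuple[str, str, float]  # (rank, taxon, confidence in percent)


def truncate_lineage(calls: List[RankCall], threshold: float = 60.0) -> List[Tuple[str, str]]:
    """Keep ranks from the root down until the first one below threshold."""
    kept: List[Tuple[str, str]] = []
    for rank, taxon, confidence in calls:
        if confidence < threshold:
            break  # this rank and all deeper ranks are left unclassified
        kept.append((rank, taxon))
    return kept


def lineage_string(kept: List[Tuple[str, str]]) -> str:
    """Join the retained taxa into a semicolon-separated lineage."""
    return ";".join(taxon for _rank, taxon in kept)


if __name__ == "__main__":
    # Made-up confidences for a single ASV, for illustration only.
    calls = [
        ("domain", "Bacteria", 100.0),
        ("phylum", "Firmicutes", 98.5),
        ("class", "Clostridia", 91.0),
        ("order", "Lachnospirales", 72.3),
        ("family", "Lachnospiraceae", 70.1),
        ("genus", "Blautia", 41.2),
    ]
    print(lineage_string(truncate_lineage(calls)))
    # Bacteria;Firmicutes;Clostridia;Lachnospirales;Lachnospiraceae
```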

Normalization, numerical ecology and plots

Data normalization and diversity analyses were performed with the Rhea scripts (Lagkouvardos 2017). The final tables were exported for further analysis and plotting with MicrobiomeAnalyst (Dhariwal 2017) and with the built-in plotting provided by Dadaist2 (Ansorge 2021).
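As an illustration of the normalization and alpha-diversity step, the sketch below scales every sample to the size of the smallest library and computes richness and the Shannon index. It assumes simple proportional scaling; the exact options applied by the Rhea scripts may differ (for example, random subsampling), and the function names and the example ASV table are hypothetical.

```python
"""Sketch: scale samples to the smallest library size and compute alpha diversity."""
import math
from typing import Dict, List


def normalize_to_min_depth(table: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Scale each sample's ASV counts so every sample sums to the smallest depth."""
    depths = {sample: sum(counts) for sample, counts in table.items()}
    min_depth = min(depths.values())
    if min_depth == 0:
        raise ValueError("remove empty samples before normalizing")
    return {
        sample: [count * min_depth / depths[sample] for count in counts]
        for sample, counts in table.items()
    }


def shannon(counts: List[float]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over ASVs with non-zero abundance."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def richness(counts: List[float]) -> int:
    """Number of ASVs observed (non-zero abundance) in a sample."""
    return sum(1 for c in counts if c > 0)


if __name__ == "__main__":
    # Hypothetical ASV table: sample -> counts for ASV1..ASV3.
    asv_table = {"S1": [50, 50, 0], "S2": [100, 300, 600]}
    norm = normalize_to_min_depth(asv_table)
    for sample, counts in norm.items():
        print(sample, counts, richness(counts), round(shannon(counts), 3))
```

Because proportional scaling leaves relative abundances unchanged, the Shannon index is the same before and after normalization; the choice of normalization matters more when counts are compared across samples.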

References

Straub D, Blackwell N, Peltzer A, Nahnsen S, Kleindienst S. Interpretations of Environmental Microbial Community Studies Are Biased by the Selected 16S rRNA (Gene) Amplicon Sequencing Pipeline. Frontiers in Microbiology. 2020. DOI: 10.3389/fmicb.2020.550420
Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology. 2020. DOI: 10.1038/s41587-020-0439-x
Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F, Bai Y, Bisanz JE, Bittinger K, Brejnrod A, Brislawn CJ, Brown CT, Callahan BJ, Caraballo-Rodríguez AM, Chase J, ..., Caporaso JG. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology. 2019. DOI: 10.1038/s41587-019-0209-9
Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018. DOI: 10.1093/bioinformatics/bty560
Telatin A, Fariselli P, Birolo G. SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files. Bioengineering (Basel, Switzerland). 2021 May;8(5). DOI: 10.3390/bioengineering8050059
Ansorge R, Birolo G, James SA, Telatin A. Dadaist2: A Toolkit to Automate and Simplify Statistical Analysis and Plotting of Metabarcoding Experiments. International Journal of Molecular Sciences. 2021;22:5309. DOI: 10.3390/ijms22105309
Callahan BJ, McMurdie PJ, Rosen MJ, et al. DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods. 2016 Jul;13(7):581-583. DOI: 10.1038/nmeth.3869
Wright ES. Using DECIPHER v2.0 to Analyze Big Biological Sequence Data in R. The R Journal. 2016;8(1):352-359.
Lagkouvardos I, Fischer S, Kumar N, Clavel T. Rhea: a transparent and modular R pipeline for microbial profiling based on 16S rRNA gene amplicons. PeerJ. 2017;5:e2836. DOI: 10.7717/peerj.2836. PMID: 28097056.
Dhariwal A, Chong J, Habib S, et al. MicrobiomeAnalyst: a web-based tool for comprehensive statistical, visual and meta-analysis of microbiome data. Nucleic Acids Research. 2017 Jul;45(W1):W180-W188. DOI: 10.1093/nar/gkx295. PMID: 28449106; PMCID: PMC5570177.

Databases:

Cole JR, Chai B, Farris RJ, et al. The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Research. 2005 Jan;33(Database issue):D294-6. DOI: 10.1093/nar/gki038. PMID: 15608200; PMCID: PMC539992.
Quast C, Pruesse E, Yilmaz P, et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Research. 2013 Jan;41(Database issue):D590-6. DOI: 10.1093/nar/gks1219. PMID: 23193283; PMCID: PMC3531112.
Nilsson RH, Larsson KH, Taylor AFS, et al. The UNITE database for molecular identification of fungi: handling dark taxa and parallel taxonomic classifications. Nucleic Acids Research. 2019 Jan;47(D1):D259-D264. DOI: 10.1093/nar/gky1022. PMID: 30371820; PMCID: PMC6324048.

Other tools

Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17(1). DOI: 10.14806/ej.17.1.200
Sievers F, Higgins DG. The Clustal Omega Multiple Alignment Package. Methods in Molecular Biology (Clifton, N.J.). 2021;2231:3-16. DOI: 10.1007/978-1-0716-1036-7_1. PMID: 33289883.
Price MN, Dehal PS, Arkin AP. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Molecular Biology and Evolution. 2009 Jul;26(7):1641-1650. DOI: 10.1093/molbev/msp077