Dadaist2 ITS - quadram-institute-bioscience/gmh-sops GitHub Wiki

Methods

Pre-processing of the sequencing output

The quality profile of the raw reads (in FASTQ format) was assessed using fastp 0.20.0 (Chen 2018), which was also used to remove reads containing ambiguous bases.
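As an illustration of what removing reads with ambiguous bases means, here is a minimal Python sketch. It is not how fastp is implemented; the function names (`read_fastq`, `drop_ambiguous`) and the `max_n` threshold are hypothetical and chosen only for this example, which assumes simple four-line FASTQ records.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality) from a FASTQ stream with 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual

def drop_ambiguous(records, max_n=0):
    """Keep only reads with at most max_n ambiguous (N) bases (hypothetical threshold)."""
    for name, seq, qual in records:
        if seq.upper().count("N") <= max_n:
            yield name, seq, qual

# Two toy reads: read2 contains an N and is discarded.
example = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nACGNACGT\n+\nIIIIIIII\n"
)
kept = list(drop_ambiguous(read_fastq(example)))
kept_names = [name for name, _, _ in kept]
print(kept_names)
```

In practice the filtering is done by fastp itself; the sketch only makes the criterion explicit.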

Identification and classification of Amplicon Sequence Variants (ASVs)

The sequencing reads were analysed using the automated pipeline Dadaist2 (Ansorge 2021), run with the --join parameter for ITS analysis; the pipeline embeds the steps described in this paragraph. Preprocessing relied on SeqFu 1.8 (Telatin 2021) to identify the high-quality region of the reads and to remove the locus-specific primers (forward: CTTGGTCATTTAGAGGAAGTAA, reverse: TTACTTCCTCTAAATGACCAAG, for ITS, and forward: CCTACGGGNGGCWGCAG, reverse: GGACTACHVGGGTATCTAATCC for 16S rDNA). Representative sequences were identified using DADA2 (Callahan 2016), and their taxonomy was assigned with the DECIPHER R package (Wright 2016) against the SILVA database release 138 (Quast 2013) for 16S ribosomal sequences and the UNITE database release 8.2 (Nilsson 2019) for the ITS1 sequences. The multiple alignment of the representative sequences was performed using ClustalO (Sievers 2021) and the guide tree was produced using FastTree (Price 2009).
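The 16S primers listed above contain IUPAC degenerate codes (N, W, H, V), so primer removal cannot be a plain string comparison. The following Python sketch shows one way to match and trim a degenerate primer at the 5' end of a read. It is an illustration only, not SeqFu's implementation: the helper names (`primer_regex`, `trim_primer`) are hypothetical, and requiring an exact match with no mismatches is a simplifying assumption.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def primer_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))

def trim_primer(read, primer):
    """Remove the primer if it matches at the start of the read; otherwise return the read unchanged."""
    match = primer_regex(primer).match(read.upper())
    return read[match.end():] if match else read

FWD_16S = "CCTACGGGNGGCWGCAG"  # forward 16S primer from the text above

# Toy read: one concrete instance of the primer followed by an invented insert.
read = "CCTACGGGAGGCAGCAG" + "TGGGGAATATTG"
trimmed = trim_primer(read, FWD_16S)
untouched = trim_primer("TGGGGAATATTG", FWD_16S)
print(trimmed, untouched)
```

Real primer trimmers usually tolerate a few mismatches and also handle the reverse primer on the paired read, which this sketch omits.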

Normalization, numerical ecology and plots

Data normalization and diversity analyses were performed using the Rhea scripts (Lagkouvardos 2017); the final tables were exported for further analysis and plotting with MicrobiomeAnalyst (Dhariwal 2017) and the built-in plotting provided by Dadaist2.
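To make the normalization and diversity step concrete, here is a small Python sketch. It assumes min-sum scaling (each sample is scaled to the total of the smallest sample) and uses the Shannon index, with its exponential as an effective number of ASVs, as one example of an alpha-diversity measure. This is not the Rhea R code, and the settings actually used in an analysis should be checked against the Rhea scripts; the function names and the toy counts are invented for illustration.

```python
import math

def min_sum_normalize(table):
    """Scale every sample so that its counts sum to the smallest sample total."""
    totals = {sample: sum(counts.values()) for sample, counts in table.items()}
    target = min(totals.values())
    return {
        sample: {asv: n / totals[sample] * target for asv, n in counts.items()}
        for sample, counts in table.items()
    }

def shannon(counts):
    """Shannon index (natural logarithm) of a list of counts; zero counts are ignored."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)

# Invented toy table: two samples, two ASVs.
counts = {
    "sampleA": {"ASV1": 50, "ASV2": 50},
    "sampleB": {"ASV1": 150, "ASV2": 50},
}
normalized = min_sum_normalize(counts)
effective_a = math.exp(shannon(list(normalized["sampleA"].values())))
print(normalized["sampleB"], effective_a)
```

After scaling, both samples sum to 100, so sampleB becomes 75 and 25 while sampleA is unchanged; two equally abundant ASVs give an effective number of 2.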

References

Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018 Sep 1;34(17):i884-i890. doi: 10.1093/bioinformatics/bty560. PMID: 30423086.
Telatin A, Fariselli P, Birolo G. SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files. Bioengineering (Basel, Switzerland). 2021 May;8(5). DOI: 10.3390/bioengineering8050059. PMID: 34066939.
Ansorge R, Birolo G, James SA, Telatin A. Dadaist2: A Toolkit to Automate and Simplify Statistical Analysis and Plotting of Metabarcoding Experiments. Int. J. Mol. Sci. 2021, 22, 5309. https://doi.org/10.3390/ijms22105309
Callahan BJ, McMurdie PJ, Rosen MJ, et al. DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods. 2016 Jul;13(7):581-583. DOI: 10.1038/nmeth.3869. PMID: 27214047; PMCID: PMC4927377.
Wright ES (2016). Using DECIPHER v2.0 to Analyze Big Biological Sequence Data in R. The R Journal, 8(1), 352-359.
Cole JR, Chai B, Farris RJ, et al. The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Research. 2005 Jan;33(Database issue):D294-6. DOI: 10.1093/nar/gki038. PMID: 15608200; PMCID: PMC539992.
Quast C, Pruesse E, Yilmaz P, et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Research. 2013 Jan;41(Database issue):D590-6. DOI: 10.1093/nar/gks1219. PMID: 23193283; PMCID: PMC3531112.
Nilsson RH, Larsson KH, Taylor AFS, et al. The UNITE database for molecular identification of fungi: handling dark taxa and parallel taxonomic classifications. Nucleic Acids Research. 2019 Jan;47(D1):D259-D264. DOI: 10.1093/nar/gky1022. PMID: 30371820; PMCID: PMC6324048.
Sievers F, Higgins DG. The Clustal Omega Multiple Alignment Package. Methods in Molecular Biology (Clifton, N.J.). 2021;2231:3-16. DOI: 10.1007/978-1-0716-1036-7_1. PMID: 33289883.
Price MN, Dehal PS, Arkin AP. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Molecular Biology and Evolution. 2009 Jul;26(7):1641-1650. DOI: 10.1093/molbev/msp077. PMID: 19377059; PMCID: PMC2693737.
Lagkouvardos I, Fischer S, Kumar N, Clavel T. Rhea: a transparent and modular R pipeline for microbial profiling based on 16S rRNA gene amplicons. PeerJ. 2017;5:e2836. DOI: 10.7717/peerj.2836. PMID: 28097056.
Dhariwal A, Chong J, Habib S, et al. MicrobiomeAnalyst: a web-based tool for comprehensive statistical, visual and meta-analysis of microbiome data. Nucleic Acids Research. 2017 Jul;45(W1):W180-W188. DOI: 10.1093/nar/gkx295. PMID: 28449106; PMCID: PMC5570177.

If using cutadapt, also cite:

Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads, EMBnet.journal, 2011. DOI:10.14806/ej.17.1.200