Taxonomic and Functional Analysis - egenomics/agb2025 GitHub Wiki
The objective is to transform quality-controlled, contamination-free reads into taxonomically classified features (ASVs) with phylogenetic context and diversity metrics. All outputs live under `outputs/run_<run_id>/`.
- Tool: dada2
- Key actions
  - Read import – Imports quality-controlled, demultiplexed paired-end FASTQ files.
  - Quality assessment – Visualizes per-sample read quality profiles in `demux.qzv`.
  - Denoising (DADA2) – Learns error rates, trims reads, and merges paired ends.
  - Feature table generation – Produces an ASV abundance matrix and representative sequences.
- Outputs
  - Imported reads → `qiime2/01_imported_reads/demux.qza`
  - Denoised feature table → `qiime2/02_denoised_dada2/table.qza`
  - Representative sequences → `qiime2/02_denoised_dada2/rep-seqs.qza`
  - Denoising statistics → `qiime2/02_denoised_dada2/denoising-stats.qza`
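As a rough illustration of the denoising step, the sketch below assembles a `qiime dada2 denoise-paired` call from Python. The function name `build_dada2_command`, the run directory, and the truncation lengths are hypothetical; the pipeline's actual parameters are not stated here and should be chosen from the quality profiles in `demux.qzv`.

```python
from pathlib import Path


def build_dada2_command(run_dir, trunc_len_f, trunc_len_r):
    """Return the argument list for `qiime dada2 denoise-paired`.

    trunc_len_f and trunc_len_r are illustrative inputs, not values
    taken from the pipeline.
    """
    qiime_dir = Path(run_dir) / "qiime2"
    out_dir = qiime_dir / "02_denoised_dada2"
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", str(qiime_dir / "01_imported_reads" / "demux.qza"),
        "--p-trunc-len-f", str(trunc_len_f),
        "--p-trunc-len-r", str(trunc_len_r),
        "--o-table", str(out_dir / "table.qza"),
        "--o-representative-sequences", str(out_dir / "rep-seqs.qza"),
        "--o-denoising-stats", str(out_dir / "denoising-stats.qza"),
    ]


if __name__ == "__main__":
    # Hypothetical run directory and truncation lengths.
    cmd = build_dada2_command("outputs/run_example", trunc_len_f=240, trunc_len_r=200)
    print(" ".join(cmd))
    # To run it for real (requires an activated QIIME 2 environment):
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the argument list separately from executing it keeps the paths easy to inspect before a long denoising job starts.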
- Tool: `qiime feature-classifier classify-sklearn` with the pre-trained SILVA 138 database
- Key actions
  - Validation – Verifies the path to the pre-trained classifier database.
  - Classification – Assigns taxonomic labels to the representative sequences using confidence-based machine learning.
  - Visualization – Generates an interactive taxonomy table for exploration.
- Outputs
  - Taxonomic classifications → `qiime2/04_taxonomy/taxonomy.qza`
  - Interactive taxonomy viewer → `qiime2/04_taxonomy/taxonomy.qzv`
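The validation and classification steps might look roughly like the following sketch. It checks that the classifier file exists before assembling the commands. The function name and classifier path are hypothetical, and the use of `qiime metadata tabulate` to produce `taxonomy.qzv` is an assumption about how the viewer is generated.

```python
from pathlib import Path


def build_taxonomy_commands(run_dir, classifier_path):
    """Validate the classifier path, then return the two command lists."""
    classifier = Path(classifier_path)
    if not classifier.is_file():
        # Fail early rather than partway through a long classification job.
        raise FileNotFoundError(f"Classifier not found: {classifier}")

    qiime_dir = Path(run_dir) / "qiime2"
    taxonomy = qiime_dir / "04_taxonomy" / "taxonomy.qza"

    classify = [
        "qiime", "feature-classifier", "classify-sklearn",
        "--i-classifier", str(classifier),
        "--i-reads", str(qiime_dir / "02_denoised_dada2" / "rep-seqs.qza"),
        "--o-classification", str(taxonomy),
    ]
    # Assumption: the interactive table is produced with `qiime metadata tabulate`.
    tabulate = [
        "qiime", "metadata", "tabulate",
        "--m-input-file", str(taxonomy),
        "--o-visualization", str(taxonomy.with_suffix(".qzv")),
    ]
    return [classify, tabulate]
```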
- Tool: `qiime phylogeny align-to-tree-mafft-fasttree`
- Key actions
  - Multiple sequence alignment – MAFFT aligns all representative sequences.
  - Alignment masking – Highly variable regions are filtered out to improve tree accuracy.
  - Unrooted tree construction – FastTree constructs a phylogenetic tree from the masked alignment.
  - Tree rooting – The tree is rooted at its midpoint.
- Outputs
  - Raw alignment → `qiime2/05_phylogeny/aligned-rep-seqs.qza`
  - Masked alignment → `qiime2/05_phylogeny/masked-aligned-rep-seqs.qza`
  - Unrooted tree → `qiime2/05_phylogeny/unrooted-tree.qza`
  - Rooted tree → `qiime2/05_phylogeny/rooted-tree.qza`
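Because `align-to-tree-mafft-fasttree` runs all four phylogeny steps in one call, a single command covers the outputs listed above. The sketch below shows how that call could be assembled. The function name is hypothetical, and the input is assumed to be the representative sequences from the denoising step.

```python
from pathlib import Path


def build_phylogeny_command(run_dir):
    """Return the argument list for the combined alignment and tree pipeline."""
    qiime_dir = Path(run_dir) / "qiime2"
    out_dir = qiime_dir / "05_phylogeny"
    return [
        "qiime", "phylogeny", "align-to-tree-mafft-fasttree",
        # Assumed input: representative sequences from the DADA2 step.
        "--i-sequences", str(qiime_dir / "02_denoised_dada2" / "rep-seqs.qza"),
        "--o-alignment", str(out_dir / "aligned-rep-seqs.qza"),
        "--o-masked-alignment", str(out_dir / "masked-aligned-rep-seqs.qza"),
        "--o-tree", str(out_dir / "unrooted-tree.qza"),
        "--o-rooted-tree", str(out_dir / "rooted-tree.qza"),
    ]
```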
- Tool: Custom Python script
- Key actions
  - Sample depth calculation – Computes total read counts per sample from the feature table.
  - Threshold selection – Uses the 10th percentile of sample depths as the rarefaction cutoff.
  - Retention analysis – Calculates the number and percentage of samples above the selected threshold.
- Outputs
  - Rarefaction depth → `rarefaction_threshold/rarefaction_threshold.txt`
  - Analysis summary → `rarefaction_threshold/rarefaction_summary.txt`
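The threshold logic can be sketched in a few lines of standard-library Python. This is not the pipeline's actual script. The function names, the linear-interpolation percentile, rounding the cutoff down to an integer, and counting samples at or above the cutoff as retained are all assumptions made for illustration.

```python
import math


def percentile(values, pct):
    """Percentile with linear interpolation between the closest ranks."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def choose_rarefaction_depth(sample_depths, pct=10):
    """Pick a rarefaction cutoff and report how many samples survive it.

    sample_depths maps sample ID -> total read count in the feature table.
    """
    depths = list(sample_depths.values())
    threshold = int(percentile(depths, pct))  # round down to whole reads
    # Assumption: samples with depth equal to the cutoff are retained.
    retained = sum(1 for depth in depths if depth >= threshold)
    return {
        "threshold": threshold,
        "retained": retained,
        "total": len(depths),
        "retained_pct": 100.0 * retained / len(depths),
    }


if __name__ == "__main__":
    # Made-up depths for eleven hypothetical samples.
    demo = {f"S{i}": 1000 * i for i in range(1, 12)}
    print(choose_rarefaction_depth(demo))
```

A percentile-based cutoff trades a small, predictable loss of shallow samples for a deeper rarefaction depth across the samples that remain.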
- Tool: `qiime diversity core-metrics-phylogenetic`, `qiime diversity alpha-group-significance`
- Key actions
  - Within-sample diversity measurement – Quantifies the diversity, richness, and evenness of microbial communities within individual samples (e.g., Shannon, Faith's Phylogenetic Diversity).
  - Statistical comparison – Runs Kruskal-Wallis tests, the test applied by `qiime diversity alpha-group-significance`, to assess significant differences in alpha diversity metrics across predefined sample groups (e.g., healthy vs. diseased).
- Outputs
  - Alpha diversity metrics – `.qza` files containing the calculated diversity scores for each sample (e.g., `shannon_vector.qza`, `faith_pd_vector.qza`).
  - Statistical summary – `.qzv` visualization of the group significance tests (e.g., `alpha-group-significance.qzv`).
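To make the Shannon metric concrete, here is a small standalone sketch of what it measures for one sample's ASV counts. It is not QIIME 2's implementation. The function names are hypothetical, and log base 2 is assumed, so the result is in bits.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy in bits for one sample's ASV counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts:
        if count > 0:  # absent ASVs contribute nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def pielou_evenness(counts):
    """Shannon entropy divided by its maximum for the observed richness."""
    observed = sum(1 for count in counts if count > 0)
    if observed < 2:
        return 0.0  # evenness is undefined for a single ASV; report 0
    return shannon_entropy(counts) / math.log2(observed)


if __name__ == "__main__":
    even = [25, 25, 25, 25]   # four equally abundant ASVs
    skewed = [97, 1, 1, 1]    # one dominant ASV
    print(shannon_entropy(even), shannon_entropy(skewed))
    print(pielou_evenness(even), pielou_evenness(skewed))
```

The evenly distributed sample scores higher than the skewed one even though both contain four ASVs, which is why Shannon is read as a combined signal of richness and evenness.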
- Tool: `qiime tools export` with format conversion utilities
- Key actions
  - Feature table conversion – Exports the feature table from `.qza` to BIOM format, then converts it to tab-separated values for downstream analysis.
  - Sequence extraction – Extracts representative sequences from `.qza` to standard FASTA format.
  - Taxonomy table export – Converts taxonomic classifications to tab-separated format.
  - Phylogenetic tree extraction – Exports the rooted phylogenetic tree to Newick format for use in phylogenetic analysis software.
  - Alpha diversity visualization export – Extracts interactive alpha rarefaction plots and associated data tables.
- Outputs
  - Feature abundance matrix → `qiime2/exported_results/feature_table.tsv`
  - Representative sequences → `qiime2/exported_results/representative_sequences.fasta`
  - Taxonomic assignments → `qiime2/exported_results/taxonomy.tsv`
  - Phylogenetic tree → `qiime2/exported_results/phylogenetic_tree.nwk`
  - Alpha diversity plots → `qiime2/exported_results/alpha_rarefaction/*`
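The export step could be scripted along the lines of the sketch below, which pairs one `qiime tools export` call per artifact with a `biom convert` call for the feature table. The function name and the per-artifact subdirectories are hypothetical. The `feature-table.biom` filename is assumed to be what the export writes for a feature table. Renaming the exported FASTA and Newick files to the names listed above is left out.

```python
from pathlib import Path


def build_export_commands(run_dir):
    """Return export commands for the core artifacts plus the BIOM-to-TSV step."""
    qiime_dir = Path(run_dir) / "qiime2"
    out_dir = qiime_dir / "exported_results"

    # Hypothetical labels mapped to the artifact paths documented earlier.
    artifacts = {
        "table": qiime_dir / "02_denoised_dada2" / "table.qza",
        "rep_seqs": qiime_dir / "02_denoised_dada2" / "rep-seqs.qza",
        "taxonomy": qiime_dir / "04_taxonomy" / "taxonomy.qza",
        "tree": qiime_dir / "05_phylogeny" / "rooted-tree.qza",
    }

    commands = [
        ["qiime", "tools", "export",
         "--input-path", str(path),
         "--output-path", str(out_dir / label)]
        for label, path in artifacts.items()
    ]

    # Assumption: the feature table export writes `feature-table.biom`.
    commands.append([
        "biom", "convert",
        "-i", str(out_dir / "table" / "feature-table.biom"),
        "-o", str(out_dir / "feature_table.tsv"),
        "--to-tsv",
    ])
    return commands


if __name__ == "__main__":
    for cmd in build_export_commands("outputs/run_example"):
        print(" ".join(cmd))
```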