Taxonomic and Functional Analysis - egenomics/agb2025 GitHub Wiki

The objective is to transform quality-controlled, contamination-free reads into taxonomically classified features (ASVs) with phylogenetic context and diversity metrics. All outputs live under outputs/run_<run_id>/.


Denoising with DADA2

  • Tool: dada2
  • Key actions
    1. Read import – Import quality-controlled demultiplexed paired-end FASTQ files.
    2. Quality assessment – Per-sample read quality profiles visualized in demux.qzv.
    3. Denoising – DADA2 learns error rates, trims reads, and merges paired ends.
    4. Feature table generation – Produces an ASV abundance matrix and representative sequences.
  • Outputs
    • Imported reads → qiime2/01_imported_reads/demux.qza
    • Denoised feature table → qiime2/02_denoised_dada2/table.qza
    • Representative sequences → qiime2/02_denoised_dada2/rep-seqs.qza
    • Denoising statistics → qiime2/02_denoised_dada2/denoising-stats.qza
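
The steps above can be sketched as a small Python helper that assembles the QIIME 2 command lines. This is a minimal sketch, not the pipeline's actual code: the function name `dada2_commands`, the manifest-based import format, the location of demux.qzv next to demux.qza, and the truncation lengths passed in are all assumptions made for illustration.

```python
from pathlib import Path


def dada2_commands(run_dir, manifest, trunc_len_f, trunc_len_r):
    """Build the import, summary, and denoising command lines for one run."""
    q2 = Path(run_dir) / "qiime2"
    demux = q2 / "01_imported_reads" / "demux.qza"
    denoised = q2 / "02_denoised_dada2"
    return [
        # 1. Import demultiplexed paired-end reads (manifest format is an assumption).
        ["qiime", "tools", "import",
         "--type", "SampleData[PairedEndSequencesWithQuality]",
         "--input-path", Path(manifest).as_posix(),
         "--input-format", "PairedEndFastqManifestPhred33V2",
         "--output-path", demux.as_posix()],
        # 2. Per-sample quality profiles (demux.qzv location is an assumption).
        ["qiime", "demux", "summarize",
         "--i-data", demux.as_posix(),
         "--o-visualization", demux.with_suffix(".qzv").as_posix()],
        # 3-4. Denoise, merge pairs, and write the table, sequences, and stats.
        ["qiime", "dada2", "denoise-paired",
         "--i-demultiplexed-seqs", demux.as_posix(),
         "--p-trunc-len-f", str(trunc_len_f),
         "--p-trunc-len-r", str(trunc_len_r),
         "--o-table", (denoised / "table.qza").as_posix(),
         "--o-representative-sequences", (denoised / "rep-seqs.qza").as_posix(),
         "--o-denoising-stats", (denoised / "denoising-stats.qza").as_posix()],
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order. Truncation lengths should be chosen from the quality profiles in demux.qzv rather than copied from an example.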

Taxonomy Assignment

  • Tool: qiime feature-classifier classify-sklearn with a pre-trained SILVA 138 classifier
  • Key actions
    1. Validation – Verifies path to the pre-trained classifier database.
    2. Classification – Representative sequences assigned taxonomic labels using confidence-based machine learning.
    3. Visualization – Generates an interactive taxonomy table for exploration.
  • Outputs
    • Taxonomic classifications → qiime2/04_taxonomy/taxonomy.qza
    • Interactive taxonomy viewer → qiime2/04_taxonomy/taxonomy.qzv
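
A possible shape for these three actions, assuming the classifier is a single .qza file on disk and that the interactive table is produced with `qiime metadata tabulate` (the source does not name the visualization command). The function name `taxonomy_commands` is hypothetical.

```python
from pathlib import Path


def taxonomy_commands(run_dir, classifier):
    """Validate the classifier path, then build the classification commands."""
    classifier = Path(classifier)
    # 1. Validation: fail early if the pre-trained classifier is missing.
    if not classifier.is_file():
        raise FileNotFoundError(f"Classifier not found: {classifier}")

    q2 = Path(run_dir) / "qiime2"
    rep_seqs = q2 / "02_denoised_dada2" / "rep-seqs.qza"
    taxonomy = q2 / "04_taxonomy" / "taxonomy.qza"
    return [
        # 2. Classification of the representative sequences.
        ["qiime", "feature-classifier", "classify-sklearn",
         "--i-classifier", classifier.as_posix(),
         "--i-reads", rep_seqs.as_posix(),
         "--o-classification", taxonomy.as_posix()],
        # 3. Visualization (use of `metadata tabulate` is an assumption).
        ["qiime", "metadata", "tabulate",
         "--m-input-file", taxonomy.as_posix(),
         "--o-visualization", taxonomy.with_suffix(".qzv").as_posix()],
    ]
```

Checking the path before launching QIIME 2 keeps a missing classifier from surfacing only after the earlier, slower steps have already run.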

Phylogenetic Tree

  • Tool: qiime phylogeny align-to-tree-mafft-fasttree
  • Key actions
    1. Multiple sequence alignment – MAFFT aligns all representative sequences.
    2. Alignment masking – Highly variable regions filtered to improve tree accuracy.
    3. Unrooted tree construction – FastTree constructs a phylogenetic tree from the masked alignment.
    4. Tree rooting – Tree rooted at its midpoint.
  • Outputs
    • Raw alignment → qiime2/05_phylogeny/aligned-rep-seqs.qza
    • Masked alignment → qiime2/05_phylogeny/masked-aligned-rep-seqs.qza
    • Unrooted tree → qiime2/05_phylogeny/unrooted-tree.qza
    • Rooted tree → qiime2/05_phylogeny/rooted-tree.qza
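
All four actions are performed by a single pipeline command. The sketch below shows how its four outputs map onto the paths listed above; the helper name `phylogeny_command` is hypothetical, and the input is assumed to be the rep-seqs.qza written by the denoising step.

```python
from pathlib import Path


def phylogeny_command(run_dir):
    """Build the align-to-tree-mafft-fasttree command line for one run."""
    q2 = Path(run_dir) / "qiime2"
    out = q2 / "05_phylogeny"
    return [
        "qiime", "phylogeny", "align-to-tree-mafft-fasttree",
        # Input: representative sequences from the denoising step (assumed).
        "--i-sequences", (q2 / "02_denoised_dada2" / "rep-seqs.qza").as_posix(),
        # One output per key action: align, mask, build tree, root tree.
        "--o-alignment", (out / "aligned-rep-seqs.qza").as_posix(),
        "--o-masked-alignment", (out / "masked-aligned-rep-seqs.qza").as_posix(),
        "--o-tree", (out / "unrooted-tree.qza").as_posix(),
        "--o-rooted-tree", (out / "rooted-tree.qza").as_posix(),
    ]
```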

Rarefaction (Optional)

  • Tool: Custom Python script
  • Key actions
    1. Sample depth calculation – Computes total read counts per sample from the feature table.
    2. Threshold selection – Uses the 10th percentile of sample depths as the rarefaction cutoff.
    3. Retention analysis – Calculates number and percentage of samples above the selected threshold.
  • Outputs
    • Rarefaction depth → rarefaction_threshold/rarefaction_threshold.txt
    • Analysis summary → rarefaction_threshold/rarefaction_summary.txt
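
The threshold logic can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the pipeline's script: the feature table is represented as a nested dict, the percentile uses linear interpolation, the result is floored to an integer, and a sample counts as retained when its depth is at or above the threshold.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def rarefaction_summary(table, q=10):
    """Summarize sample depths for a table of {sample: {feature: count}}."""
    # 1. Sample depth: total read count per sample.
    depths = {sample: sum(counts.values()) for sample, counts in table.items()}
    # 2. Threshold: q-th percentile of the depths, floored (assumption).
    threshold = int(percentile(list(depths.values()), q))
    # 3. Retention: samples at or above the threshold (assumption: >=).
    retained = [s for s, d in depths.items() if d >= threshold]
    return {
        "threshold": threshold,
        "n_samples": len(depths),
        "n_retained": len(retained),
        "pct_retained": 100 * len(retained) / len(depths),
    }
```

For eleven samples with depths 100, 200, ..., 1100, the 10th percentile falls exactly on the second-smallest depth, so the threshold is 200 and ten of the eleven samples are retained.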

Alpha Diversity

  • Tool: qiime diversity core-metrics-phylogenetic, qiime diversity alpha-group-significance
  • Key actions
    1. Within-sample diversity – Quantifies the richness, evenness, and overall diversity of the microbial community in each sample (e.g., Shannon, Faith's Phylogenetic Diversity).
    2. Statistical comparison – Tests (Kruskal-Wallis) whether alpha diversity metrics differ significantly across predefined sample groups (e.g., healthy vs. diseased).
  • Outputs
    • Alpha diversity metrics → .qza files with per-sample diversity scores (e.g., shannon_vector.qza, faith_pd_vector.qza)
    • Statistical summary → .qzv visualization of the group significance tests (e.g., alpha-group-significance.qzv)
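
As a rough illustration of how the two tools chain together, the sketch below builds one core-metrics command followed by one significance test per metric. The function name, the `out_dir` and `metadata` arguments, and the names of the .qzv files are hypothetical; the source does not state where these outputs are written.

```python
from pathlib import Path


def alpha_diversity_commands(run_dir, metadata, sampling_depth, out_dir):
    """Build core-metrics and per-metric group-significance command lines."""
    q2 = Path(run_dir) / "qiime2"
    core = Path(out_dir)  # hypothetical location for the core-metrics results
    meta = Path(metadata).as_posix()
    cmds = [
        # 1. Within-sample diversity at a fixed sampling depth.
        ["qiime", "diversity", "core-metrics-phylogenetic",
         "--i-phylogeny", (q2 / "05_phylogeny" / "rooted-tree.qza").as_posix(),
         "--i-table", (q2 / "02_denoised_dada2" / "table.qza").as_posix(),
         "--p-sampling-depth", str(sampling_depth),
         "--m-metadata-file", meta,
         "--output-dir", core.as_posix()],
    ]
    # 2. Statistical comparison for each alpha diversity vector.
    for metric in ("shannon", "faith_pd"):
        cmds.append(
            ["qiime", "diversity", "alpha-group-significance",
             "--i-alpha-diversity", (core / f"{metric}_vector.qza").as_posix(),
             "--m-metadata-file", meta,
             # Visualization file name is an assumption.
             "--o-visualization", (core / f"{metric}-group-significance.qzv").as_posix()]
        )
    return cmds
```

The sampling depth would typically be the value read from rarefaction_threshold/rarefaction_threshold.txt when the optional rarefaction step has been run. QIIME 2 expects the `--output-dir` target not to exist yet, so a rerun needs a fresh directory or prior cleanup.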

Results Export

  • Tool: qiime tools export with format conversion utilities
  • Key actions
    1. Feature table conversion – Exports feature table from .qza to BIOM format, then converts to tab-separated values for downstream analysis.
    2. Sequence extraction – Extracts representative sequences from .qza to standard FASTA format.
    3. Taxonomy table export – Converts taxonomic classifications to tab-separated format.
    4. Phylogenetic tree extraction – Exports rooted phylogenetic tree to Newick format for use in phylogenetic analysis software.
    5. Alpha diversity visualization export – Extracts interactive alpha rarefaction plots and associated data tables.
  • Outputs
    • Feature abundance matrix → qiime2/exported_results/feature_table.tsv
    • Representative sequences → qiime2/exported_results/representative_sequences.fasta
    • Taxonomic assignments → qiime2/exported_results/taxonomy.tsv
    • Phylogenetic tree → qiime2/exported_results/phylogenetic_tree.nwk
    • Alpha diversity plots → qiime2/exported_results/alpha_rarefaction/*
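
The export and conversion steps might be planned as follows. `qiime tools export` writes a fixed file name into an output directory (feature-table.biom, dna-sequences.fasta, taxonomy.tsv, tree.nwk), so each file then has to be moved to the final name listed above. The `_staging` directory, the intermediate feature_table.biom, and the `export_plan` helper are assumptions; the alpha rarefaction export is left out because the source does not give the path of its .qzv input.

```python
from pathlib import Path

# (artifact under qiime2/, file name written by the export, final file name)
EXPORTS = [
    ("02_denoised_dada2/table.qza", "feature-table.biom", "feature_table.biom"),
    ("02_denoised_dada2/rep-seqs.qza", "dna-sequences.fasta",
     "representative_sequences.fasta"),
    ("04_taxonomy/taxonomy.qza", "taxonomy.tsv", "taxonomy.tsv"),
    ("05_phylogeny/rooted-tree.qza", "tree.nwk", "phylogenetic_tree.nwk"),
]


def export_plan(run_dir):
    """Return ordered steps: a command plus an optional (source, target) move."""
    q2 = Path(run_dir) / "qiime2"
    out = q2 / "exported_results"
    steps = []
    for artifact, exported, final in EXPORTS:
        # Hypothetical scratch directory, one per artifact.
        staging = out / "_staging" / Path(artifact).stem
        steps.append({
            "command": ["qiime", "tools", "export",
                        "--input-path", (q2 / artifact).as_posix(),
                        "--output-path", staging.as_posix()],
            "move": ((staging / exported).as_posix(), (out / final).as_posix()),
        })
    # Convert the exported BIOM table to tab-separated values.
    steps.append({
        "command": ["biom", "convert",
                    "-i", (out / "feature_table.biom").as_posix(),
                    "-o", (out / "feature_table.tsv").as_posix(),
                    "--to-tsv"],
        "move": None,
    })
    return steps
```

A runner would execute each `command` with `subprocess.run(cmd, check=True)` and then apply the `move` pair with `Path.rename` before continuing to the next step.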
