nf-core/MAG taxonomy

Files

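The Taxonomy output directory contains the GTDB-Tk taxonomic classification of the bins. In this example, three metagenomes were provided for assembly and binning, so there is one subdirectory per sample under the assembler name (MEGAHIT), plus a combined gtdbtk_summary.tsv file at the top level.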

sub-mag/Taxonomy/GTDB-Tk/
 ├── MEGAHIT 
 │   ├── E026BW3_S18
 │   │   ├── gtdbtk.MEGAHIT-E026BW3_S18.ar122.markers_summary.tsv
 │   │   ├── gtdbtk.MEGAHIT-E026BW3_S18.bac120.classify.tree.gz
 │   │   ├── gtdbtk.MEGAHIT-E026BW3_S18.bac120.filtered.tsv
 │   │   ├── gtdbtk.MEGAHIT-E026BW3_S18.bac120.markers_summary.tsv
 │   │   ├── gtdbtk.MEGAHIT-E026BW3_S18.bac120.msa.fasta.gz
 │   │   ├── gtdbtk.MEGAHIT-E026BW3_S18.bac120.summary.tsv
 │   │   ├── gtdbtk.MEGAHIT-E026BW3_S18.bac120.user_msa.fasta
 │   │   ├── gtdbtk.MEGAHIT-E026BW3_S18.failed_genomes.tsv
 │   │   ├── gtdbtk.MEGAHIT-E026BW3_S18.log
 │   │   └── gtdbtk.MEGAHIT-E026BW3_S18.warnings.log
 │   ├── E044T3_S56
 │   │   ├── gtdbtk.MEGAHIT-E044T3_S56.ar122.markers_summary.tsv
 │   │   ├── gtdbtk.MEGAHIT-E044T3_S56.bac120.classify.tree.gz
 │   │   ├── gtdbtk.MEGAHIT-E044T3_S56.bac120.filtered.tsv
 │   │   ├── gtdbtk.MEGAHIT-E044T3_S56.bac120.markers_summary.tsv
 │   │   ├── gtdbtk.MEGAHIT-E044T3_S56.bac120.msa.fasta.gz
 │   │   ├── gtdbtk.MEGAHIT-E044T3_S56.bac120.summary.tsv
 │   │   ├── gtdbtk.MEGAHIT-E044T3_S56.bac120.user_msa.fasta
 │   │   ├── gtdbtk.MEGAHIT-E044T3_S56.failed_genomes.tsv
 │   │   ├── gtdbtk.MEGAHIT-E044T3_S56.log
 │   │   └── gtdbtk.MEGAHIT-E044T3_S56.warnings.log
 │   └── E069T3_S57
 │       ├── gtdbtk.MEGAHIT-E069T3_S57.ar122.markers_summary.tsv
 │       ├── gtdbtk.MEGAHIT-E069T3_S57.bac120.classify.tree.gz
 │       ├── gtdbtk.MEGAHIT-E069T3_S57.bac120.filtered.tsv
 │       ├── gtdbtk.MEGAHIT-E069T3_S57.bac120.markers_summary.tsv
 │       ├── gtdbtk.MEGAHIT-E069T3_S57.bac120.msa.fasta.gz
 │       ├── gtdbtk.MEGAHIT-E069T3_S57.bac120.summary.tsv
 │       ├── gtdbtk.MEGAHIT-E069T3_S57.bac120.user_msa.fasta
 │       ├── gtdbtk.MEGAHIT-E069T3_S57.failed_genomes.tsv 
 │       ├── gtdbtk.MEGAHIT-E069T3_S57.log
 │       └── gtdbtk.MEGAHIT-E069T3_S57.warnings.log
 └── gtdbtk_summary.tsv
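
The layout above can be walked programmatically to locate the per-sample summary files. The following is a minimal sketch, assuming the directory structure shown in the tree (assembler directory, then sample directory, then files named gtdbtk.<assembler>-<sample>.bac120.summary.tsv). The function name find_sample_summaries is hypothetical and is not part of nf-core/mag or GTDB-Tk.

```python
from pathlib import Path


def find_sample_summaries(gtdbtk_dir):
    """Map (assembler, sample) to the path of the per-sample bac120 summary.

    Assumes the layout shown in the tree above:
    <gtdbtk_dir>/<assembler>/<sample>/gtdbtk.*.bac120.summary.tsv
    """
    root = Path(gtdbtk_dir)
    found = {}
    # Two directory levels (assembler, sample), then the summary file itself.
    for path in sorted(root.glob("*/*/gtdbtk.*.bac120.summary.tsv")):
        sample = path.parent.name
        assembler = path.parent.parent.name
        found[(assembler, sample)] = path
    return found


if __name__ == "__main__":
    # Path taken from the example tree; adjust to your own output directory.
    for (assembler, sample), path in find_sample_summaries(
        "sub-mag/Taxonomy/GTDB-Tk"
    ).items():
        print(assembler, sample, path.name)
```

Only the bac120.summary.tsv files are matched; the markers_summary, filtered and log files in the same directories are ignored by the glob pattern.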

Summary

The gtdbtk_summary.tsv file contains the following columns (an example value for each is given in brackets):

  • user_genome (MEGAHIT-E044T3_S56.11.fa)
  • classification (d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Agathobacter;s__Agathobacter rectalis)
  • fastani_reference (GCF_000020605.1)
  • fastani_reference_radius (95.0)
  • fastani_taxonomy (d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Agathobacter;s__Agathobacter rectalis)
  • fastani_ani (97.11)
  • fastani_af (0.88)
  • closest_placement_reference (GCF_000020605.1)
  • closest_placement_radius (95.0)
  • closest_placement_taxonomy (d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Agathobacter;s__Agathobacter rectalis)
  • closest_placement_ani (97.11)
  • closest_placement_af (0.88)
  • pplacer_taxonomy (d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Agathobacter;s__)
  • classification_method (taxonomic classification defined by topology and ANI)
  • note (topological placement and ANI have congruent species assignments)
  • other_related_references(genome_id,species_name,radius,ANI,AF) (GCA_900317585.1, s__Agathobacter sp900317585, 95.0, 94.71, 0.79; GCA_900546625.1, s__Agathobacter sp900546625, 95.0, 94.43, 0.84; GCA_900547695.1, s__Agathobacte…)
  • msa_percent (70.08)
  • translation_table (11)
  • red_value (empty in this example)
  • warnings (Genome has more than 13.3% of markers with multiple hits)
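
The classification columns listed above hold a semicolon-separated lineage with rank prefixes (d__, p__, c__, o__, f__, g__, s__). As an illustration, the sketch below reads a tab-separated summary and splits that string into named ranks. It assumes the file has a header row containing the user_genome and classification columns; the helper names split_classification and read_summary are hypothetical.

```python
import csv
import io

# Rank prefixes as they appear in the example classification strings.
RANK_PREFIXES = {
    "d": "domain",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def split_classification(classification):
    """Turn 'd__Bacteria;p__...;s__...' into a dict keyed by rank name."""
    ranks = {}
    for field in classification.split(";"):
        prefix, sep, name = field.strip().partition("__")
        # Skip anything that is not of the form <prefix>__<name>.
        if sep and prefix in RANK_PREFIXES:
            ranks[RANK_PREFIXES[prefix]] = name
    return ranks


def read_summary(handle):
    """Read a tab-separated summary and return one dict per genome."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {"user_genome": row["user_genome"],
         **split_classification(row["classification"])}
        for row in reader
    ]


if __name__ == "__main__":
    # A two-column excerpt built from the example values above.
    example = io.StringIO(
        "user_genome\tclassification\n"
        "MEGAHIT-E044T3_S56.11.fa\t"
        "d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;"
        "f__Lachnospiraceae;g__Agathobacter;s__Agathobacter rectalis\n"
    )
    for record in read_summary(example):
        print(record["user_genome"], record["genus"], record["species"])
```

For a real file, pass an open file handle (for example open("gtdbtk_summary.tsv", newline="")) instead of the in-memory excerpt. A rank with no assigned name, such as the trailing s__ in the pplacer_taxonomy example, comes back as an empty string.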