nfcoremag taxonomy - quadram-institute-bioscience/gmh-sops GitHub Wiki
nf-core/MAG taxonomy
Files
The Taxonomy output is a directory containing the GTDB-Tk taxonomic classification of the bins. In this example, three metagenomes were provided for assembly and binning; they were assembled with MEGAHIT, so there is one subdirectory per sample under MEGAHIT.
sub-mag/Taxonomy/GTDB-Tk/
├── MEGAHIT
│ ├── E026BW3_S18
│ │ ├── gtdbtk.MEGAHIT-E026BW3_S18.ar122.markers_summary.tsv
│ │ ├── gtdbtk.MEGAHIT-E026BW3_S18.bac120.classify.tree.gz
│ │ ├── gtdbtk.MEGAHIT-E026BW3_S18.bac120.filtered.tsv
│ │ ├── gtdbtk.MEGAHIT-E026BW3_S18.bac120.markers_summary.tsv
│ │ ├── gtdbtk.MEGAHIT-E026BW3_S18.bac120.msa.fasta.gz
│ │ ├── gtdbtk.MEGAHIT-E026BW3_S18.bac120.summary.tsv
│ │ ├── gtdbtk.MEGAHIT-E026BW3_S18.bac120.user_msa.fasta
│ │ ├── gtdbtk.MEGAHIT-E026BW3_S18.failed_genomes.tsv
│ │ ├── gtdbtk.MEGAHIT-E026BW3_S18.log
│ │ └── gtdbtk.MEGAHIT-E026BW3_S18.warnings.log
│ ├── E044T3_S56
│ │ ├── gtdbtk.MEGAHIT-E044T3_S56.ar122.markers_summary.tsv
│ │ ├── gtdbtk.MEGAHIT-E044T3_S56.bac120.classify.tree.gz
│ │ ├── gtdbtk.MEGAHIT-E044T3_S56.bac120.filtered.tsv
│ │ ├── gtdbtk.MEGAHIT-E044T3_S56.bac120.markers_summary.tsv
│ │ ├── gtdbtk.MEGAHIT-E044T3_S56.bac120.msa.fasta.gz
│ │ ├── gtdbtk.MEGAHIT-E044T3_S56.bac120.summary.tsv
│ │ ├── gtdbtk.MEGAHIT-E044T3_S56.bac120.user_msa.fasta
│ │ ├── gtdbtk.MEGAHIT-E044T3_S56.failed_genomes.tsv
│ │ ├── gtdbtk.MEGAHIT-E044T3_S56.log
│ │ └── gtdbtk.MEGAHIT-E044T3_S56.warnings.log
│ └── E069T3_S57
│ ├── gtdbtk.MEGAHIT-E069T3_S57.ar122.markers_summary.tsv
│ ├── gtdbtk.MEGAHIT-E069T3_S57.bac120.classify.tree.gz
│ ├── gtdbtk.MEGAHIT-E069T3_S57.bac120.filtered.tsv
│ ├── gtdbtk.MEGAHIT-E069T3_S57.bac120.markers_summary.tsv
│ ├── gtdbtk.MEGAHIT-E069T3_S57.bac120.msa.fasta.gz
│ ├── gtdbtk.MEGAHIT-E069T3_S57.bac120.summary.tsv
│ ├── gtdbtk.MEGAHIT-E069T3_S57.bac120.user_msa.fasta
│ ├── gtdbtk.MEGAHIT-E069T3_S57.failed_genomes.tsv
│ ├── gtdbtk.MEGAHIT-E069T3_S57.log
│ └── gtdbtk.MEGAHIT-E069T3_S57.warnings.log
└── gtdbtk_summary.tsv
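Every per-sample file in this tree follows the same naming pattern, so the outputs can be collected programmatically. The following is a minimal Python sketch. It assumes the pattern `gtdbtk.<ASSEMBLER>-<SAMPLE>[.<marker set>].<file type>`, which is inferred from the listing above and is not a documented GTDB-Tk or nf-core contract. It also assumes that assembler names contain no hyphen. `parse_output_name` and `find_files` are hypothetical helper names.

```python
import re
from pathlib import Path

# Pattern inferred from the file names in the tree above (an assumption):
# gtdbtk.<ASSEMBLER>-<SAMPLE>[.<MARKER_SET>].<KIND>
NAME_RE = re.compile(
    r"^gtdbtk\.(?P<assembler>[^-.]+)-(?P<sample>[^.]+)"
    r"(?:\.(?P<marker_set>bac120|ar122))?\.(?P<kind>.+)$"
)


def parse_output_name(filename):
    """Split a per-sample GTDB-Tk file name into its parts, or return None."""
    match = NAME_RE.match(filename)
    return match.groupdict() if match else None


def find_files(gtdbtk_dir, kind="summary.tsv", marker_set="bac120"):
    """Map (assembler, sample) -> path for one file type in every sample folder."""
    found = {}
    # Per-sample files sit two levels down: <ASSEMBLER>/<SAMPLE>/gtdbtk.*
    for path in sorted(Path(gtdbtk_dir).glob("*/*/gtdbtk.*")):
        parts = parse_output_name(path.name)
        if parts and parts["kind"] == kind and parts["marker_set"] == marker_set:
            found[(parts["assembler"], parts["sample"])] = path
    return found
```

For the layout above, `find_files("sub-mag/Taxonomy/GTDB-Tk")` would map `("MEGAHIT", "E026BW3_S18")` and the other two samples to their `*.bac120.summary.tsv` files. Files without a marker set, such as `*.log` or `*.failed_genomes.tsv`, parse with `marker_set` set to `None`. The combined `gtdbtk_summary.tsv` at the top level is not matched by either the glob or the pattern.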
Summary
The gtdbtk_summary.tsv file contains these columns (an example value in brackets):
- user_genome (MEGAHIT-E044T3_S56.11.fa)
- classification (d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Agathobacter;s__Agathobacter rectalis)
- fastani_reference (GCF_000020605.1)
- fastani_reference_radius (95.0)
- fastani_taxonomy (d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Agathobacter;s__Agathobacter rectalis)
- fastani_ani (97.11)
- fastani_af (0.88)
- closest_placement_reference (GCF_000020605.1)
- closest_placement_radius (95.0)
- closest_placement_taxonomy (d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Agathobacter;s__Agathobacter rectalis)
- closest_placement_ani (97.11)
- closest_placement_af (0.88)
- pplacer_taxonomy (d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Agathobacter;s__)
- classification_method (taxonomic classification defined by topology and ANI)
- note (topological placement and ANI have congruent species assignments)
- other_related_references(genome_id,species_name,radius,ANI,AF) (GCA_900317585.1, s__Agathobacter sp900317585, 95.0, 94.71, 0.79; GCA_900546625.1, s__Agathobacter sp900546625, 95.0, 94.43, 0.84; GCA_900547695.1, s__Agathobacte…)
- msa_percent (70.08)
- translation_table (11)
- red_value (empty in this example; RED is not reported when a genome is classified by ANI)
- warnings (Genome has more than 13.3% of markers with multiple hits)
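The lineage strings in this table use GTDB rank prefixes, from `d__` (domain) to `s__` (species). An empty `s__`, as in the pplacer_taxonomy example, means no species name was assigned at that step. The following is a minimal Python sketch for reading the file. It assumes a tab-separated file with the column names listed above. The helper names `split_taxonomy`, `deepest_rank`, `parse_related` and `parse_summary` are hypothetical.

```python
import csv

# Standard GTDB rank prefixes, in lineage order.
RANK_PREFIXES = {"d": "domain", "p": "phylum", "c": "class", "o": "order",
                 "f": "family", "g": "genus", "s": "species"}


def split_taxonomy(lineage):
    """'d__Bacteria;...;s__' -> {'domain': 'Bacteria', ..., 'species': ''}."""
    ranks = {}
    for token in lineage.split(";"):
        prefix, sep, name = token.strip().partition("__")
        if not sep:
            continue  # e.g. "N/A": no rank prefix to split on
        ranks[RANK_PREFIXES.get(prefix, prefix)] = name
    return ranks


def deepest_rank(lineage):
    """Most specific rank that has a name (an empty 's__' does not count)."""
    named = [rank for rank, name in split_taxonomy(lineage).items() if name]
    return named[-1] if named else None


def parse_related(field):
    """Split other_related_references into dicts; skips 'N/A' and cut-off entries."""
    hits = []
    for entry in field.split(";"):
        parts = [p.strip() for p in entry.split(",")]
        if len(parts) != 5:
            continue  # not a complete (genome_id, species, radius, ANI, AF) entry
        genome_id, species, radius, ani, af = parts
        hits.append({"genome_id": genome_id, "species": species,
                     "radius": float(radius), "ani": float(ani), "af": float(af)})
    return hits


def parse_summary(lines):
    """Yield one dict per bin from tab-separated summary lines, plus a 'ranks' key."""
    for row in csv.DictReader(lines, delimiter="\t"):
        row["ranks"] = split_taxonomy(row["classification"])
        yield row
```

To use it, open `sub-mag/Taxonomy/GTDB-Tk/gtdbtk_summary.tsv` with `newline=""` and pass the file handle to `parse_summary`. For the example row above, `deepest_rank` returns `"species"` for `classification` but `"genus"` for `pplacer_taxonomy`. This matches the listed classification_method: the species call relies on ANI, not on tree placement alone.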