Taxonomy Database - mariehoffmann/isPCR GitHub Wiki

database schema

Schema

Node

Rows are extracted from NCBI's nodes.dmp.

| Attributes | Constraint | Comment |
|---|---|---|
| tax_id | PRIMARY KEY | node id in GenBank taxonomy database |
| parent_tax_id | | parent node id in GenBank taxonomy database |
| rank | | rank of this node (superkingdom, kingdom, ...) |
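
A minimal sketch of how Node rows could be loaded, assuming SQLite through Python's standard `sqlite3` module. The SQL column types and the helpers `parse_nodes_line` and `load_nodes` are hypothetical illustrations, not the isPCR code. The parser relies on the NCBI dump layout, where fields are separated by a tab, a pipe and a tab (`\t|\t`) and each line ends with `\t|`; only the first three fields of nodes.dmp are kept.

```python
import sqlite3

# Hypothetical DDL: names follow the Node table, SQL types are assumptions.
NODE_DDL = """
CREATE TABLE IF NOT EXISTS Node (
    tax_id        INTEGER PRIMARY KEY,  -- node id
    parent_tax_id INTEGER,              -- id of the parent node
    rank          TEXT                  -- e.g. 'species', 'genus'
)"""


def parse_nodes_line(line):
    """Return (tax_id, parent_tax_id, rank) from one nodes.dmp line."""
    # Strip the line terminator "\t|\n", then split on the "\t|\t" separator.
    fields = line.rstrip("\n").removesuffix("\t|").split("\t|\t")
    return int(fields[0]), int(fields[1]), fields[2]


def load_nodes(conn, lines):
    """Create the Node table and insert one row per nodes.dmp line."""
    conn.execute(NODE_DDL)
    conn.executemany(
        "INSERT INTO Node (tax_id, parent_tax_id, rank) VALUES (?, ?, ?)",
        (parse_nodes_line(l) for l in lines),
    )
```

Usage: `load_nodes(sqlite3.connect("taxonomy.db"), open("nodes.dmp"))`, where the file names are placeholders.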

Names

This table lists all known names assigned to a tax_id. If multiple names exist for a tax_id, the table contains one row per name, all sharing the same tax_id. Rows are extracted from names.dmp.

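| Attributes | Constraint | Comment |
|---|---|---|
| tax_id | PRIMARY KEY, FOREIGN KEY | id of the node associated with this name; only unique in combination with name_txt |
| name_txt | PRIMARY KEY | the name itself |
| unique_name | | unique variant of this name, set only if name_txt is not unique |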
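
A minimal sketch of the Names table, assuming SQLite via Python's `sqlite3` and assuming that the two PRIMARY KEY attributes form the composite key (tax_id, name_txt). The helpers `parse_names_line`, `load_names` and `names_for` are hypothetical. names.dmp rows carry four fields (tax_id, name_txt, unique name, name class), of which the first three are kept here; `names_for` shows how all names of one tax_id are retrieved.

```python
import sqlite3

# Hypothetical DDL; (tax_id, name_txt) is assumed to be the composite key.
# SQLite only enforces REFERENCES when PRAGMA foreign_keys is ON.
NAMES_DDL = """
CREATE TABLE IF NOT EXISTS Names (
    tax_id      INTEGER REFERENCES Node(tax_id),
    name_txt    TEXT,
    unique_name TEXT,
    PRIMARY KEY (tax_id, name_txt)
)"""


def parse_names_line(line):
    """Return (tax_id, name_txt, unique_name) from one names.dmp line."""
    fields = line.rstrip("\n").removesuffix("\t|").split("\t|\t")
    # An empty unique-name field means name_txt is already unique.
    return int(fields[0]), fields[1], fields[2] or None


def load_names(conn, lines):
    """Create the Names table and insert one row per distinct name."""
    conn.execute(NAMES_DDL)
    # OR IGNORE skips a repeated (tax_id, name_txt) pair, in case the same
    # name is listed for a node under more than one name class.
    conn.executemany(
        "INSERT OR IGNORE INTO Names (tax_id, name_txt, unique_name) "
        "VALUES (?, ?, ?)",
        (parse_names_line(l) for l in lines),
    )


def names_for(conn, tax_id):
    """Return every name stored for tax_id (one table row per name)."""
    rows = conn.execute(
        "SELECT name_txt FROM Names WHERE tax_id = ? ORDER BY name_txt",
        (tax_id,),
    )
    return [row[0] for row in rows]
```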

Lineage

For each taxonomic node, identified by its unique tax_id, the table stores the list of its ancestors, ordered from the most distant to the closest one. By default, the data is extracted from NCBI's taxidlineage.dmp.

| Attributes | Constraint | Comment |
|---|---|---|
| tax_id | PRIMARY KEY, FOREIGN KEY | |
| lineage | | list of ancestor tax_ids, most distant first |
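
A minimal sketch, in Python, of the two ways the lineage list could be obtained: parsing a taxidlineage.dmp line, whose second field is a space-separated list of tax_ids, or deriving the ancestors by walking `parent_tax_id` links from the Node table. Both helpers are hypothetical. The walk assumes NCBI's convention that the root (tax_id 1) is its own parent and, as written, includes the root in the result; whether taxidlineage.dmp lists the root as well should be checked before treating the two as interchangeable.

```python
def parse_lineage_line(line):
    """Return (tax_id, [ancestor tax_ids]) from one taxidlineage.dmp line."""
    fields = line.rstrip("\n").removesuffix("\t|").split("\t|\t")
    # str.split() without arguments also drops a trailing space, if present.
    return int(fields[0]), [int(t) for t in fields[1].split()]


def lineage_from_parents(parent, tax_id):
    """Ancestors of tax_id, most distant first, from a tax_id -> parent map."""
    ancestors = []
    node = tax_id
    while parent[node] != node:  # stop at the root, which is its own parent
        node = parent[node]
        ancestors.append(node)
    ancestors.reverse()  # collected closest-first, so reverse the order
    return ancestors
```

For example, with `parent = {1: 1, 131567: 1, 2759: 131567}`, `lineage_from_parents(parent, 2759)` walks up through 131567 to 1 and returns the ancestors in root-first order.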

Accessions

This table contains a pre-processed mapping from tax_id to accession numbers, so that all existing accessions can be retrieved for a given tax_id. By default, the data is extracted from NCBI's nt(.fast) file together with the Node and Names tables.

| Attributes | Constraint | Comment |
|---|---|---|
| tax_id | PRIMARY KEY, FOREIGN KEY | only unique in combination with accession |
| accession | PRIMARY KEY | |
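
A minimal sketch of the Accessions table and of retrieving all accessions for a tax_id, assuming SQLite via Python's `sqlite3`. The composite primary key (tax_id, accession) mirrors the constraints listed above, so a tax_id may appear once per accession; the DDL types and the helper `accessions_for` are hypothetical.

```python
import sqlite3

# Hypothetical DDL: a tax_id repeats once per accession, so the pair
# (tax_id, accession) is the primary key.
ACCESSIONS_DDL = """
CREATE TABLE IF NOT EXISTS Accessions (
    tax_id    INTEGER REFERENCES Node(tax_id),
    accession TEXT,
    PRIMARY KEY (tax_id, accession)
)"""


def accessions_for(conn, tax_id):
    """Return all accessions recorded for tax_id, sorted."""
    rows = conn.execute(
        "SELECT accession FROM Accessions WHERE tax_id = ? ORDER BY accession",
        (tax_id,),
    )
    return [row[0] for row in rows]
```

Because tax_id is the leading column of the composite key, SQLite can answer the lookup in `accessions_for` from the primary-key index rather than scanning the whole table.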