eHBV Project Schema Extensions - giffordlabcvr/Hepadnaviridae-GLUE GitHub Wiki
Hepadnavirus-GLUE-EVE extends Hepadnavirus-GLUE's schema with custom tables for capturing EVE-specific data.
Schema extensions are defined in this project build file.
The project-specific extensions comprise two custom tables:
- locus_data: contains EVE locus information, e.g. species, assembly, scaffold, location coordinates.
- refcon_data: contains summary information for individual EVE insertions. It refers to the reference sequences constructed to represent each insertion, which reflect our best efforts to reconstruct progenitor virus sequences as they might have looked when they initially integrated into the germline of ancestral species.
Both custom tables are linked to the main sequence table via the 'sequenceID' field.
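The linkage described above can be illustrated with a small, self-contained sketch. This is not how GLUE itself stores or queries data; it uses Python's built-in sqlite3 module purely to show the relationship. The column subset, the helper name `fetch_eve_record`, and all sample values are hypothetical.

```python
import sqlite3

# Illustrative only: GLUE manages its own database. The tables below carry a
# small subset of the documented fields, and every value is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sequence (sequenceID TEXT PRIMARY KEY);
CREATE TABLE locus_data (
    sequenceID TEXT REFERENCES sequence(sequenceID),
    scaffold TEXT,
    start_position INTEGER,
    end_position INTEGER
);
CREATE TABLE refcon_data (
    sequenceID TEXT REFERENCES sequence(sequenceID),
    reftype TEXT,
    num_copies INTEGER
);
""")
conn.execute("INSERT INTO sequence VALUES ('example-seq-1')")
conn.execute("INSERT INTO locus_data VALUES ('example-seq-1', 'scaffold_1', 100, 450)")
conn.execute("INSERT INTO refcon_data VALUES ('example-seq-1', 'consensus', 3)")


def fetch_eve_record(conn, sequence_id):
    """Return one row combining sequence, locus_data and refcon_data (hypothetical helper)."""
    # LEFT JOINs keep the sequence row even when one custom table has no entry for it.
    return conn.execute(
        """
        SELECT s.sequenceID, l.scaffold, l.start_position, l.end_position,
               r.reftype, r.num_copies
        FROM sequence s
        LEFT JOIN locus_data l ON l.sequenceID = s.sequenceID
        LEFT JOIN refcon_data r ON r.sequenceID = s.sequenceID
        WHERE s.sequenceID = ?
        """,
        (sequence_id,),
    ).fetchone()


record = fetch_eve_record(conn, "example-seq-1")
```

The left joins are a deliberate choice in this sketch: a sequence that has only locus information or only refcon information would still be returned, with the missing columns set to None.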
Extensions to the sequence Table
The sequence table of GLUE's core schema was extended to include the following additional fields:
Parameter | Type | Definition |
---|---|---|
refcon_data | LINK | Link to the refcon_data table containing summary information about individual eHBV insertions |
locus_data | LINK | Link to the locus_data table containing eHBV locus-specific information |
Fields included in the refcon_data Table
A custom table was defined to capture eHBV reference and consensus sequence-associated information, as follows:
Parameter | Type | Definition |
---|---|---|
reftype | VARCHAR | Type of reference (e.g., consensus or reference sequence) |
host_group_taxlevel | VARCHAR | Taxonomic level of the host group (e.g., genus, species) |
host_group_name | VARCHAR | Scientific name of the host group |
num_copies | INTEGER | Number of endogenous viral element copies |
locus_numeric_id | INTEGER | Numeric identifier for the locus |
nearest_upstream_orf | VARCHAR | Nearest upstream open reading frame (ORF) |
nearest_downstream_orf | VARCHAR | Nearest downstream open reading frame (ORF) |
Fields included in the locus_data Table
A custom table was defined to capture eHBV locus-associated information, as follows:
Parameter | Type | Definition |
---|---|---|
locus_numeric_id | INTEGER | Numeric identifier for the locus |
scaffold | VARCHAR | Scaffold or chromosome on which the locus resides |
start_position | INTEGER | Start position of the locus on the scaffold |
end_position | INTEGER | End position of the locus on the scaffold |
orientation | VARCHAR | Orientation of the locus (plus or minus strand) |
host_sci_name | VARCHAR | Scientific name of the host organism |
bitscore | VARCHAR | Bitscore from sequence alignment of the locus |
identity | VARCHAR | Sequence identity percentage from alignment |
sequence_length | INTEGER | Length of the locus sequence in nucleotides |
assigned_name | VARCHAR | Name assigned to the locus |
host_species | VARCHAR | Species of the host organism |
host_superorder | VARCHAR | Taxonomic superorder of the host |
host_class | VARCHAR | Taxonomic class of the host |
host_order | VARCHAR | Taxonomic order of the host |
host_family | VARCHAR | Taxonomic family of the host |
host_genus | VARCHAR | Taxonomic genus of the host |
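Because the table above lists bitscore and identity as VARCHAR, code that consumes exported locus_data rows will probably want to convert them to numbers before comparing or sorting. The following is a minimal sketch of such a conversion, assuming rows arrive as dictionaries keyed by field name and that both fields hold plain numeric strings. The `LocusRecord` class, the `parse_locus_row` helper, and the sample row are hypothetical and not part of GLUE.

```python
from dataclasses import dataclass


@dataclass
class LocusRecord:
    """Typed view of a subset of the locus_data fields (hypothetical class)."""
    locus_numeric_id: int
    scaffold: str
    start_position: int
    end_position: int
    orientation: str
    bitscore: float      # stored as VARCHAR in the schema
    identity: float      # stored as VARCHAR in the schema
    sequence_length: int


def parse_locus_row(row: dict) -> LocusRecord:
    """Convert a raw row of strings into a typed record.

    Assumes bitscore and identity are numeric strings; float() raises
    ValueError if they are not.
    """
    return LocusRecord(
        locus_numeric_id=int(row["locus_numeric_id"]),
        scaffold=row["scaffold"],
        start_position=int(row["start_position"]),
        end_position=int(row["end_position"]),
        orientation=row["orientation"],
        bitscore=float(row["bitscore"]),
        identity=float(row["identity"]),
        sequence_length=int(row["sequence_length"]),
    )


# Made-up example values, for illustration only.
example_row = {
    "locus_numeric_id": "1",
    "scaffold": "scaffold_1",
    "start_position": "100",
    "end_position": "450",
    "orientation": "+",
    "bitscore": "512.0",
    "identity": "78.5",
    "sequence_length": "351",
}
locus = parse_locus_row(example_row)
```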