eHBV Project Schema Extensions - giffordlabcvr/Hepadnaviridae-GLUE GitHub Wiki

Hepadnavirus-GLUE-EVE extends Hepadnavirus-GLUE's schema with custom tables for capturing EVE-specific data.

Schema extensions are defined in this project build file.

The project-specific extensions comprise two custom tables:

  1. locus_data: contains EVE locus information, e.g. species, assembly, scaffold, and location coordinates.

  2. refcon_data: contains summary information for individual EVE insertions. It refers to the reference sequences constructed to represent each insertion, which reflect our best efforts to reconstruct progenitor virus sequences as they might have looked when they initially integrated into the germline of ancestral species.

Both these custom tables are linked to the main sequence table via the 'sequenceID' field.
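The linkage described above can be pictured with a small relational sketch. It is illustrative only: GLUE manages its own database and defines extensions through the project build file, so the SQLite layout, the use of `sequenceID` as each custom table's key, the helper name `linked_rows`, and the example rows are all assumptions made for this example.

```python
import sqlite3

# Illustrative only: simplified, assumed table layouts, not the project's DDL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sequence (sequenceID TEXT PRIMARY KEY);
    CREATE TABLE locus_data (
        sequenceID TEXT PRIMARY KEY REFERENCES sequence(sequenceID),
        scaffold TEXT,
        start_position INTEGER,
        end_position INTEGER
    );
    CREATE TABLE refcon_data (
        sequenceID TEXT PRIMARY KEY REFERENCES sequence(sequenceID),
        reftype TEXT,
        num_copies INTEGER
    );
    """
)

# Hypothetical example rows (not real project data).
conn.execute("INSERT INTO sequence VALUES ('example-locus-1')")
conn.execute("INSERT INTO sequence VALUES ('example-refcon-1')")
conn.execute(
    "INSERT INTO locus_data VALUES ('example-locus-1', 'scaffold_1', 100, 400)"
)
conn.execute(
    "INSERT INTO refcon_data VALUES ('example-refcon-1', 'consensus', 3)"
)


def linked_rows(connection, sequence_id):
    """Return (scaffold, start, end, reftype, num_copies) for one sequence.

    Columns from a custom table with no matching row come back as None.
    Returns None if the sequence itself does not exist.
    """
    return connection.execute(
        """
        SELECT l.scaffold, l.start_position, l.end_position,
               r.reftype, r.num_copies
        FROM sequence s
        LEFT JOIN locus_data l ON l.sequenceID = s.sequenceID
        LEFT JOIN refcon_data r ON r.sequenceID = s.sequenceID
        WHERE s.sequenceID = ?
        """,
        (sequence_id,),
    ).fetchone()
```

Calling `linked_rows(conn, "example-locus-1")` returns the locus columns with `None` for the refcon columns, since that hypothetical sequence has only a locus_data row.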

Extensions to sequence Table


The sequence table of GLUE's core schema was extended to include the following additional fields:

| Parameter | Type | Definition |
|---|---|---|
| refcon_data | LINK | Link to the refcon_data table containing summary information about individual eHBV insertions |
| locus_data | LINK | Link to the locus_data table containing eHBV locus-specific information |

Fields included in refcon_data Table


A custom table was defined to capture eHBV reference and consensus sequence-associated information, as follows:

| Parameter | Type | Definition |
|---|---|---|
| reftype | VARCHAR | Type of reference (e.g., consensus or reference sequence) |
| host_group_taxlevel | VARCHAR | Taxonomic level of the host group (e.g., genus, species) |
| host_group_name | VARCHAR | Scientific name of the host group |
| num_copies | INTEGER | Number of endogenous viral element copies |
| locus_numeric_id | INTEGER | Numeric identifier for the locus |
| nearest_upstream_orf | VARCHAR | Nearest upstream open reading frame (ORF) |
| nearest_downstream_orf | VARCHAR | Nearest downstream open reading frame (ORF) |
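
For scripts that handle exported refcon_data rows outside GLUE, the fields listed above could be mirrored in a typed record. This is a sketch under assumptions: the class name `RefconRecord`, the `from_row` helper, and the idea that rows arrive as dictionaries of strings (as from a tabular export) are hypothetical and not part of the project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RefconRecord:
    """Hypothetical in-memory mirror of one refcon_data row."""

    reftype: Optional[str] = None
    host_group_taxlevel: Optional[str] = None
    host_group_name: Optional[str] = None
    num_copies: Optional[int] = None
    locus_numeric_id: Optional[int] = None
    nearest_upstream_orf: Optional[str] = None
    nearest_downstream_orf: Optional[str] = None

    @classmethod
    def from_row(cls, row):
        """Build a record from a dict of strings; missing or empty values become None."""

        def text(name):
            value = row.get(name)
            return value if value not in (None, "") else None

        def integer(name):
            value = text(name)
            return int(value) if value is not None else None

        return cls(
            reftype=text("reftype"),
            host_group_taxlevel=text("host_group_taxlevel"),
            host_group_name=text("host_group_name"),
            num_copies=integer("num_copies"),          # INTEGER field
            locus_numeric_id=integer("locus_numeric_id"),  # INTEGER field
            nearest_upstream_orf=text("nearest_upstream_orf"),
            nearest_downstream_orf=text("nearest_downstream_orf"),
        )
```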

Fields included in locus_data Table


A custom table was defined to capture eHBV locus-associated information, as follows:

| Parameter | Type | Definition |
|---|---|---|
| locus_numeric_id | INTEGER | Numeric identifier for the locus |
| scaffold | VARCHAR | Scaffold or chromosome on which the locus resides |
| start_position | INTEGER | Start position of the locus on the scaffold |
| end_position | INTEGER | End position of the locus on the scaffold |
| orientation | VARCHAR | Orientation of the locus (plus or minus strand) |
| host_sci_name | VARCHAR | Scientific name of the host organism |
| bitscore | VARCHAR | Bitscore from sequence alignment of the locus |
| identity | VARCHAR | Sequence identity percentage from alignment |
| sequence_length | INTEGER | Length of the locus sequence in nucleotides |
| assigned_name | VARCHAR | Name assigned to the locus |
| host_species | VARCHAR | Species of the host organism |
| host_superorder | VARCHAR | Taxonomic superorder of the host |
| host_class | VARCHAR | Taxonomic class of the host |
| host_order | VARCHAR | Taxonomic order of the host |
| host_family | VARCHAR | Taxonomic family of the host |
| host_genus | VARCHAR | Taxonomic genus of the host |
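
As a sanity check on exported locus_data rows, the coordinate fields can be compared with sequence_length. The sketch below assumes 1-based inclusive coordinates and that the stored length covers the whole span between start_position and end_position; neither convention is stated on this page, and the function names are hypothetical.

```python
def locus_span(start_position, end_position):
    """Number of positions covered, ASSUMING 1-based inclusive coordinates.

    abs() is used so the result is the same whether or not minus-strand
    loci are stored with start greater than end.
    """
    return abs(end_position - start_position) + 1


def length_is_consistent(row):
    """Compare sequence_length with the span implied by the coordinates."""
    span = locus_span(row["start_position"], row["end_position"])
    return row["sequence_length"] == span


# Hypothetical row, not real project data.
example_row = {"start_position": 100, "end_position": 400, "sequence_length": 301}
```

Here `length_is_consistent(example_row)` is true because positions 100 through 400 inclusive cover 301 nucleotides.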