EVE Project Data - giffordlabcvr/Hepadnaviridae-GLUE GitHub Wiki
Sequence Data
These are the raw data generated by database-integrated genome screening (DIGS). The tabular file records the genomic location of each endogenous viral element (EVE). EVEs were classified by comparison to a reference library of polypeptide sequences designed to represent the known diversity of hepadnaviruses, including extinct lineages represented only by EVEs.
These data were obtained via DIGS performed in vertebrate genome assemblies downloaded from NCBI Genome (2020-07-15).
Raw data about the EVEs in tabular format can be found here.
Nucleotide level data in FASTA format (individual files) can be found here.
Reference Sequence Data
We constructed consensus sequences for hepadnaviral paleoviruses by aligning endogenous hepadnavirus (eHBV) sequences derived from the same initial germline colonisation event, i.e. orthologs in distinct species and paralogs that have arisen via intragenomic duplication.
Reference sequence data in tabular format are here.
The reference sequences in FASTA format are here.
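As an illustration of what building a consensus from aligned sequences involves, the following is a minimal sketch of a majority-rule consensus. It is not the procedure used to produce the reference sequences above: the function name `majority_consensus`, the tie-handling rule (ties become `N`), and the toy alignment are all assumptions made for this example.

```python
from collections import Counter


def majority_consensus(aligned, gap="-", ambiguous="N"):
    """Return a simple majority-rule consensus for equal-length aligned sequences.

    Hypothetical helper for illustration only; gaps are ignored when
    counting, and ties between the top two residues yield `ambiguous`.
    """
    if not aligned:
        raise ValueError("no sequences supplied")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for column in zip(*aligned):
        counts = Counter(base.upper() for base in column if base != gap)
        if not counts:
            # Every sequence has a gap in this column, so keep the gap.
            consensus.append(gap)
            continue
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            # The two most frequent residues are tied.
            consensus.append(ambiguous)
        else:
            consensus.append(ranked[0][0])
    return "".join(consensus)


# Toy alignment (made-up sequences, not project data).
toy_alignment = ["ATGC-A", "ATGCTA", "ACG--A"]
print(majority_consensus(toy_alignment))
```

Ignoring gaps when counting means a residue present in only one sequence can still enter the consensus; a stricter rule (for example, requiring a minimum number of supporting sequences) may be more appropriate for real data.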
Multiple Sequence Alignments
The Hepadnaviridae-GLUE project contains multiple sequence alignments linking all known eHBV and virus sequences.
Exported alignments can be accessed at the links below.
Nucleotide level data:
Taxonomic group | Full-length eHBV | Core codons | Surface codons | Pol codons |
---|---|---|---|---|
Avihepadnavirus | FASTA MSA | FASTA MSA | FASTA MSA | FASTA MSA |
Herpetohepadnavirus | FASTA MSA | FASTA MSA | FASTA MSA | FASTA MSA |
Metahepadnavirus | FASTA MSA | FASTA MSA | FASTA MSA | FASTA MSA |
Protein level data:
Taxonomic group | Core AA | Surface AA | Pol AA |
---|---|---|---|
Avihepadnavirus | FASTA MSA | FASTA MSA | FASTA MSA |
Herpetohepadnavirus | FASTA MSA | FASTA MSA | FASTA MSA |
Metahepadnavirus | FASTA MSA | FASTA MSA | FASTA MSA |
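Once an exported alignment has been downloaded, it can be loaded with a few lines of standard-library Python. The sketch below assumes only that the file is ordinary FASTA with gap characters in the sequences; the function names `read_fasta` and `alignment_length` and the sequence IDs in the toy input are hypothetical, and no project file names are assumed.

```python
import io


def read_fasta(handle):
    """Parse FASTA text from a file-like object into a dict of id -> sequence."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header with no sequence id")
            name = fields[0]  # the id is the first word of the header
            if name in records:
                raise ValueError(f"duplicate sequence id: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def alignment_length(records):
    """Return the shared length of all sequences, or raise if they differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences are not aligned; lengths seen: {sorted(lengths)}")
    return lengths.pop()


# Toy input standing in for a downloaded alignment file.
toy = io.StringIO(">seq_a example\nAT-G\nCC\n>seq_b\nATTGCC\n")
msa = read_fasta(toy)
print(len(msa), "sequences, alignment length", alignment_length(msa))
```

For a real file, replace the `io.StringIO` object with `open(path)` inside a `with` block. Checking that every row has the same length is a cheap way to catch a truncated download before analysis.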
EVE Nomenclature
Nomenclature for eHBVs
We use a systematic naming convention for endogenous hepadnaviruses (eHBVs), adapted from a framework established for endogenous retroviruses. Each eHBV locus is assigned a unique identifier (ID) that reflects key properties of the insertion.
The ID consists of three components:
- Classifier: The prefix 'eHBV' (endogenous hepatitis B virus/endogenous hepadnavirus).
- Virus Group & Locus ID: A composite of:
  - The name of the hepadnavirus taxonomic group from which the element derives.
  - A numeric code uniquely identifying the insertion locus.
- Host Species Set: A designation indicating the species in which the orthologous locus is present, or was present before deletion.
This standardized approach ensures clarity and consistency in referencing eHBV loci across different hosts and taxonomic contexts.
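The three-part structure described above can be sketched as a small data type with a parser. The text does not specify how the components are joined, so the layout used here (hyphens between the three components, a period between the group name and the numeric code) is an assumption, as are the names `EhbvId` and `parse_ehbv_id` and the placeholder ID in the example.

```python
import re
from dataclasses import dataclass

# Assumed layout: eHBV-<group>.<number>-<host set>. The separators are
# a guess for illustration, not taken from the project documentation.
_ID_PATTERN = re.compile(
    r"^eHBV-(?P<group>[A-Za-z]+)\.(?P<locus>\d+)-(?P<host>.+)$"
)


@dataclass(frozen=True)
class EhbvId:
    """Hypothetical container for the three components of an eHBV ID."""
    group: str      # hepadnavirus taxonomic group name
    locus: int      # numeric code identifying the insertion locus
    host_set: str   # host species set carrying the orthologous locus
    classifier: str = "eHBV"

    def __str__(self):
        return f"{self.classifier}-{self.group}.{self.locus}-{self.host_set}"


def parse_ehbv_id(text):
    """Split an ID string into its components under the assumed layout."""
    match = _ID_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a recognised eHBV ID: {text!r}")
    return EhbvId(
        group=match["group"],
        locus=int(match["locus"]),
        host_set=match["host"],
    )


# Placeholder ID, not a real locus name.
parsed = parse_ehbv_id("eHBV-GroupA.12-HostSetX")
print(parsed.group, parsed.locus, parsed.host_set)
```

Keeping the components as separate fields makes it straightforward to group loci by virus group or host set; the pattern should be adjusted to match the separators actually used in the project's tabular data.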