Paleovirus Nomenclature - giffordlabcvr/Hepadnaviridae-GLUE GitHub Wiki
Nomenclature for eHBVs
We have applied a systematic approach to naming endogenous hepadnaviruses (eHBVs), following a convention developed for endogenous retroviruses. Each eHBV locus was assigned a unique identifier (ID) constructed from several components, each of which refers to a property of the locus.
The first component is the classifier ‘eHBV’ (endogenous hepatitis B virus/endogenous hepadnavirus).
The second component is a composite of two distinct subcomponents separated by a period: (i) the name of the eHBV group; (ii) a numeric ID that uniquely identifies the insertion. The numeric ID is an integer that identifies a unique insertion locus that arose as a consequence of an initial germline infection. Thus, orthologous copies in different species are given the same number.
Where an endogenous viral element (EVE) sequence is thought to have been duplicated within the germline following its initial incorporation (e.g. via segmental duplication or transposition), we have appended an additional 'duplicate ID' to the numeric ID, separated by a period. Please note that we have not yet resolved the orthologous relationships among sets of eHBV sequences belonging to multicopy eHBV lineages. We have therefore assigned a unique duplicate ID to each sequence within these lineages.
The third component of the ID defines the set of host species in which the ortholog occurs, or did occur prior to being deleted.
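The three components described above can be sketched as a small parser. This is only an illustration under stated assumptions: the text does not say which character separates the three components, so a hyphen is assumed here, and the group and host names used in the usage note (GroupA, HostX) are hypothetical placeholders, not real eHBV identifiers. The names EhbvId and parse_ehbv_id are likewise invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EhbvId:
    classifier: str           # first component, always 'eHBV'
    group: str                # second component, part (i): eHBV group name
    locus: int                # second component, part (ii): numeric insertion ID
    duplicate: Optional[int]  # optional duplicate ID appended after a period
    host: str                 # third component: host species set


def parse_ehbv_id(text: str, sep: str = "-") -> EhbvId:
    """Split an ID into its components.

    ASSUMPTION: components are joined by `sep` (a hyphen by default);
    the source text does not specify this separator.
    """
    # Split at most twice so that a host name containing `sep` stays intact.
    parts = text.split(sep, 2)
    if len(parts) != 3:
        raise ValueError(f"expected three components in {text!r}")
    classifier, second, host = parts
    if classifier != "eHBV":
        raise ValueError(f"unexpected classifier {classifier!r}")
    if not host:
        raise ValueError("host component is empty")

    # The second component is period-separated: group, numeric ID,
    # and optionally a duplicate ID.
    fields = second.split(".")
    if len(fields) not in (2, 3) or not fields[0]:
        raise ValueError(f"malformed second component {second!r}")
    locus = int(fields[1])  # raises ValueError if not an integer
    duplicate = int(fields[2]) if len(fields) == 3 else None
    return EhbvId(classifier, fields[0], locus, duplicate, host)
```

With these assumptions, parse_ehbv_id("eHBV-GroupA.1-HostX") yields a record with locus 1 and no duplicate ID, while parse_ehbv_id("eHBV-GroupA.1.2-HostX") records duplicate ID 2 for the same locus. Orthologous copies in different species would share the group and locus fields.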