EVE Project Data - giffordlabcvr/Parvovirus-GLUE GitHub Wiki

Endogenous Parvoviral Element (EPV) Sequences

The sequences of EPVs were recovered from whole genome sequence (WGS) assemblies via database-integrated genome screening (DIGS) using the DIGS tool.

All data pertaining to this screen are included in this repository.

The complete list of vertebrate genomes screened can be found here.
The complete list of invertebrate genomes screened can be found here.
The set of parvovirus polypeptide sequences used as probes can be found here.
The final set of parvovirus and EPV polypeptide sequences used as references can be found here.
Input parameters for screening using the DIGS tool can be found here.

EVE Reference Sequences

We reconstructed reference sequences for EPVs using alignments of EPV sequences derived from the same initial germline colonisation event, i.e. orthologous elements in distinct species and paralogous elements that arose via intragenomic duplication of EPV sequences.

Tabular data summarising EPV loci can be found at the following links:

Consensus/reference nucleotide sequences (FASTA format) for EPV loci can be found at the following links/directories:

EPV Nomenclature

We applied a systematic nomenclature to name endogenous parvoviral elements (EPVs), following a convention originally developed for endogenous retroviruses (ERVs). Each EPV was assigned a unique identifier (ID) composed of defined components:

Prefix: The classifier EPV (endogenous parvoviral element).
Taxon and Insertion ID: A composite of:
- (i) the name of the virus taxonomic group to which the EPV belongs.
- (ii) a numeric ID identifying a unique insertion locus. This number is shared by orthologous loci derived from the same germline integration event.
Host Range: A descriptor indicating the host lineage in which the EPV occurs. For species-specific insertions, this is the Latin binomial name; for orthologs present in multiple species, a broader taxonomic group name is used.

This nomenclature enables consistent identification and comparison of EPVs across species and research contexts.

Notes:

EPVs were assigned to virus taxonomic groups based on phylogenetic and genomic analyses. Where confident classification was not possible, the lowest applicable rank (e.g., subfamily Parvovirinae) was used.
Orthologous EPVs were grouped under the same numeric ID. However, some relationships may have been missed, and some groupings may inadvertently include distinct, paralogous loci.
When orthologs occur in multiple species, the host descriptor reflects the corresponding taxonomic group. If the group represents an unranked clade, we use the name of the closest named group at a lower rank and append the abbreviation UR (unranked) to indicate that no formal clade name fully captures the species set.
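The three-part identifier described above can be modelled as a small record. The following is a minimal sketch only: the text does not specify how the components are joined, so the hyphen between components and the dot before the numeric ID are assumptions, as are the class name `EpvId` and the example values used with it.

```python
from dataclasses import dataclass

PREFIX = "EPV"  # classifier described in the text


@dataclass(frozen=True)
class EpvId:
    taxon: str       # virus taxonomic group the EPV is assigned to
    insertion: int   # numeric ID shared by orthologs of one integration event
    host_range: str  # Latin binomial, or a broader host group name

    def format(self) -> str:
        # ASSUMPTION: "-" between components and "." before the numeric ID;
        # the source does not state the separators.
        return f"{PREFIX}-{self.taxon}.{self.insertion}-{self.host_range}"

    @classmethod
    def parse(cls, text: str) -> "EpvId":
        # Split into at most three parts so the host range may itself
        # contain hyphens.
        parts = text.split("-", 2)
        if len(parts) != 3 or parts[0] != PREFIX:
            raise ValueError(f"not an EPV identifier: {text!r}")
        taxon, dot, number = parts[1].rpartition(".")
        if not dot or not taxon or not number.isdigit():
            raise ValueError(f"malformed taxon/insertion part: {parts[1]!r}")
        return cls(taxon, int(number), parts[2])


# Illustrative values only, not a real locus from the project.
example = EpvId("Dependoparvovirus", 1, "Mammalia")
print(example.format())
```

Keeping the numeric ID as its own field makes it straightforward to group loci by integration event, which is what the shared number is meant to convey.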

EPV and Parvovirus Alignments

The Parvovirus-GLUE-EVE project includes a comprehensive set of curated multiple sequence alignments to support the analysis of endogenous parvoviral elements (EPVs) in relation to modern parvoviruses. These alignments are organized into thematic subdirectories under the top-level alignments directory and serve different analytical purposes:

1. `export/eve-orthologs`

Alignments of orthologous EPV loci recovered from the genomes of related host species. These alignments allow comparative analysis of EPV insertions to:

Estimate minimum ages based on host divergence times.
Explore conservation of insertion sites.
Assess post-insertional mutation patterns.

Each file corresponds to a named EPV locus. Note that flanking host sequences, where they were available, have been removed from these alignments.
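The minimum-age reasoning mentioned above works as follows: if orthologous copies of an EPV occur in several host species, the insertion must predate the divergence of those hosts, so the oldest pairwise divergence time among them is a lower bound on its age. A minimal sketch, assuming a user-supplied table of pairwise divergence times; the function name, the species names and the numbers are placeholders, not project data:

```python
from itertools import combinations


def minimum_age(hosts, divergence_mya):
    """Lower bound on insertion age from the hosts sharing an ortholog.

    hosts: names of host species carrying the orthologous locus.
    divergence_mya: dict mapping frozenset({a, b}) to a divergence time
        in millions of years (hypothetical input format).
    """
    unique_hosts = sorted(set(hosts))
    if len(unique_hosts) < 2:
        # A species-specific locus gives no bound from orthology alone.
        return None
    return max(
        divergence_mya[frozenset(pair)]
        for pair in combinations(unique_hosts, 2)
    )


# Placeholder divergence times; real values would come from a dated
# host phylogeny.
divergence = {
    frozenset(("species_a", "species_b")): 10.0,
    frozenset(("species_a", "species_c")): 40.0,
    frozenset(("species_b", "species_c")): 40.0,
}

print(minimum_age(["species_a", "species_b", "species_c"], divergence))
```

The result is only a minimum: the insertion could be considerably older than the deepest split among the sampled hosts.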

2. `internal/genus`

Genus-level alignments of extant parvoviruses and associated EPV sequences. These files support reconstruction of ancient parvoviral genome features and taxonomic placement of EPVs.

Each genus directory (e.g., dependo, proto, ichthama) includes:

`*-genome.aln.fna`: Alignments of extant viral genomes.
`*-genome+eves.aln.fna`: Alignments combining viral genomes with associated EPVs.
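The two file-name patterns above can be used to sort the contents of a genus directory. The sketch below relies only on the suffixes listed; the function names and the example file name `dependo-genome.aln.fna` are illustrative assumptions, and the actual files in the repository may be named differently.

```python
from pathlib import Path


def classify_alignment(filename):
    """Return 'genome', 'genome+eves', or None, based on the file suffix."""
    name = Path(filename).name
    # Check the longer pattern first for clarity; the two suffixes
    # do not overlap.
    if name.endswith("-genome+eves.aln.fna"):
        return "genome+eves"
    if name.endswith("-genome.aln.fna"):
        return "genome"
    return None


def find_genus_alignments(genus_dir):
    """Group the alignment files in one genus directory by kind."""
    found = {"genome": [], "genome+eves": []}
    for path in sorted(Path(genus_dir).glob("*.aln.fna")):
        kind = classify_alignment(path.name)
        if kind is not None:
            found[kind].append(path)
    return found


# Hypothetical file name following the pattern described above.
print(classify_alignment("dependo-genome.aln.fna"))
```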

3. `internal/cross-genus`

Alignments spanning multiple parvovirus genera, useful for investigating deep evolutionary relationships and potential recombination or gene transfer across subfamilies.

4. `internal/subfamily`

Higher-level alignments at the subfamily level, combining both EPVs and exogenous viruses to support broad phylogenetic analyses.

5. `tips`

Fine-grained alignments of individual EPV loci or small EPV groups, typically aligned to closely related parvovirus references. These are used to:

Examine local sequence context.
Evaluate classification accuracy.
Generate locus-specific phylogenies.

6. `root`

Alignments of conserved parvovirus regions across entire subfamilies, used for inferring relationships between genera. Versions including EPVs are available (`*-root+eves.aln.fna`) to place EPVs in a broader evolutionary framework.

7. `tree`

Partitioned alignments (e.g., Rep78, VP1) used for constructing phylogenetic trees of parvoviruses and EPVs. These alignments are the foundation for the evolutionary inferences made throughout the resource.

Phylogenies

We used GLUE to automate the derivation of midpoint-rooted, annotated trees from the EPV-containing alignments included in this project. These trees reconstruct the evolutionary relationships between EPVs and related viruses.

Trees were reconstructed at distinct taxonomic levels. The current set can be viewed using the links below:

Genus-level phylogenies
EPV lineage-level phylogenies
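Midpoint rooting, mentioned above, places the root halfway along the longest tip-to-tip path in the tree. The sketch below only locates that point in an unrooted tree given as a list of edges with positive branch lengths. It is a generic illustration of the technique, not the GLUE implementation; the function names, the edge-list format and the toy tree are assumptions.

```python
from collections import defaultdict


def farthest(adj, start):
    """Return the node farthest from start, plus distance and parent maps."""
    dist = {start: 0.0}
    parent = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour, length in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + length
                parent[neighbour] = node
                stack.append(neighbour)
    return max(dist, key=dist.get), dist, parent


def midpoint(edges):
    """Locate the midpoint of the longest tip-to-tip path.

    edges: iterable of (u, v, length) with positive lengths.
    Returns (near, far, offset): the root lies on edge near-far,
    at distance `offset` from `near`.
    """
    adj = defaultdict(list)
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))
    # Two sweeps find the ends of the longest path in a tree.
    tip_a, _, _ = farthest(adj, next(iter(adj)))
    tip_b, dist, parent = farthest(adj, tip_a)
    half = dist[tip_b] / 2.0
    # Walk from tip_b back toward tip_a until the edge that spans
    # the halfway distance is reached.
    node = tip_b
    while dist[parent[node]] > half:
        node = parent[node]
    return parent[node], node, half - dist[parent[node]]


# Toy tree: tips A, B, C, D joined through internal nodes X and Y.
toy = [("A", "X", 1), ("B", "X", 3), ("X", "Y", 2), ("C", "Y", 4), ("D", "Y", 1)]
print(midpoint(toy))
```

In the toy tree the longest path runs from B to C (length 9), so the root falls on the internal edge between Y and X, 0.5 units from Y. Re-rooting the tree at that point and annotating it are separate steps not shown here.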
