Project Data - giffordlabcvr/Parvovirus-GLUE GitHub Wiki
Overview
The Parvovirus-GLUE core project provides a minimal set of essential data components for performing comparative genomic analysis of parvoviruses.
The data items included in the core project are:
- A set of parvovirus genome feature definitions covering all genera.
- An annotated reference genome sequence for each parvovirus genus (i.e. one species per genus).
- A hierarchically arranged set of multiple sequence alignments representing sequence homology among parvovirus reference sequences.
Genome Features
Parvovirus-GLUE defines a standard set of genome features for parvoviruses and specifies the locations of these features on genus master reference sequences.
Parvoviruses have linear, single-stranded DNA genomes ~5 kilobases (kb) in length. They are typically very compact and generally exhibit the same basic genetic organisation comprising two major gene cassettes, one (Rep/NS) that encodes the non-structural proteins, and another (Cap/VP) that encodes the structural coat proteins of the virion.
A schematic representation of the canine parvovirus (CPV) genome. NS=non-structural; VP=capsid; PLA2=phospholipase A2; ITR=inverted terminal repeat; Kb=kilobases
Some species and genera encode additional polypeptide gene products adjacent to these genes or overlapping them in alternative reading frames.
The genome is flanked at the 3' and 5' ends by palindromic inverted terminal repeat (ITR) sequences that are the only cis elements required for replication.
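The idea of locating named features on a reference sequence can be sketched in a few lines of Python. This is only an illustration, not how GLUE stores features: the `FeatureLocation` class, the toy sequence, and the coordinates are all hypothetical and are not taken from any real parvovirus genome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureLocation:
    """A named feature on a reference sequence (1-based, inclusive coordinates)."""
    feature: str
    start: int
    end: int

    def extract(self, reference: str) -> str:
        # Convert 1-based inclusive coordinates into a Python slice.
        return reference[self.start - 1:self.end]


# Toy reference built from labelled pieces. NOT real parvovirus data.
toy_reference = (
    "TTT"          # positions 1-3: stand-in for a terminal repeat
    "ATGAAATAA"    # positions 4-12: stand-in for the Rep/NS cassette
    "CC"           # positions 13-14: spacer
    "ATGGGGTGA"    # positions 15-23: stand-in for the Cap/VP cassette
    "TTT"          # positions 24-26: stand-in for a terminal repeat
)

# Placeholder coordinates matching the toy layout above.
toy_features = [
    FeatureLocation("NS", 4, 12),
    FeatureLocation("VP", 15, 23),
]

for loc in toy_features:
    print(loc.feature, loc.extract(toy_reference))
```

Because the coordinates are tied to one reference sequence, the same feature names can be reused across genera while each genus master reference carries its own locations.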
Sequence Data
In the core project, a single 'master' reference sequence has been defined to represent each parvoviral genus recognized by the ICTV.
Tabular data summarising these reference sequences can be found here.
We explicitly defined the locations of genome features on master reference sequences (see here).
Subfamily Parvovirinae Master Reference Sequences
- Protoparvovirus: Carnivore protoparvovirus 1 (NC_001539)
- Aveparvovirus: Chicken parvovirus (NC_024452)
- Amdoparvovirus: Carnivore amdoparvovirus 1 (NC_001662)
- Erythroparvovirus: Human parvovirus B19 (NC_000883)
- Dependoparvovirus: Adeno-associated virus 2 (NC_001401)
- Copiparvovirus: Bovine parvovirus 2 (NC_006259)
- Bocaparvovirus: Bovine parvovirus (NC_001540)
- Tetraparvovirus: Human parvovirus 4 (NC_007018)
- Artiparvovirus: Artibeus jamaicensis parvovirus 1 (NC_016752)
Subfamily Hamaparvovirinae Master Reference Sequences
- Chaphamaparvovirus: Porcine parvovirus 7 (NC_040562)
- Ichthamaparvovirus: Syngnathus scovelli chapparvovirus (MN049932)
- Brevidensoparvovirus: Aedes albopictus densovirus 2 (NC_004285)
- Penstyldensoparvovirus: Infectious hypodermal and hematopoietic necrosis virus (NC_002190)
Subfamily Densoparvovirinae Master Reference Sequences
- Ambidensovirus: Junonia coenia densovirus (NC_004284)
- Aquambidensovirus: Asteroid aquambidensovirus 1 (NC_038532)
- Blattambidensovirus: Blattodean blattambidensovirus 1 (NC_005041)
- Diciambidensovirus: Hemipteran diciambidensovirus 1 (NC_030296)
- Hemiambidensovirus: Dysaphis plantaginea densovirus (NC_034532)
- Iteradensoparvovirus: Lepidopteran iteradensovirus 1 (NC_003346)
- Miniambidensovirus: Orthopteran miniambidensovirus 1 (NC_022564)
- Muscambidensovirus: Haematobia irritans densovirus (MK643151)
- Pefuambidensovirus: Blattodean pefuambidensovirus 1 (NC_000936)
- Protoambidensovirus: Dipteran protoambidensovirus 1 (MK722617)
- Scindoambidensovirus: Hemipteran scindoambidensovirus 1 (NC_004289)
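For scripting purposes, the genus-to-accession mapping above can be held in a simple lookup table. The sketch below transcribes only the Parvovirinae entries from the list; the names `PARVOVIRINAE_MASTER_REFERENCES` and `efetch_fasta_url` are hypothetical and are not part of Parvovirus-GLUE. The URL follows the public NCBI E-utilities `efetch` endpoint, and the code only builds the URL without making a network request.

```python
from urllib.parse import urlencode

# Genus -> master reference accession, transcribed from the Parvovirinae list above.
PARVOVIRINAE_MASTER_REFERENCES = {
    "Protoparvovirus": "NC_001539",
    "Aveparvovirus": "NC_024452",
    "Amdoparvovirus": "NC_001662",
    "Erythroparvovirus": "NC_000883",
    "Dependoparvovirus": "NC_001401",
    "Copiparvovirus": "NC_006259",
    "Bocaparvovirus": "NC_001540",
    "Tetraparvovirus": "NC_007018",
    "Artiparvovirus": "NC_016752",
}

EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str) -> str:
    """Build an NCBI efetch URL that returns the sequence as FASTA text."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


# Print a download URL for each genus master reference.
for genus, accession in PARVOVIRINAE_MASTER_REFERENCES.items():
    print(genus, efetch_fasta_url(accession))
```

The other subfamilies could be added to the same table in the same way.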
Multiple Sequence Alignments
Alignments Included in Parvovirus-GLUE
Multiple sequence alignments (MSAs) in the Parvovirus-GLUE core project include:
- A root alignment constructed to represent homology between the two largest subgroupings in the Parvoviridae.
- Subfamily alignments constructed to represent proposed homologies between representative members of Parvoviridae subfamilies.
- Cross-genus alignments constructed to represent proposed homologies between representative members of 'minor' Parvoviridae lineages.
Please note that the repository also contains genus-level alignments constructed to represent proposed homologies between the genomes of representative members of specific parvovirus genera. These are imported when genus-level extensions are added to the core project.
The Alignment Tree in Parvovirus-GLUE
Parvovirus-GLUE makes use of GLUE's constrained alignment tree data structure.
For the highest taxonomic levels (i.e. at the root) we aligned only the most conserved regions of the NS gene, whereas for the lower taxonomic levels (i.e. within and below genus level) we aligned complete genomes.
The root alignment contains reference sequences for major clades, whereas all children of the root inherit at least one reference from their immediate parent. Thus, all alignments are linked to one another via our chosen set of master reference sequences.
The schematic figure above shows the 'alignment tree' data structure currently implemented in Parvovirus-GLUE. We used this data structure to link the alignments via a set of common reference sequences.
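The linking rule described above, in which each child alignment shares at least one reference sequence with its immediate parent, can be illustrated with a small tree structure. This is a minimal sketch and not GLUE's implementation: the `AlignmentNode` class and the alignment names (`AL_ROOT`, `AL_Parvovirinae`, `AL_Protoparvovirus`) are hypothetical, and the assignment of accessions to each alignment is assumed purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class AlignmentNode:
    """One alignment in the tree, holding the IDs of its reference sequences."""
    name: str
    references: List[str]
    parent: Optional["AlignmentNode"] = field(default=None, repr=False)
    children: List["AlignmentNode"] = field(default_factory=list)

    def add_child(self, name: str, references: List[str]) -> "AlignmentNode":
        # Enforce the linking rule: a child must share a reference with its parent.
        if not set(references) & set(self.references):
            raise ValueError(
                f"{name} shares no reference sequence with parent {self.name}"
            )
        child = AlignmentNode(name, list(references), parent=self)
        self.children.append(child)
        return child

    def path_to_root(self) -> List[str]:
        # Walk up the parent links, collecting alignment names.
        path = []
        node: Optional["AlignmentNode"] = self
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path


# Hypothetical tree; membership of each alignment is assumed for illustration.
root = AlignmentNode("AL_ROOT", ["NC_001539", "NC_004284"])
subfamily = root.add_child("AL_Parvovirinae", ["NC_001539", "NC_001401"])
genus = subfamily.add_child("AL_Protoparvovirus", ["NC_001539"])

print(genus.path_to_root())
```

Because every alignment is reachable from the root through shared references, coordinates in a lower-level alignment can in principle be related back to any ancestor alignment along `path_to_root()`.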