Virome - serratus-bio/open-virome GitHub Wiki

{Virome}

In Open Virome, a {Virome} is defined as the collection of all viruses associated with a set of sequencing datasets (called runs).

Virome Icon

Any property unifying a set of runs defines that specific <property> virome, and a run can belong to multiple viromes. For example:

(1) All runs which are labelled as Eimeria sp. make the Eimeria Virome

(2) All runs originating from Lake Garibaldi make the Lake Garibaldi Virome

(3) All runs taken from freshwater lakes make the Freshwater Lake Virome

One run may be a member of (1), (2), and (3), in which case it is part of the intersection of these three viromes.
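The set logic above can be sketched in a few lines of R. This is a minimal illustration only: the run identifiers, column names, and metadata values are hypothetical and are not taken from Open Virome.

```r
# Hypothetical run metadata, for illustration only
runs <- data.frame(
  run      = c("RUN_A", "RUN_B", "RUN_C", "RUN_D"),
  organism = c("Eimeria sp.", "Eimeria sp.", "other", "other"),
  location = c("Lake Garibaldi", "unknown", "Lake Garibaldi", "unknown"),
  biome    = c("freshwater lake", "other", "freshwater lake", "freshwater lake"),
  stringsAsFactors = FALSE
)

# Each virome is defined by the set of runs sharing one property
eimeria_virome    <- runs$run[runs$organism == "Eimeria sp."]
garibaldi_virome  <- runs$run[runs$location == "Lake Garibaldi"]
freshwater_virome <- runs$run[runs$biome == "freshwater lake"]

# Runs in the intersection belong to all three viromes
in_all_three <- Reduce(intersect,
                       list(eimeria_virome, garibaldi_virome, freshwater_virome))
print(in_all_three)
```

Here only `RUN_A` carries all three properties, while `RUN_C` belongs to two viromes and `RUN_B` and `RUN_D` to one each.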

{Virome} Graph

The {Virome} is represented as a weighted, undirected, bipartite graph where:

  • Virus Node (hexagon): an abstract unit of virus, defined here as species-like Operational Taxonomic Units (sOTU) of RNA viruses (See: palmDB)
  • Run Node (circle): sequencing runs from the Sequence Read Archive
  • Edge (solid line): a contig within the run, with an sOTU identified on it
  • Edge (weight): line thickness is scaled by contig 'read coverage'/expression
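One way to hold such a graph in memory is an edge list with one row per contig. The sketch below assumes hypothetical run and sOTU identifiers and coverage values; the log scaling of line width is also an assumption, since the text only says that thickness is scaled by coverage.

```r
# Hypothetical edge list: one row per contig linking a run to an sOTU
edges <- data.frame(
  run      = c("RUN_A", "RUN_A", "RUN_B", "RUN_C"),
  sotu     = c("u1", "u2", "u1", "u3"),
  coverage = c(120.5, 3.2, 48.0, 9.9),
  stringsAsFactors = FALSE
)

# The two node sets of the bipartite graph
run_nodes   <- unique(edges$run)
virus_nodes <- unique(edges$sotu)

# Bipartite: an identifier is either a run or a virus, never both,
# so every edge joins a run node to a virus node
stopifnot(length(intersect(run_nodes, virus_nodes)) == 0)

# Edge weight for drawing (log scaling is an assumption of this sketch)
edges$line_width <- 1 + log10(edges$coverage + 1)
print(edges)
```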

Virome Example

The label on each virus node in the Virome Graph is the GenBank "taxonomic species" assigned when the sOTU is aligned to GenBank nr. Here, sOTU u26089 aligns to Eimeria stiedai RNA virus 1 (YP_009551684.1) with 100% amino acid identity.

Virome Component

Each {Virome} can be divided into connected components, which are communities of virus/run nodes, joined by at least one edge.

Component
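Connected components can be found from the edge list alone. The following is a minimal base-R sketch using label merging; the edge list is hypothetical, and a graph library could be substituted for larger viromes.

```r
# Hypothetical edge list (run, sOTU); identifiers are illustrative only
edges <- data.frame(
  run  = c("RUN_A", "RUN_A", "RUN_B", "RUN_C", "RUN_D"),
  sotu = c("u1",    "u2",    "u2",    "u3",    "u3"),
  stringsAsFactors = FALSE
)

# Start with every node in its own component
nodes <- unique(c(edges$run, edges$sotu))
comp  <- setNames(seq_along(nodes), nodes)

# Merge the component labels at both ends of each edge until nothing changes
repeat {
  changed <- FALSE
  for (i in seq_len(nrow(edges))) {
    a <- comp[[edges$run[i]]]
    b <- comp[[edges$sotu[i]]]
    if (a != b) {
      comp[comp == max(a, b)] <- min(a, b)
      changed <- TRUE
    }
  }
  if (!changed) break
}

# Group node names by component label
components <- unname(split(names(comp), comp))
print(components)
```

In this toy graph, `RUN_A`, `RUN_B`, `u1`, and `u2` form one component, and `RUN_C`, `RUN_D`, and `u3` form a second, disjoint one.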

Component Figures: {Eimeria Virome} example

The 45 run nodes contain 30 virus nodes with 107 detection edges. These can be grouped into 6 components shown below as disjoint graphs.

Eimeria Virome

Consider an example with 10 nodes: a component can have either 8 viruses in 2 runs (left, virus-rich) or 2 viruses in 8 runs (right, run-rich), with varying relationship density (contigs, edges) and expression (coverage, edge weight). Component figures summarize these relationships.

10-node examples

The component count figure shows the number of nodes (distinct viruses + runs) and edges (contigs) per component, which indicates how interconnected each component is internally.

The component degree figure shows, for each component, the average number of viruses per run (run degree) and the average number of runs per virus (sOTU degree). A component is virus-rich when sOTU degree < run degree, or run-rich when sOTU degree > run degree. In the Eimeria Virome, all the components are run-rich, meaning that on average each virus is represented by multiple runs.

Eimeria Virome Component Figure
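The virus-rich versus run-rich comparison can be sketched as below, using two hypothetical 10-node components. The function name, the column names, and the "balanced" label for ties are assumptions of this sketch; the text does not define the tie case.

```r
# Classify one component from its (run, sOTU) edge list
component_degree <- function(edges) {
  pairs <- unique(edges[, c("run", "sotu")])  # count each run-virus link once
  run_degree  <- nrow(pairs) / length(unique(pairs$run))   # mean viruses per run
  sotu_degree <- nrow(pairs) / length(unique(pairs$sotu))  # mean runs per virus
  type <- if (sotu_degree < run_degree) {
    "virus-rich"
  } else if (sotu_degree > run_degree) {
    "run-rich"
  } else {
    "balanced"  # tie case is not defined in the text (assumption)
  }
  list(run_degree = run_degree, sotu_degree = sotu_degree, type = type)
}

# Hypothetical component: 8 viruses in 2 runs (sOTU u5 is shared by both runs)
virus_rich <- data.frame(run  = c(rep("RUN_A", 5), rep("RUN_B", 4)),
                         sotu = paste0("u", c(1:5, 5:8)),
                         stringsAsFactors = FALSE)

# Hypothetical component: 2 viruses in 8 runs (RUN_5 contains both viruses)
run_rich <- data.frame(run  = paste0("RUN_", c(1:5, 5:8)),
                       sotu = c(rep("u1", 5), rep("u2", 4)),
                       stringsAsFactors = FALSE)

print(component_degree(virus_rich)$type)
print(component_degree(run_rich)$type)
```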

Quantifying Virus-Virome Specificity

Virome Enrichment (Vrich)

For each sOTU, the Virome Enrichment (Vrich) is the fraction of all observations of that sOTU that are contained in the {Virome}. Values range from 0.0 to 1.0.

Vrich = [ Number of times Virus occurs in {Virome} ] / [ Number of times Virus occurs in all Datasets ]

The size of each virus node is scaled by Vrich. For example, 5 / 6 observations of Eimeria stiedai RNA virus and 2 / 1250 observations of Red Mite associated Cystovirus fall within the virome, yielding Vrich values of 0.833 and 0.0016, respectively.

Vrich scale
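The two Vrich values quoted above can be reproduced with a one-line function. The function and argument names are hypothetical; only the counts come from the text.

```r
# Vrich: fraction of a virus's observations that fall inside the virome
vrich <- function(n_virome, n_total) {
  stopifnot(all(n_total > 0), all(n_virome >= 0), all(n_virome <= n_total))
  n_virome / n_total
}

# Counts from the text: 5 of 6 and 2 of 1250 observations inside the virome
scores <- vrich(n_virome = c(5, 2), n_total = c(6, 1250))
print(round(scores, 4))
```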

Virome Exact Score (Vexact)

The Vexact score is the -log10( p.value ) of a one-sided Fisher's Exact Test with Bonferroni multiple-testing correction. Scores above 10 (including infinite scores) are set to 10, and a score of exactly 0 is set to 0.1.

{Virome}   is the sum of observations of all sOTU in the given virome
{Serratus} is the sum of observations of all sOTU across all of Serratus

N       : Count of all sOTU observations across {Virome}
n_vir   : The observed ith sOTU count in {Virome}
n_out   : The observed ith sOTU count in {Serratus}, outside {Virome}
n_total : The total count of ith sOTU observations

M       : Count of all sOTU observations across {Serratus}
m_vir   : The sum of all non-ith sOTU in {Virome}
m_out   : The sum of all non-ith sOTU in {Serratus}, outside {Virome}
m_total : The total count of all non-ith sOTU observations

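# Fisher's Exact Test (one-sided: is the ith sOTU over-represented in {Virome}?)
FT <- fisher.test( rbind( c( n_vir, m_vir ),
                          c( n_out, m_out ) ),
                   alternative = 'greater' )

# Virome Exact Score
# n.tests is the number of tests used for the Bonferroni correction;
# the corrected p-value is capped at 1
v.exact <- -log10( min( 1, FT$p.value * n.tests ) )

# If Virome Exact is > 10 or Inf, set it to 10
if ( v.exact > 10 ) {
  v.exact <- 10
}

# If Virome Exact is == 0, set it to 0.1
if ( v.exact == 0 ) {
  v.exact <- 0.1
}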
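The steps above can be wrapped in a function and tried on example counts. This is a sketch only: the function name is hypothetical, and apart from the 5-in / 1-out split quoted earlier for Eimeria stiedai RNA virus, every count and the number of tests are made-up values chosen for illustration.

```r
# Hypothetical wrapper around the Virome Exact Score calculation
virome_exact <- function(n_vir, n_out, m_vir, m_out, n_tests) {
  ft <- fisher.test(rbind(c(n_vir, m_vir),
                          c(n_out, m_out)),
                    alternative = "greater")
  score <- -log10(min(1, ft$p.value * n_tests))
  if (score > 10) score <- 10    # also catches Inf when the p-value underflows to 0
  if (score == 0) score <- 0.1   # corrected p-value of 1
  score
}

# Strongly virome-specific virus: 5 observations inside, 1 outside
# (m_vir, m_out and n_tests are illustrative assumptions)
specific <- virome_exact(n_vir = 5, n_out = 1,
                         m_vir = 102, m_out = 1e6, n_tests = 30)

# Virus never seen inside the virome: the one-sided p-value is 1
absent <- virome_exact(n_vir = 0, n_out = 10,
                       m_vir = 107, m_out = 1e6, n_tests = 30)

print(c(specific = specific, absent = absent))
```

With these illustrative counts the first score hits the upper cap of 10 and the second falls back to the floor value of 0.1.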

Virome Rank (Vrank)

The Virome Rank (Vrank) is a heuristic score that combines the centrality of a virus within a Virome, measured with the Google PageRank algorithm, with its Virome Enrichment (Vrich) and Virome Exact Score (Vexact). The PageRank centrality can be thought of as a way of identifying viruses which are "core" or abundant within the virome, supported by multiple datasets.

Eimeria PageRank

To calculate ViromeRank:

ViromeRank_sOTU = PageRank_sOTU * ViromeEnrichment_sOTU * VExact_sOTU

Eimeria ViromeRank

The relationship between PageRank and Vrank, via Vrich, is shown below. Note that while the u653854 Rabbit hemorrhagic disease virus node is central, it is not virome-specific and is thus down-weighted in importance by Vrank.

Virome Rank Graph
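The down-weighting effect can be reproduced on a toy graph. The sketch below computes PageRank by power iteration in base R and then applies the ViromeRank product. The graph, the Vrich and Vexact values, the damping factor of 0.85, and the choice to ignore edge weights are all assumptions of this example and may differ from what Open Virome actually computes.

```r
# Hypothetical edge list: u1 is central (3 runs), u2 is peripheral (1 run)
edges <- data.frame(
  run  = c("RUN_A", "RUN_A", "RUN_B", "RUN_C"),
  sotu = c("u1",    "u2",    "u1",    "u1"),
  stringsAsFactors = FALSE
)
nodes <- unique(c(edges$run, edges$sotu))
n <- length(nodes)

# Symmetric adjacency matrix of the undirected graph (unweighted here)
A <- matrix(0, n, n, dimnames = list(nodes, nodes))
for (i in seq_len(nrow(edges))) {
  A[edges$run[i], edges$sotu[i]] <- 1
  A[edges$sotu[i], edges$run[i]] <- 1
}

# Column-stochastic transition matrix: each node splits its rank among neighbours
P <- sweep(A, 2, colSums(A), "/")

# Power iteration with damping factor d (0.85 is a conventional choice)
d  <- 0.85
pr <- rep(1 / n, n)
for (iter in 1:100) {
  pr <- (1 - d) / n + d * as.vector(P %*% pr)
}
names(pr) <- nodes

# ViromeRank = PageRank * Vrich * Vexact, for virus nodes only
vrich  <- c(u1 = 0.01, u2 = 0.9)  # hypothetical: u1 is not virome-specific
vexact <- c(u1 = 0.1,  u2 = 10)   # hypothetical
virome_rank <- pr[names(vrich)] * vrich * vexact

print(round(pr[c("u1", "u2")], 3))
print(virome_rank)
```

In this toy case `u1` has the higher PageRank, but `u2` ends up with the higher ViromeRank because it is far more specific to the virome.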
