VirusDiversity - BGIGPD/BestPractices4Pathogenomics GitHub Wiki
Rapid introduction
Dr. REN and Dr. Wang recently published a comprehensive study that explores the substantial viral diversity in bats and rodents from East Africa (Wang et al.,2024), providing insights into the evolution, recombination, and cocirculation of viruses within these animal populations.
Fig.1a from Wang, D., Yang, X., Ren, Z. et al. Substantial viral diversity in bats and rodents from East Africa: insights into evolution, recombination, and cocirculation. Microbiome 12, 72 (2024). https://doi.org/10.1186/s40168-024-01782-4
It uncovered substantial diversity, with a high degree of geographic specificity and host-related variability among the viruses in East Africa.
While this article went deep into the genomic diversity analysis, it barely addressed the ecological diversity of viral populations. One reason for this is that this study primarily concerned vertebrate viruses, which are relatively scarce in terms of species within a single host, leading to a high sparsity in the overall species abundance matrix.
What will we learn
- Explore metadata from a publication
- Understand Virus coverage and abundance profile
- Understand and calculate the ecological diversities
Technical workflow
How we obtained the virus identification as well as quantification profile?
Here is a graphic summary of the technical workflow:
Fig.S2 Workflow for virus sequence identification
From Wang et al.,2024
Ecological diversity
Diversity in ecology refers to the variety of species within a particular ecosystem or habitat. It is a fundamental concept in ecology and can be measured at different levels (See sections below).
By the way, following diversity indexs will be calculated by vegan
package, a comprehensive collection of functions dedicated to community ecology, particularly for the analysis of ecological community composition data.
Alpha Diversity (α-diversity)
Alpha diversity refers to the diversity within a specific community or ecosystem. It is a local measure that includes both species richness (the number of different species) and species evenness (the distribution of individuals among those species). Common metrics for alpha diversity include the Shannon-Wiener index and the Simpson index.
Shannon Index (H')
The Shannon index is a widely used measure of diversity that takes into account both species richness and evenness. It is based on information theory and is sensitive to the presence of rare species.
Formula:
$$ H' = -\sum_{i=1}^{s} p_i \ln(p_i) $$
where $s$ is the number of species, and $p_i$ is the proportion of individuals in the $i$-th species.
- High values indicate high diversity.
- It is sensitive to evenness, so a community with many rare species will have a higher Shannon index than one with a few dominant species.
- The logarithm base can vary, but natural logarithm (base $e$) is commonly used.
Simpson Index (D)
The Simpson index measures the probability that two individuals randomly selected from the community will belong to the same species. It is sensitive to the abundance of the most common species.
Formula:
$$ D = \sum_{i=1}^{s} p_i^2 $$
where $p_i$ is the proportion of individuals in the $i$-th species.
- Values range from 0 (no diversity) to 1 (no richness).
- A high value indicates low diversity, as it suggests a high probability of drawing two individuals from the same species.
- It is sensitive to the dominance of a few species, so communities with a few dominant species will have a higher Simpson index.
Inverse Simpson Index (1/D or Q)
The inverse Simpson index, also known as the Gini-Simpson index, is the reciprocal of the Simpson index. It is used to express diversity in a way that is more intuitive, with higher values indicating greater diversity.
Formula:
$$ Q = \frac{1}{D} = \sum_{i=1}^{s} \frac{1}{p_i} $$
where $p_i$ is the proportion of individuals in the $i$-th species.
- Values range from 1 (no diversity) to infinity (maximum diversity).
- It is the inverse of the Simpson index, so a higher value indicates greater diversity.
- It is also sensitive to the dominance of a few species, but in the opposite way of the Simpson index.
These indices provide different perspectives on diversity and can be influenced by different aspects of community composition. The choice of which index to use depends on the specific characteristics of the community being studied and the ecological questions being asked.
Richness
Richness is a component of alpha diversity that measures the number of different species present in a community or sample. It is a simple count and does not take into account the relative abundance of each species. Richness is often used as a basic descriptor of biodiversity.
Evenness
Evenness is another component of alpha diversity that describes how evenly the individuals are distributed among the different species in a community. A community with high evenness has a more equal distribution of individuals among species, whereas a community with low evenness has a few dominant species and many rare species. Evenness can be calculated using the formula:
$$ E = (H' / Hmax) $$
where H' is the diversity index (like the Shannon index) and Hmax is the maximum possible value of H'.
Beta Diversity (β-diversity).
Beta diversity, also known as true diversity or gamma-alpha difference, measures the diversity between communities or ecosystems. It quantifies the difference in species composition or the turnover of species as you move from one community to another.
Beta diversity can be thought of as the rate at which species change from one area to another. It is often calculated as the difference between gamma diversity (the total species diversity in a landscape) and alpha diversity.
Beta diversity reflects the similarity of viral taxonomic diversity among different samples/ regions, with smaller distance values indicating a higher degree of similarity. Therefore, it is essentially a distance matrix. For details on different distance algorithms, you can find their formulas by typing ?vegdist
in the console.
Task
Please open the Virus_diversity/Codenote.Rmd to execute the process to calculate diversity and visualization.