ANOSIM
Analysis of Similarities
ANOSIM tests whether dissimilarities between groups are greater than those within groups. Given a matrix of rank dissimilarities between a set of samples, each belonging to exactly one treatment group, ANOSIM tests whether we can reject the null hypothesis that the similarity between groups is greater than or equal to the similarity within groups.
Null hypothesis: There is no difference between the means of two or more groups of (ranked) dissimilarities.
The ANOSIM test is similar to an ANOVA hypothesis test, but it uses a dissimilarity matrix as input instead of raw data. It is also non-parametric, meaning it makes few assumptions about your data (it does not require a normal distribution, for example), so it’s a good bet for often-skewed microbial abundance data.
The ANOSIM R
The ANOSIM statistic compares the mean of ranked dissimilarities between groups to the mean of ranked dissimilarities within groups. An R value close to 1 suggests dissimilarity between groups, while an R value close to 0 suggests an even distribution of high and low ranks within and between groups. R values below 0 suggest that dissimilarities are greater within groups than between groups. See Clarke and Gorley (2001) for a guide to interpreting ANOSIM R values.
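The comparison described above is usually written as the formula below. The symbols are a notational choice made here, not taken from the text: \bar{r}_B is the mean rank of between-group dissimilarities, \bar{r}_W is the mean rank of within-group dissimilarities, n is the total number of samples, and M is the number of sample pairs. Dividing by M/2 scales R to lie between -1 and 1.

```latex
R = \frac{\bar{r}_B - \bar{r}_W}{M/2}, \qquad M = \frac{n(n-1)}{2}
```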
Calculation of R:
Key Assumption: The ranges of (ranked) dissimilarities within groups are equal, or at least very similar.
- Calculate dissimilarity matrix
- Calculate rank dissimilarities (smallest dissimilarity is given a rank of 1)
- Calculate mean among- and within-group rank dissimilarities.
- Calculate test statistic R (an index of relative within-group dissimilarity)
- R = 1: all pairs of samples within groups are more similar to each other than any pair of samples from different groups
- R = 0: the expected value under the null model that among- and within-group dissimilarities are the same on average
- R < 0: numerically possible but ecologically unlikely
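The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the R script this page accompanies: the function names `average_ranks` and `anosim_r`, the toy dissimilarity matrix, and the choice to give tied dissimilarities their average rank are all assumptions made for the example.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def anosim_r(dist, groups):
    """ANOSIM R from a square, symmetric dissimilarity matrix and group labels."""
    n = len(groups)
    pairs = list(combinations(range(n), 2))
    # Rank all pairwise dissimilarities (smallest gets rank 1).
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    n_pairs = n * (n - 1) / 2
    return (mean_between - mean_within) / (n_pairs / 2)


# Hypothetical toy data: four samples, dissimilarities chosen by hand.
dist = [
    [0.0, 0.1, 0.7, 0.8],
    [0.1, 0.0, 0.6, 0.9],
    [0.7, 0.6, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
print(anosim_r(dist, ["A", "A", "B", "B"]))  # 1.0: groups perfectly separated
print(anosim_r(dist, ["A", "B", "A", "B"]))  # -0.75: within-group pairs are the most dissimilar
```

With the first labelling every within-group dissimilarity ranks below every between-group one, so R reaches 1. The second labelling deliberately puts the most dissimilar samples in the same group, which yields a negative R.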
Determine the probability of obtaining an R at least as large as the observed value through Monte Carlo permutations.
- Permutations involve randomly assigning sample observations to groups
- The p-value is simply the fraction of permuted R values that are as large as or larger than the observed R
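The permutation procedure above might look like the following self-contained Python sketch. Everything named here is hypothetical (`ranks_of`, `r_from_ranks`, `anosim_permutation_test`, the one-dimensional toy data, the default of 999 permutations and the fixed seed). One deliberate assumption: the p-value uses the common (hits + 1) / (permutations + 1) convention, which counts the observed arrangement as one of the permutations and so differs slightly from the plain fraction.

```python
import random
from itertools import combinations


def ranks_of(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def r_from_ranks(ranks, pairs, labels):
    """ANOSIM R for one assignment of samples to groups."""
    within = [r for r, (i, j) in zip(ranks, pairs) if labels[i] == labels[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if labels[i] != labels[j]]
    mean_diff = sum(between) / len(between) - sum(within) / len(within)
    return mean_diff / (len(pairs) / 2)


def anosim_permutation_test(dist, groups, n_perm=999, seed=0):
    """Return (observed R, permutation p-value)."""
    pairs = list(combinations(range(len(groups)), 2))
    # The ranks depend only on the dissimilarities, so compute them once.
    ranks = ranks_of([dist[i][j] for i, j in pairs])
    observed = r_from_ranks(ranks, pairs, groups)
    rng = random.Random(seed)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # randomly reassign samples to groups
        if r_from_ranks(ranks, pairs, labels) >= observed:
            hits += 1
    # Assumed convention: the observed arrangement counts as one permutation.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: six samples on a line, two well-separated groups.
coords = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
dist = [[abs(a - b) for b in coords] for a in coords]
groups = ["A", "A", "A", "B", "B", "B"]
r_obs, p_value = anosim_permutation_test(dist, groups)
print(r_obs, p_value)
```

With only six samples there are just 20 distinct ways to split them into two groups of three, and two of those reproduce the observed split, so the p-value cannot fall much below 0.1 however clean the separation is. Small designs put a floor on the attainable significance.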
Points to consider:
- Do not assign group membership based on the results of clustering (or a similar exploratory method) applied to the same data set and then treat a significant ANOSIM result as meaningful. This is an example of data dredging.
- Running ANOSIM on groups with very different dispersions can lead to unreliable results. Groups with very different dispersions may produce high R values, even if there's no real difference in their centroids. If differences in group dispersion are as meaningful to your analysis as differences in group center, this may not be an issue.
- Criticism of this and other (dis)similarity-based methods should be considered (e.g. Warton et al., 2012).
Source: https://mb3is.megx.net/gustame/hypothesis-tests/anosim