17. Principal Coordinates Analysis (PCoA)

Principal Coordinates Analysis (PCoA), also known as classical (metric) multidimensional scaling (MDS), is an exploratory multivariate method commonly used for microbiome data. It is a conceptual extension of Principal Component Analysis (PCA): it similarly orders the objects along principal coordinate axes while attempting to explain the variance in the original data set. However, while PCA organizes objects by an eigenanalysis of a correlation or covariance matrix, PCoA can be applied to any distance (dissimilarity) matrix (Gower 1966). PCoA has gained popularity in microbial ecology because it can use phylogenetic distance (e.g. UniFrac distance; Lozupone & Knight 2005) and community composition (e.g. Bray–Curtis distance; Bray & Curtis 1957) measures to calculate (dis)similarity among microbial communities. Because PCoA takes a distance matrix as its input, the measured variables cannot be related directly to individual principal coordinate (PC) axes (Ramette 2007). Instead, an indirect correlation or regression analysis of object PC values against object scores for a particular variable can be used to estimate the contribution of that variable to object dispersion along a particular PC axis (Koenig et al. 2011).
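The computation behind classical PCoA can be summarised in a few steps. The sketch below follows the standard textbook formulation (Gower 1966; Legendre & Legendre 2012); the symbols are notation introduced here for illustration and are not taken from GenePiper itself. Here d_ij is the distance between samples i and j, n is the number of samples, A is the matrix of the a_ij, and k is the number of axes retained.

```latex
\begin{aligned}
a_{ij} &= -\tfrac{1}{2}\, d_{ij}^{2} \\
B &= \left(I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top}\right) A \left(I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top}\right) \\
B &= U \Lambda U^{\top}, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \\
X &= U_k \Lambda_k^{1/2}
\end{aligned}
```

The rows of X are the sample coordinates on the first k principal coordinate axes, and the eigenvalues in Λ indicate how much of the variation each axis carries. With non-Euclidean dissimilarities such as Bray–Curtis, some eigenvalues can be negative; only axes with positive eigenvalues give real coordinates.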
Load Data and Subsample
Analysis starts with loading the data in the Load Data panel. The user selects the project and the data label to load the saved data. After loading, subsampling can be done in the Filter panel if needed. See our tutorial 07. Subsetting Data for the usage of the Filter panel.

Parameters
Taxonomic Rank For Agglomeration: Users may specify the taxonomic rank for the analysis here. Taxa points are named according to the selected taxonomic rank.
Abundance Type: choose between Raw Count (original read counts), Rarefied Count (each sample's reads are subsampled down to the lowest total read count among samples) and Relative Abundance (counts are divided by the sum of counts in each sample).
Distance Method: select a distance/dissimilarity measure to construct a distance matrix for the analysis:
- UniFrac (unweighted) - requires phylogenetic tree
- Weighted UniFrac - requires phylogenetic tree
- Bray-Curtis - commonly used for biological data
- Gower
- Jaccard
- Kulczynski
- Horn-Morisita
- Binomial
- Cao
- Chao
See the vegdist R documentation (Dissimilarity Indices for Community Ecologists) and Lozupone & Knight (2005) for details.
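To make the Abundance Type and Distance Method options concrete, here is a minimal Python sketch that converts raw counts to relative abundance and builds a Bray-Curtis distance matrix. It is an illustration only, not GenePiper's implementation (which relies on R); the function names and the count table are hypothetical.

```python
from itertools import combinations


def relative_abundance(counts):
    """Divide each count by the sample total (the Relative Abundance option)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    denominator = sum(a) + sum(b)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


def distance_matrix(samples, metric):
    """Build a symmetric matrix (list of lists) with a zero diagonal."""
    n = len(samples)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = metric(samples[i], samples[j])
    return matrix


# Hypothetical read counts: three samples (rows) by four taxa (columns).
counts = [
    [10, 0, 10, 0],
    [0, 10, 10, 0],
    [0, 0, 0, 20],
]
relative = [relative_abundance(row) for row in counts]
dist = distance_matrix(relative, bray_curtis)
```

Samples 1 and 2 share half of their composition, while sample 3 shares no taxa with either, so it sits at the maximum Bray-Curtis distance from both.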

Results
Distance Matrix tab provides the distance matrix for the analysis, and can be downloaded as a tab-delimited table by clicking the Download Distance Matrix button.

Output tab provides a Download PCoA button to download the resulting RDS object.

Eigen Values tab provides the table of eigenvalues, which can be downloaded as a tab-delimited table by clicking the Download button.

Sample tab provides the table of the ordination scores of samples, and the table can be downloaded as a tab-delimited table by clicking the Download button.

Permanova tab provides an additional statistical analysis, Permutational Multivariate Analysis of Variance (PERMANOVA), which tests whether the selected groups of samples differ significantly. Significance levels (P values) are obtained through permutation.
- Group Column: select a character column from the sample data table for the grouping of samples.
- Distance Method: select a distance/dissimilarity measure for the Permanova analysis.
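The permutation idea behind PERMANOVA can be sketched as follows. This is a simplified one-way version written for illustration, assuming a square distance matrix and one group label per sample; it is not the code GenePiper runs, and the function names, seed and example data are hypothetical.

```python
import random
from itertools import combinations


def pseudo_f(dist, groups):
    """One-way pseudo-F from a square distance matrix and group labels."""
    n = len(groups)
    levels = sorted(set(groups))
    a = len(levels)
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: the same quantity inside each group.
    ss_within = 0.0
    for level in levels:
        members = [i for i, g in enumerate(groups) if g == level]
        pairs = combinations(members, 2)
        ss_within += sum(dist[i][j] ** 2 for i, j in pairs) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, groups, permutations=999, seed=0):
    """Return the observed pseudo-F and a permutation P value."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # reassign group labels at random
        # Small relative tolerance guards against floating-point round-off.
        if pseudo_f(dist, shuffled) >= observed * (1 - 1e-9):
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical example: ten samples on a line, two well-separated groups.
positions = [0, 1, 2, 3, 4, 100, 101, 102, 103, 104]
dist = [[abs(p - q) for q in positions] for p in positions]
groups = ["A"] * 5 + ["B"] * 5
f_stat, p_value = permanova(dist, groups)
```

The P value is the share of label permutations whose pseudo-F is at least as large as the observed one, with the observed arrangement counted as one of the permutations.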

2D Plot tab provides the ordination plot of PCoA. User may download the plot by clicking the Download button, with options to specify the file name and dimension of the figure.

3D Plot tab provides a 3-dimensional plot of the PCoA. Users may explore the ordination space interactively by clicking and dragging in it. Hover over a data point to display its x, y, z coordinates. A toolbar is located at the top right corner of the 3D plot. With these tools, users may download the plot as a png figure, zoom, pan, orbital rotate, turntable rotate, or reset the view.

Graphic Parameters
Graphic Parameters panel provides different options for the 2D Plot and 3D Plot when the corresponding tab of the result panel is selected.
With 2D Plot tab selected:

Sample tab provides the options for the plotting and labelling of the sample points (as dots). User may select the symbol and label size, which are applied instantly to the ordination.

Plot Axis tab provides options to select the plot axes. The user has to specify exactly two axes for the 2D plot.

Group tab provides options to display additional environmental features for the sample points. User may select a variable from the sample data table in the Group Column pull-down menu, which instantly assigns colour to the sample points and labels in the ordination. Further options for showing the classification or grouping of sample points are the Convex Hull, Spider and Ellipse functions, which are overlaid on the ordination. The Convex Hull of a set of points is the smallest convex polygon that encloses all of the points in the set. Convex means that the polygon has no corner that bends inwards. The Spider plot connects all points of a group to their centroid, as in a spider web. The value of the grouping variable is shown at the centroid of each spider web. User may customise the spider line width and label size. Ellipse adds standard deviation or standard error ellipses at a user-specified confidence level. User may also customise the ellipse line width.
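The Convex Hull and Spider overlays can be illustrated with a short Python sketch. It uses Andrew's monotone chain algorithm for the hull, which is one common choice and not necessarily the one used by GenePiper; the function names and the sample scores are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make the boundary bend inwards.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def centroid(points):
    """Mean position of a group of points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


# Hypothetical scores of one sample group on two PCoA axes.
group_scores = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0)]
hull = convex_hull(group_scores)
centre = centroid(group_scores)
# Spider: one line segment from the centroid to every sample in the group.
spider = [(centre, point) for point in group_scores]
```

The interior point (1.0, 1.0) is excluded from the hull but still receives a spider segment, which is why the two overlays can convey different information about the same group.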

Envfit tab provides another option to overlay environmental information onto the ordination, an approach known as indirect gradient analysis (see our tutorial that provides an overview of the multivariate and ordination analyses). Any environmental feature (variable/column) in the sample data table can be fitted onto the selected axes with the envfit function from the vegan package. A text summary of the envfit results is shown and can be downloaded by clicking Download Envfit. User may select the Plot Envfit? option to add the fitted features to the plot. Users should select the features of interest in the Factor tab. For numeric features, the fitted vectors are shown as arrows. The arrow points in the direction of most rapid change in the environmental variable, often called the direction of the gradient. The length of the arrow is proportional to the correlation between the ordination and the environmental variable, often called the strength of the gradient. For categorical features (character variables), the centroids of the factor levels are plotted as dots. User may customise the dot and label size, and the length and width of the arrows.
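The arrow for a numeric feature can be understood as a least-squares fit of that feature onto the two plotted axes. The Python sketch below illustrates the idea under that assumption; it is not vegan's envfit code, it omits the permutation test that envfit uses for significance, and the function name and data are hypothetical.

```python
import math


def fit_vector(scores, values):
    """Fit one numeric variable onto two ordination axes by least squares.

    Returns (direction, r_squared): a unit vector giving the arrow direction
    and the squared correlation between fitted and observed values.
    """
    n = len(values)
    mean_x = sum(s[0] for s in scores) / n
    mean_y = sum(s[1] for s in scores) / n
    mean_v = sum(values) / n
    xs = [s[0] - mean_x for s in scores]
    ys = [s[1] - mean_y for s in scores]
    vs = [v - mean_v for v in values]
    # Normal equations for v ~ b1*x + b2*y on centred data (a 2x2 system).
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxv = sum(x * v for x, v in zip(xs, vs))
    syv = sum(y * v for y, v in zip(ys, vs))
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("axes are constant or collinear")
    b1 = (sxv * syy - syv * sxy) / det
    b2 = (syv * sxx - sxv * sxy) / det
    fitted = [b1 * x + b2 * y for x, y in zip(xs, ys)]
    ss_total = sum(v * v for v in vs)
    ss_resid = sum((v - f) ** 2 for v, f in zip(vs, fitted))
    r_squared = 1 - ss_resid / ss_total
    norm = math.hypot(b1, b2)
    return (b1 / norm, b2 / norm), r_squared


# Hypothetical sample scores on two axes and a numeric feature (e.g. pH)
# constructed to increase along axis 1 and decrease along axis 2.
scores = [(-2, 1), (-1, -1), (0, 2), (1, -2), (2, 0)]
ph = [5.5, 7.0, 6.0, 8.5, 8.0]
direction, r_squared = fit_vector(scores, ph)
# Arrow as drawn: unit direction scaled by the correlation, sqrt(r_squared).
arrow = tuple(c * math.sqrt(r_squared) for c in direction)
```

In this constructed example the feature is an exact linear function of the two axes, so the fit is perfect and the arrow has full length; real environmental variables usually fit less well and produce shorter arrows.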

User may further customise the plot in the Title, Axis and Legend tabs before exporting the figure.
With 3D Plot tab selected:
Sample tab provides the options for the plotting and labelling of the sample points (as dots), which are applied instantly to the ordination.

Plot Axis tab provides options to select the plot axes. The user has to specify exactly three axes for the 3D plot.

Group tab provides options to display additional environmental features for the sample points. User may select a variable from the sample data table in the Group Column pull-down menu, which instantly assigns colour to the sample points and labels in the ordination.

References
- Bray JR & Curtis JT (1957) An Ordination of the Upland Forest Communities of Southern Wisconsin. Ecological Monographs, 27, 326–349.
- Gower JC (1966) Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53, 325–338.
- Koenig JE, Spor A, Scalfone N et al. (2011) Succession of microbial consortia in the developing infant gut microbiome. Proceedings of the National Academy of Sciences, USA, 108 (Suppl 1), 4578–4585.
- Legendre P & Legendre LFJ (2012) Numerical ecology. Vol. 24. Elsevier.
- Lozupone C & Knight R (2005) UniFrac: a new phylogenetic method for comparing microbial communities. Applied and Environmental Microbiology, 71, 8228–8235.
- Paliy O & Shankar V (2016) Application of multivariate statistical techniques in microbial ecology. Molecular Ecology, 25, 1032–1057.
- Ramette A (2007) Multivariate analyses in microbial ecology. FEMS Microbiology Ecology, 62, 142–160.