19. Redundancy Analysis RDA - raytonghk/genepiper GitHub Wiki

Redundancy Analysis (RDA) is an interpretative multivariate method, also regarded as a constrained ordination, that assesses how much of the variation in one set of variables can be explained by the variation in another set of variables. It is the multivariate extension of multiple linear regression, applied to a set of response variables rather than a single one (Rao 1964; Van den Wollenberg 1977). RDA is based on the same principles as PCA and therefore assumes linear relationships among variables. In fact, RDA is a canonical version of PCA in which the principal components are constrained to be linear combinations of the explanatory variables. If the expected relationship between the response variables and the environmental gradients is unimodal rather than linear, canonical correspondence analysis (CCA) is more appropriate.
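The idea that RDA is a constrained PCA can be made concrete with a minimal base-R sketch. Every response column is regressed on the explanatory variables, and PCA (here via svd()) is then run on the fitted values. The data are simulated, and the helper rda_sketch is hypothetical. This is not GenePiper's or vegan's code, although vegan's rda() is expected to report the same constrained eigenvalues for the same data.

```r
# Minimal RDA sketch in base R (hypothetical helper, simulated data):
# multivariate regression of Y on the explanatory variables, then PCA.
set.seed(1)
n   <- 20
Y   <- matrix(rpois(n * 6, lambda = 10), nrow = n)      # 20 samples x 6 taxa
env <- data.frame(x1 = rnorm(n), x2 = runif(n))         # 2 explanatory variables

rda_sketch <- function(Y, env) {
  Yc <- scale(Y, center = TRUE, scale = FALSE)          # centre each response column
  X  <- model.matrix(~ ., data = env)[, -1, drop = FALSE]
  X  <- scale(X, center = TRUE, scale = FALSE)          # centre predictors, drop intercept
  qx <- qr(X)
  fitted <- qr.fitted(qx, Yc)                           # Y-hat: least-squares projection
  resid  <- Yc - fitted
  dof <- nrow(Y) - 1
  list(
    # PCA of the fitted values = constrained (RDA) axes, one per constraint
    constrained   = (svd(fitted)$d^2 / dof)[seq_len(qx$rank)],
    # PCA of the residuals = unconstrained axes
    unconstrained = svd(resid)$d^2 / dof,
    total         = sum(Yc^2) / dof                     # total variance of Y
  )
}

res <- rda_sketch(Y, env)
res$constrained                                         # RDA eigenvalues (variances)
sum(res$constrained) / res$total                        # share of variance explained
```

Because the fitted values and residuals are orthogonal, the constrained and unconstrained eigenvalues together add up to the total variance. This is the decomposition an RDA summary reports as "constrained" and "unconstrained" inertia.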

Load Data and Subsample

The analysis starts by loading data in the Load Data panel: select the project and the data label of the saved data. After loading, the data can be subsampled in the Filter panel if needed. See our tutorial 07. Subsetting Data for how to use the Filter panel.

Parameters

General tab provides options for preparing the dataset for the analysis:

  • Taxonomic Rank For Agglomeration: specify the taxonomic rank at which taxa are agglomerated (merged) for the analysis. Taxa points are named after the selected rank.

  • Abundance Type: choose between Raw Count (original read counts), Rarefied Count (each sample's reads are randomly subsampled down to the smallest read count among the samples) and Relative Abundance (each count is divided by the total count of its sample).
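The two preparation steps can be sketched in base R as follows. The count table and the OTU-to-genus lookup are made up for illustration, and rarefy_counts is a hypothetical helper. GenePiper's own implementation may differ; vegan's rrarefy() and decostand(method = "total") are existing alternatives for the last two transforms.

```r
# Hypothetical count table: rows = samples, columns = OTUs
set.seed(42)
counts <- matrix(rpois(4 * 5, lambda = 50), nrow = 4,
                 dimnames = list(paste0("S", 1:4), paste0("OTU", 1:5)))
# Hypothetical taxonomy: genus assigned to each OTU
genus <- c(OTU1 = "Bacillus", OTU2 = "Bacillus", OTU3 = "Clostridium",
           OTU4 = "Clostridium", OTU5 = "Lactobacillus")

# Agglomerate: sum the counts of all OTUs that share a genus
agg <- t(rowsum(t(counts), group = genus[colnames(counts)]))

# Raw Count: use `agg` as is.
# Rarefied Count: draw reads without replacement down to the smallest library size
rarefy_counts <- function(x) {
  depth <- min(rowSums(x))
  out <- t(apply(x, 1, function(row) {
    reads  <- rep(seq_along(row), times = row)           # one entry per read
    picked <- reads[sample.int(length(reads), depth)]    # random subsample
    tabulate(picked, nbins = length(row))                # reads per taxon
  }))
  dimnames(out) <- dimnames(x)
  out
}
rare <- rarefy_counts(agg)

# Relative Abundance: divide each count by its sample total
rel <- prop.table(agg, margin = 1)
```

Rarefying is random, so two runs without a fixed seed give slightly different tables.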

Click the Prepare Data button; more options appear once data preparation is done.

Table X tab provides options for customising the taxa (species) data table:

  • Filter Rank: users may select any taxonomic rank (the same as, or higher than, the rank selected for the analysis) by which to filter taxa. The Select labels box is then updated instantly to list all taxa at that rank with checkboxes, where users can tick the taxa to be included in the analysis.
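Filtering by a higher rank amounts to keeping the columns whose higher-rank label was ticked. The sketch below is a minimal base-R illustration with a made-up genus table and a hypothetical genus-to-phylum lookup. It is not GenePiper's internal data structure.

```r
# Hypothetical genus-level table (samples x genera)
tab <- matrix(c(10, 0, 5, 3,
                 2, 8, 1, 0,
                 7, 4, 0, 9), nrow = 3, byrow = TRUE,
              dimnames = list(c("S1", "S2", "S3"),
                              c("Bacillus", "Clostridium", "Escherichia", "Pseudomonas")))
# Hypothetical lookup: phylum of each genus (the higher "Filter Rank")
phylum <- c(Bacillus = "Firmicutes", Clostridium = "Firmicutes",
            Escherichia = "Proteobacteria", Pseudomonas = "Proteobacteria")

selected <- c("Firmicutes")                      # labels ticked in "Select labels"
keep <- phylum[colnames(tab)] %in% selected      # one TRUE/FALSE per genus column
tab_filtered <- tab[, keep, drop = FALSE]        # drop = FALSE keeps a matrix
tab_filtered
```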

Table Y tab provides options for customising the constraining matrix, which consists of the explanatory environmental variables. Users may select the features of interest (variables from the sample data table) to be included in the analysis. Note that it is not recommended to run a constrained ordination with every environmental variable you happen to have: the more constraints you add, the looser the constraint becomes, until the solution is similar to an unconstrained ordination. In that case it is better to use an unconstrained ordination with environmental fitting. Environmental variables not selected here can be fitted onto the ordination post hoc with the "Envfit" function.
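The loosening effect can be demonstrated in base R with simulated data. Below, the "constraints" are pure random noise, unrelated to the response table, and are added one at a time. The share of variance they appear to explain (unadjusted R², the constrained fraction in RDA) can only grow. With n samples, n - 1 such constraints explain all of it, so the "constrained" ordination is then just a PCA. The data and the helper constrained_share are hypothetical. vegan's RsquareAdj() reports an adjusted R² that corrects for this inflation.

```r
set.seed(7)
n  <- 15
Y  <- matrix(rnorm(n * 8), nrow = n)             # hypothetical community data
Yc <- scale(Y, center = TRUE, scale = FALSE)

# Fraction of total variance in Yc captured by the constraints in X
constrained_share <- function(Yc, X) {
  Xc  <- scale(X, center = TRUE, scale = FALSE)
  fit <- qr.fitted(qr(Xc), Yc)                   # least-squares fitted values
  sum(fit^2) / sum(Yc^2)
}

# Random constraints, added one at a time (nested models)
noise <- matrix(rnorm(n * (n - 1)), nrow = n)
share <- sapply(seq_len(n - 1),
                function(k) constrained_share(Yc, noise[, 1:k, drop = FALSE]))
round(share, 2)                                  # climbs towards 1 with no real signal
```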

RDA tab provides a Formula option for customising the design of the constrained model. If no formula is provided, the analysis includes all features selected in the Table Y tab. In R, a model formula is built around the tilde symbol (~). The left-hand side of the tilde holds the "dependent variable" (here the species data, which GenePiper supplies), while the right-hand side holds the "independent variables" (the environmental variables of the constrained model), joined by plus signs. The + symbol denotes the inclusion of an additional explanatory variable.

e.g. ord <- rda(dune ~ A1 + Management, data = dune.env)

In addition to the + operator, there are other useful operators:

  • : for interaction;

  • * for crossing;

  • %in% for nesting; and

  • ^ for crossing limited to the specified degree.
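To see which model terms an operator produces, base R's terms() can expand a right-hand side without fitting anything. The variable names below match vegan's dune.env example data, but no data are needed. The helper expand is hypothetical, just a wrapper around attr(terms(f), "term.labels").

```r
# List the model terms R generates from a formula's right-hand side
expand <- function(f) attr(terms(f), "term.labels")

expand(~ A1 + Management)                  # "A1" "Management"
expand(~ A1:Management)                    # "A1:Management"
expand(~ A1 * Management)                  # "A1" "Management" "A1:Management"
expand(~ (A1 + Management + Moisture)^2)   # main effects plus all two-way interactions
expand(~ A1 %in% Management)               # A1 nested within Management
```

The same right-hand side can be used in a constrained model, e.g. rda(dune ~ A1 * Management, data = dune.env) with the vegan package.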

More about the formula setting:

Results

Table X tab shows the species abundance table (community data matrix).

Table Y tab shows the constraining environmental variables.

RDA tab contains all the analysis results.

  • Output tab shows the summary of RDA results, and the resulting object can be downloaded by clicking the Download RDA button.

  • Taxa tab shows the ordination scores of the taxa; the table can be downloaded as a tab-delimited file by clicking the Download button.

  • Sample tab shows the ordination scores of the samples; the table can be downloaded as a tab-delimited file by clicking the Download button.

  • Constraint tab shows the table of the constraints; the table can be downloaded as a tab-delimited file by clicking the Download button.

  • Biplot tab shows the table of biplot scores; the table can be downloaded as a tab-delimited file by clicking the Download button.

  • Permanova tab provides an additional statistical test, Permutational Multivariate Analysis of Variance (PERMANOVA), which tests whether community composition differs significantly among groups of samples. Significance levels (P values) are obtained through permutation.
    • Group Column: select a character column from sample data table for the grouping of samples.
    • Distance Method: select a distance/dissimilarity measure for the Permanova analysis.

  • Plot tab shows the plot of RDA, and the plot can be downloaded by clicking the Download button.
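Roughly comparable results can be produced in R with the vegan package. The sketch below assumes vegan is installed and uses its bundled dune and dune.env data, not GenePiper's internals. The mapping of tabs to vegan score types is an assumption on our part, particularly that the Constraint tab corresponds to the "lc" (linear-combination) site scores. The temporary file paths are illustrative only.

```r
library(vegan)
data(dune)       # 20 sites x 30 species
data(dune.env)   # environmental variables for the same sites

ord <- rda(dune ~ A1 + Management, data = dune.env)
ord                                              # printed summary (cf. Output tab)
rds_file <- tempfile(fileext = ".rds")
saveRDS(ord, rds_file)                           # keep the fitted object for later use in R

# Ordination scores, axes 1-2 by default
taxa_sc <- scores(ord, display = "species")      # taxa scores
site_sc <- scores(ord, display = "sites")        # sample scores
lc_sc   <- scores(ord, display = "lc")           # samples as linear combinations of constraints
bp_sc   <- scores(ord, display = "bp")           # biplot scores of the constraints

# Write one table as a tab-delimited file; col.names = NA keeps the row-name column aligned
out <- tempfile(fileext = ".tsv")
write.table(taxa_sc, out, sep = "\t", quote = FALSE, col.names = NA)

# PERMANOVA on Bray-Curtis dissimilarities, grouping samples by Management
perm <- adonis2(dune ~ Management, data = dune.env,
                method = "bray", permutations = 999)
perm
```

Because P values come from random permutations, they vary slightly between runs unless a seed is set with set.seed().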

Graphic Parameters

Sample tab provides options for plotting and labelling the sample points (shown as dots). Users may adjust the symbol and label sizes; changes apply instantly to the ordination plot.

Taxa tab provides options for plotting and labelling the taxa points (shown as +). Users may adjust the symbol and label sizes. The Label taxa option is disabled when there are more than 1000 taxa, to avoid crashes.

Biplot tab provides Line Width and Label Size options, which apply instantly to the ordination plot.

Plot Axis tab provides options to select the plot axes. Exactly two axes must be specified.

Group tab provides options to display additional environmental features for the sample points. Selecting a variable from the sample data table in the Group Column pull-down menu instantly assigns colours to the sample points and labels in the ordination. The Convex Hull, Spider and Ellipse functions add further information about the classification or grouping of the sample points as overlays on the ordination. The convex hull of a set of points is the smallest convex polygon that encloses all of the points in the set; convex means that no corner of the polygon bends inwards. Spider connects every point to its group centroid, as in a spider web, and the grouping label is shown at the centroid of each web. Users may customise the spider line width and label size. Ellipse adds ellipses covering the standard deviation or standard error area of each group at a user-specified confidence level. Users may also customise the ellipse line width.
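A comparable plot can be drawn with vegan's plotting helpers. This is a sketch assuming vegan is installed, with its dune example data, an arbitrary colour palette, and a temporary PDF path. It is not GenePiper's plotting code. Site scores are taken once for the chosen pair of axes, so the hulls, spiders and ellipses are drawn in the same plane as the points.

```r
library(vegan)
data(dune); data(dune.env)
ord  <- rda(dune ~ A1 + Management, data = dune.env)

axes <- c(1, 2)                                   # exactly two axes, e.g. c(1, 3)
grp  <- dune.env$Management                       # grouping column from the sample table
cols <- c("tomato", "steelblue", "darkgreen", "orange")   # one colour per group

site_xy <- scores(ord, display = "sites", choices = axes)  # 2-column matrix

pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)
plot(ord, choices = axes, display = c("sites", "bp"), type = "n")  # empty frame
points(site_xy, pch = 16, col = cols[as.integer(grp)])             # coloured samples
text(ord, display = "bp", choices = axes, col = "grey30")          # constraint arrows
ordihull(site_xy, grp, col = cols)                                 # convex hulls
ordispider(site_xy, grp, col = cols, label = TRUE)                 # spiders with centroid labels
ordiellipse(site_xy, grp, kind = "se", conf = 0.95, col = cols)    # 95% standard-error ellipses
dev.off()
```

Setting kind = "sd" instead draws standard-deviation ellipses. The conf argument sets the confidence level of either kind.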

Envfit tab provides another way to overlay environmental information onto the ordination, known as indirect gradient analysis (see our tutorial that provides an overview of the multivariate and ordination analyses). Any environmental feature (variable/column) in the sample data table can be fitted onto the selected ordination axes with the envfit function from the vegan package. A summary of the envfit results is shown, and this text summary can be downloaded by clicking Download Envfit. Select the Plot Envfit? option to add the fitted features to the plot, and select the features of interest in the Factor tab. Numeric features are shown as arrows. Each arrow points in the direction of most rapid change in the environmental variable, often called the direction of the gradient. Its length is proportional to the correlation between the ordination and the environmental variable, often called the strength of the gradient. Categorical (character) features are plotted as dots at the centroid of each factor level. Users may customise the dot and label sizes and the line width of the arrows. Note that only environmental variables NOT selected as constraints should be fitted and tested; see "common confusions and mistakes when applying numerical methods in community ecology" by David Zeleny.
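Following the rule that fitted variables should not also be constraints, the sketch below constrains vegan's dune data by Management only. It then fits A1, which is numeric and drawn as an arrow, and Moisture, a factor whose level centroids are drawn as dots. It assumes vegan is installed, and the temporary file paths are illustrative.

```r
library(vegan)
data(dune); data(dune.env)

# Management is the only constraint, so A1 and Moisture can be fitted post hoc
ord <- rda(dune ~ Management, data = dune.env)

ef <- envfit(ord, dune.env[, c("A1", "Moisture")],
             choices = c(1, 2), permutations = 999)
ef                                               # r2 and permutation P values

# Save the text summary
summary_file <- tempfile(fileext = ".txt")
writeLines(capture.output(print(ef)), summary_file)

# Overlay the fitted arrow (A1) and factor centroids (Moisture) on the plot
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)
plot(ord, display = "sites", type = "points")
plot(ef, col = "blue")                           # adds to the current plot
dev.off()
```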

References

  • Paliy O, Shankar V (2016) Application of multivariate statistical techniques in microbial ecology. Molecular Ecology 25(5): 1032–1057.
  • Ramette A (2007) Multivariate analyses in microbial ecology. FEMS Microbiology Ecology 62(2): 142–160.
  • Rao CR (1964) The use and interpretation of principal component analysis in applied research. Sankhya A 26: 329–358.
  • Oksanen J (2007) Multivariate analyses of ecological communities in R: vegan tutorial. 39 pp. http://cc.oulu.fi/~jarioksa/opetus/metodi/vegantutor.pdf
  • Van den Wollenberg AL (1977) Redundancy analysis: an alternative for canonical correlation analysis. Psychometrika 42: 207–219.