Community Environment relationship - umerijaz/microbiomeSeq GitHub Wiki

Correlation between numerical environmental variables and most abundant taxa

This function shows the relationship between the most abundant taxa and numerical environmental variables based on correlation. The abundance of each feature/taxon is correlated with each of the environmental variables, a correlation test is performed, and the associated p-values are adjusted for multiple testing. The adjustment schemes are described with the arguments below. The function returns a data.frame with the raw p-values, the corrected p-values, and the correlation results.

physeq is a phyloseq object containing taxa abundance and meta data. grouping_column is a character string naming the variable with respect to which the data should be grouped. method is a character string indicating which correlation coefficient is to be computed; the available options are "pearson" (the default), "kendall" and "spearman". adjustment is an integer (1, 2, 3, 4 or 5) specifying which sets of p-values are adjusted together for multiple comparisons using the Benjamini-Hochberg procedure (padjust.method="BH" in the example below). The options have the following implications:

1 - do not adjust
2 - adjust environmental variables + Groups (column on the correlation plot)
3 - adjust Taxa + Groups (row on the correlation plot for each Groups)
4 - adjust Taxa (row on the correlation plot)
5 - adjust environmental variables (panel on the correlation plot)
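The computation described above can be illustrated with a small base-R sketch. It is not the package's own code: the toy data, the variable num_taxa and the output column names (Taxa, Env, Correlation, Pvalue, AdjByEnv, AdjByTaxon) are hypothetical. It keeps the most abundant taxa, runs cor.test for every taxon/variable pair, and applies Benjamini-Hochberg within each environmental variable (scheme 5) or within each taxon (scheme 4); scheme 1 is simply the raw Pvalue column. Schemes 2 and 3 also involve the grouping variable and are not shown.

```r
# Hypothetical sketch in base R, not the env_taxa_correlation() implementation.
set.seed(1)
n_samples <- 12
# Toy abundance table (samples x taxa) and numeric metadata for a single group
abund <- matrix(rpois(n_samples * 5, lambda = 20), nrow = n_samples,
                dimnames = list(NULL, paste0("Taxon", 1:5)))
env <- data.frame(Temp = rnorm(n_samples, 25, 3), pH = rnorm(n_samples, 7, 0.5))

# Keep only the most abundant taxa (the role played by num.taxa)
num_taxa <- 3
top <- order(colSums(abund), decreasing = TRUE)[seq_len(num_taxa)]
abund <- abund[, top, drop = FALSE]

# One row per taxon/variable pair, each with its own correlation test
pairs <- expand.grid(Taxa = colnames(abund), Env = names(env),
                     stringsAsFactors = FALSE)
tests <- Map(function(tx, v) cor.test(abund[, tx], env[[v]], method = "pearson"),
             pairs$Taxa, pairs$Env)
pairs$Correlation <- vapply(tests, function(ct) unname(ct$estimate), numeric(1))
pairs$Pvalue <- vapply(tests, function(ct) ct$p.value, numeric(1))  # scheme 1: raw

bh <- function(p) p.adjust(p, method = "BH")
pairs$AdjByEnv <- ave(pairs$Pvalue, pairs$Env, FUN = bh)     # scheme 5: per variable
pairs$AdjByTaxon <- ave(pairs$Pvalue, pairs$Taxa, FUN = bh)  # scheme 4: per taxon
print(pairs)
```

Grouping the same raw p-values differently changes the size of each family being corrected, which is why the choice of adjustment scheme can change which correlations pass the threshold.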

num.taxa is an integer indicating the number of taxa to be used in the correlation plot, default is 50. select.variables is a list of environmental variables to be used in the correlation computation. If not specified, all numerical variables are used as shown in the example below.

env.taxa.cor <- env_taxa_correlation(physeq, grouping_column="Country", method="pearson", pvalue.threshold=0.05,
                                 padjust.method="BH", adjustment=5, num.taxa=50, select.variables=NULL)

Then visualise the correlation results using the plot_taxa_env function.

p <- plot_taxa_env(env.taxa.cor)
print(p)

Fuzzy set ordination of environmental variables

Fuzzy set ordination is used to test the effects of perturbation in environmental variables on community structure. For each of the specified variables, a fuzzy set ordination is calculated and the correlation between the original variable and the fuzzy set is reported. The significance of a variable is assessed by comparing the probability of obtaining that correlation between the data and the fuzzy set against a specified threshold p-value.

The results are visualised as a plot of the fuzzy set values against the original values, annotated with the correlation between them and a significance label.

method is an integer selecting the similarity index used to compare samples:

1 = Baroni-Urbani & Buser
2 = Horn
3 = Yule

indices is an integer giving the column number of the environmental variable of interest; by default all variables are used. filename, if provided, is the name of a file to which the fuzzy set correlations are written. In this example, the variable Temp is selected for illustration (indices=2).

p<-generateFSO(physeq,grouping_column="Country",method=2, indices=2,filename=NULL)
print(p)
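As a rough illustration of the idea, the base-R sketch below computes a fuzzy set for one variable and tests its correlation by permutation. It rests on stated assumptions and may differ from generateFSO's internals: Bray-Curtis similarity stands in for the three indices listed above, each sample's membership is taken as the similarity-weighted mean of the other samples' rescaled values (our reading of Roberts' fuzzy set ordination), and significance is a one-sided permutation p-value. The functions bray_similarity, fuzzy_membership and fso_test are hypothetical.

```r
# Hypothetical sketch of fuzzy set ordination for one variable (base R only).
# Bray-Curtis similarity between samples (a stand-in for the indices above)
bray_similarity <- function(comm) {
  n <- nrow(comm)
  s <- matrix(1, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    s[i, j] <- 1 - sum(abs(comm[i, ] - comm[j, ])) / sum(comm[i, ] + comm[j, ])
  }
  s
}

# Assumed membership: similarity-weighted mean of the OTHER samples' values,
# after rescaling the variable to [0, 1]
fuzzy_membership <- function(x, sim) {
  x01 <- (x - min(x)) / (max(x) - min(x))
  diag(sim) <- 0                       # exclude each sample's self-similarity
  as.vector(sim %*% x01) / rowSums(sim)
}

# Correlation between variable and fuzzy set, with a permutation p-value
fso_test <- function(x, comm, n_perm = 199) {
  sim <- bray_similarity(comm)
  r_obs <- cor(x, fuzzy_membership(x, sim))
  r_perm <- replicate(n_perm, {
    xp <- sample(x)                    # break the link between x and community
    cor(xp, fuzzy_membership(xp, sim))
  })
  list(r = r_obs, p = (sum(r_perm >= r_obs) + 1) / (n_perm + 1))
}

# Toy community: six taxa with unimodal responses along a temperature gradient
set.seed(42)
temp <- seq(10, 30, length.out = 15)
optima <- seq(10, 30, by = 4)
comm <- sapply(optima, function(o) rpois(length(temp), 100 * exp(-(temp - o)^2 / 18)))
res <- fso_test(temp, comm)
print(res)
```

A strong positive correlation with a small permutation p-value suggests that community composition tracks the variable; a variable unrelated to composition yields a fuzzy set that does not reproduce it.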

Canonical Correspondence Analysis

This function finds the set of environmental variables that best describe community structure and shows them on a canonical correspondence analysis (CCA) plot.

physeq is a required phyloseq object containing taxa abundance and meta data. grouping_column is the variable in the meta data with respect to which the data should be grouped. pvalueCutoff is the threshold p-value in the ANOVA of distance matrices, set to 0.05 by default. env.variables is a list of variables preferred to appear on the CCA plot, and exclude.variables is a list of variables to be excluded from it. num.env.variables is an integer specifying the number of variables to show on the CCA plot, which helps avoid overcrowding.

plot_cca(physeq = physeq, grouping_column = "Country", pvalueCutoff = 0.01, env.variables = NULL, num.env.variables = NULL, exclude.variables = "Country", draw_species = FALSE)
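The variable-selection step can be sketched in base R, assuming it works by testing each numeric variable with a permutation-based ANOVA of a distance matrix (PERMANOVA style) and keeping those below the p-value cutoff. The Hellinger distance, the ranking by p-value and the names gower_centre, pseudo_f and select_env_variables are our assumptions rather than plot_cca's documented behaviour, and the CCA ordination itself is not reproduced.

```r
# Hypothetical sketch of permutation-based variable selection (base R only).
# Gower-centred matrix of a distance object, as used in distance-based ANOVA
gower_centre <- function(d) {
  A <- -0.5 * as.matrix(d)^2
  n <- nrow(A)
  J <- diag(n) - matrix(1 / n, n, n)
  J %*% A %*% J
}

# Pseudo-F for a single numeric predictor x (1 and n - 2 degrees of freedom)
pseudo_f <- function(G, x) {
  n <- nrow(G)
  X <- cbind(1, x)
  H <- X %*% solve(crossprod(X), t(X))   # hat matrix of the linear model
  R <- diag(n) - H
  ss_model <- sum(diag(H %*% G %*% H))
  ss_resid <- sum(diag(R %*% G %*% R))
  ss_model / (ss_resid / (n - 2))
}

select_env_variables <- function(comm, env, p_cutoff = 0.05, num_vars = NULL,
                                 exclude = NULL, n_perm = 199) {
  G <- gower_centre(dist(sqrt(comm / rowSums(comm))))  # Hellinger distance
  numeric_vars <- names(env)[vapply(env, is.numeric, logical(1))]
  vars <- setdiff(numeric_vars, exclude)
  p <- vapply(vars, function(v) {
    x <- env[[v]]
    f_obs <- pseudo_f(G, x)
    f_perm <- replicate(n_perm, pseudo_f(G, sample(x)))
    (sum(f_perm >= f_obs) + 1) / (n_perm + 1)          # permutation p-value
  }, numeric(1))
  keep <- sort(p[p < p_cutoff])                        # most significant first
  if (!is.null(num_vars)) keep <- head(keep, num_vars) # cap to avoid crowding
  names(keep)
}

# Toy data: Temp drives composition, pH is noise, Country is excluded
set.seed(7)
n <- 20
env <- data.frame(Temp = seq(10, 30, length.out = n),
                  pH = rnorm(n, 7, 0.3),
                  Country = rep(c("A", "B"), each = n / 2))
comm <- cbind(t1 = rpois(n, env$Temp), t2 = rpois(n, 40 - env$Temp),
              t3 = rpois(n, 15))
selected <- select_env_variables(comm, env, exclude = "Country")
print(selected)
```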

ANOVA of environmental variables

This function performs an analysis of variance on selected environmental variables and plots their distributions, annotated with the significance of their variation across the specified groups. physeq is a required phyloseq object containing taxa abundance and meta data. grouping_column is a character string specifying the variable in the meta data with respect to which the data should be grouped. pValueCutoff is the threshold p-value in the ANOVA of the environmental variables, set to 0.05 by default. select.variables is a list of character strings naming the variables to be analysed. In the first example, the two variables "Temp" and "pH" are selected and grouped with respect to "Country".

p<-plot_anova_env(physeq,grouping_column="Country",pValueCutoff=0.05,select.variables=c("Temp","pH"))
print(p)

Selecting "Temp" and "pH" and grouping by "Depth".

p2<-plot_anova_env(physeq,grouping_column =  "Depth",select.variables=c("Temp","pH"))
print(p2)

Selecting only "pH" and grouping by "Latrine".

p<-plot_anova_env(physeq,grouping_column =  "Latrine",select.variables="pH")
print(p)
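The ANOVA step can be sketched with base R's aov(), assuming a one-way ANOVA of each selected variable across the levels of the grouping column and a simple "*"/"ns" label at the cutoff. The function anova_env, its output columns and the toy metadata are hypothetical, and the plotting done by plot_anova_env is omitted.

```r
# Hypothetical sketch: one-way ANOVA per variable, not plot_anova_env() itself.
anova_env <- function(meta, grouping_column, select_variables, p_cutoff = 0.05) {
  g <- factor(meta[[grouping_column]])
  p <- vapply(select_variables, function(v) {
    fit <- aov(meta[[v]] ~ g)
    summary(fit)[[1]][["Pr(>F)"]][1]   # p-value for the grouping term
  }, numeric(1))
  data.frame(variable = select_variables, p.value = unname(p),
             label = ifelse(p < p_cutoff, "*", "ns"), row.names = NULL)
}

# Toy metadata: Temp differs between two groups, pH does not
set.seed(3)
meta <- data.frame(Group = rep(c("A", "B"), each = 10),
                   Temp = c(rnorm(10, 20, 1), rnorm(10, 30, 1)),
                   pH = rnorm(20, 7, 0.2))
res <- anova_env(meta, "Group", c("Temp", "pH"))
print(res)
```

Changing grouping_column (for example from "Country" to "Depth" or "Latrine", as in the calls above) only changes the factor g, so the same variables can show significant variation under one grouping and not another.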