Community Environment relationship - umerijaz/microbiomeSeq GitHub Wiki
The env_taxa_correlation function shows the relationship between the most abundant taxa and numerical environmental variables based on correlation. The abundance of each feature/taxon is correlated with each of the environmental variables. A correlation test is performed and the associated p-values are adjusted for multiple testing; the adjustment scheme is controlled by the adjustment argument described below. The function returns a data.frame with the raw p-values, the corrected p-values, and the correlation results.
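The correlation step can be sketched in base R as follows. This is a minimal illustration, not the microbiomeSeq code. It assumes the abundances have already been pulled out of the phyloseq object into a samples x taxa matrix and the numeric metadata into a data.frame. It runs cor.test for each taxon-variable pair within each group and applies one Benjamini-Hochberg correction over all tests. The helper name taxa_env_cor_sketch, its column names and the simulated data are hypothetical.

```r
# Sketch only: not the microbiomeSeq implementation.
# abund: samples x taxa matrix; env: data.frame of numeric variables;
# groups: vector assigning each sample to a group (e.g. Country).
taxa_env_cor_sketch <- function(abund, env, groups, method = "pearson",
                                padjust.method = "BH") {
  rows <- list()
  for (g in unique(groups)) {
    idx <- groups == g
    for (tx in colnames(abund)) {
      for (ev in names(env)) {
        # Correlation test between one taxon and one variable within a group
        ct <- cor.test(abund[idx, tx], env[idx, ev], method = method)
        rows[[length(rows) + 1]] <- data.frame(
          Taxa = tx, Env = ev, Group = g,
          Correlation = unname(ct$estimate), Pvalue = ct$p.value)
      }
    }
  }
  res <- do.call(rbind, rows)
  # One correction over all tests; the adjustment argument described
  # below chooses other families of tests instead.
  res$AdjPvalue <- p.adjust(res$Pvalue, method = padjust.method)
  res
}

# Simulated example: TaxonA tracks Temp, the other taxa are noise
set.seed(1)
n <- 20
env <- data.frame(Temp = runif(n, 10, 30), pH = runif(n, 5, 8))
abund <- cbind(TaxonA = round(2 * env$Temp + rnorm(n, sd = 2)),
               TaxonB = rpois(n, 10),
               TaxonC = rpois(n, 5))
groups <- rep(c("G1", "G2"), each = n / 2)
res <- taxa_env_cor_sketch(abund, env, groups)
head(res)
```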
physeq
is a phyloseq object containing taxa abundance and metadata information.
grouping_column
is a character string naming the metadata variable by which the data should be grouped.
method
is a character string indicating which correlation coefficient is to be computed; the available options are "pearson" (the default), "kendall" and "spearman".
adjustment
is an integer with options 1, 2, 3, 4 or 5, indicating how p-values are adjusted for multiple comparisons using the Benjamini-Hochberg procedure (padjust.method="BH" in the example below). These options have the following implications:
- 1 - do not adjust
- 2 - adjust within each environmental variable and group (a column on the correlation plot)
- 3 - adjust within each taxon and group (a row on the correlation plot, for each group)
- 4 - adjust within each taxon (a row on the correlation plot)
- 5 - adjust within each environmental variable (a panel on the correlation plot)
num.taxa
is an integer indicating the number of taxa to be used in the correlation plot; the default is 50.
select.variables
is a list of environmental variables to be used in the correlation computation. If it is not specified (NULL, as in the example below), all numerical variables are used.
env.taxa.cor <- env_taxa_correlation(physeq, grouping_column="Country", method="pearson", pvalue.threshold=0.05,
padjust.method="BH", adjustment=5, num.taxa=50, select.variables=NULL)
Then visualise the correlation results using the plot_taxa_env function.
p <- plot_taxa_env(env.taxa.cor)
print(p)
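The five adjustment options differ only in which p-values are pooled into one family before correction. The sketch below assumes that each option applies p.adjust separately within every combination of the named columns. That is our reading of the option list, not the package source. adjust_pvalues_sketch and the Taxa/Env/Group columns are hypothetical.

```r
# Hypothetical helper mirroring the five adjustment options listed above.
adjust_pvalues_sketch <- function(df, adjustment, padjust.method = "BH") {
  # Columns whose combinations define one family of tests
  keys <- switch(as.character(adjustment),
                 "1" = character(0),          # no adjustment
                 "2" = c("Env", "Group"),     # column within each group
                 "3" = c("Taxa", "Group"),    # row within each group
                 "4" = "Taxa",                # row across all groups
                 "5" = "Env",                 # per environmental variable
                 stop("adjustment must be one of 1, 2, 3, 4, 5"))
  if (length(keys) == 0) return(df$Pvalue)
  family <- interaction(df[keys], drop = TRUE)
  # Adjust each family separately, keeping the original row order
  ave(df$Pvalue, family,
      FUN = function(p) p.adjust(p, method = padjust.method))
}

set.seed(42)
df <- expand.grid(Taxa = c("TaxonA", "TaxonB", "TaxonC"),
                  Env = c("Temp", "pH"), Group = c("G1", "G2"),
                  stringsAsFactors = FALSE)
df$Pvalue <- runif(nrow(df), 0, 0.1)
df$Adj4 <- adjust_pvalues_sketch(df, 4)  # per taxon (row)
df$Adj5 <- adjust_pvalues_sketch(df, 5)  # per environmental variable
df
```

Smaller families mean fewer tests per correction, so options 2 and 3 are generally the least conservative of the adjusting options.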
The generateFSO function uses fuzzy set ordination to test the effects of perturbations in environmental variables on community structure. For each of the specified variables, a fuzzy set ordination is calculated and the correlation between the original variable and the fuzzy set is reported. The significance of a variable is assessed by comparing the probability of obtaining that correlation between the data and the fuzzy set with a specified threshold p-value.
The results are visualised as a plot of the fuzzy set against the original values, annotated with the correlation between them and a significance label.
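A minimal base-R sketch of the fuzzy set ordination idea follows, under stated assumptions. Bray-Curtis similarity stands in for the Baroni-Urbani & Buser, Horn or Yule indices offered by generateFSO. Each sample's membership in the fuzzy set is taken as the similarity-weighted mean of the rescaled variable over all other samples. Significance comes from a permutation test that shuffles the variable. fso_sketch, bc_similarity and the simulated data are hypothetical.

```r
# Assumption: Bray-Curtis similarity replaces the package's indices.
bc_similarity <- function(abund) {
  n <- nrow(abund)
  s <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      s[i, j] <- 1 - sum(abs(abund[i, ] - abund[j, ])) /
        sum(abund[i, ] + abund[j, ])
    }
  }
  s
}

fso_sketch <- function(abund, x, nperm = 199) {
  sim <- bc_similarity(abund)
  diag(sim) <- 0  # assumption: ignore each sample's similarity to itself
  membership <- function(v) {
    v01 <- (v - min(v)) / (max(v) - min(v))  # rescale variable to [0, 1]
    as.vector(sim %*% v01) / rowSums(sim)    # similarity-weighted mean
  }
  mu <- membership(x)
  r_obs <- cor(x, mu)
  # Null distribution: shuffle the variable and rebuild the fuzzy set
  r_perm <- replicate(nperm, {
    xp <- sample(x)
    cor(xp, membership(xp))
  })
  p <- (sum(r_perm >= r_obs) + 1) / (nperm + 1)
  list(membership = mu, r = r_obs, p.value = p)
}

# Simulated gradient: TaxonA rises and TaxonB falls with temperature
set.seed(7)
n <- 15
temp <- seq(10, 30, length.out = n)
t01 <- (temp - 10) / 20
abund <- cbind(TaxonA = rpois(n, 5 + 40 * t01),
               TaxonB = rpois(n, 45 - 40 * t01),
               TaxonC = rpois(n, 10))
res <- fso_sketch(abund, temp)
c(r = res$r, p = res$p.value)
```

Calling plot(temp, res$membership) gives the kind of fuzzy-set-versus-original-values plot described above.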
physeq
is a phyloseq object containing taxa abundance and metadata information.
grouping_column
is a character string naming the metadata variable by which the data should be grouped.
method
is an integer specifying the method for computing similarity indices; the options are:
- 1 = Baroni-Urbani & Buser
- 2 = Horn
- 3 = Yule
indices
is an integer giving the column number of the environmental variable of interest. By default, all variables are used.
filename
if provided, a file of the fuzzy set correlations is created with this name.
In this example, we have selected the variable Temp (indices=2) for illustration.
p<-generateFSO(physeq,grouping_column="Country",method=2, indices=2,filename=NULL)
print(p)
The plot_cca function finds a set of environmental variables that best describe community structure and shows them on a canonical correspondence analysis (CCA) plot.
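This page does not say exactly how plot_cca chooses its variables. One common way to find the subset of environmental variables that best matches community structure is a BIOENV-style search, which vegan's bioenv function also implements. Every subset of scaled variables is turned into a Euclidean distance matrix, and the subset whose distances correlate best (Spearman) with the community Bray-Curtis dissimilarities ranks first. The sketch below only illustrates that idea; bioenv_sketch, bc_distance_vector and the simulated data are hypothetical.

```r
# BIOENV-style variable selection sketch (not necessarily what plot_cca does).
bc_distance_vector <- function(abund) {
  n <- nrow(abund)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(abund[i, ] - abund[j, ])) /
        sum(abund[i, ] + abund[j, ])
    }
  }
  as.vector(as.dist(d))  # lower triangle, same order as dist()
}

bioenv_sketch <- function(abund, env) {
  comm_d <- bc_distance_vector(abund)
  env_scaled <- scale(env)  # put variables on a common scale
  out <- list()
  for (k in seq_len(ncol(env))) {
    for (vars in combn(colnames(env), k, simplify = FALSE)) {
      env_d <- as.vector(dist(env_scaled[, vars, drop = FALSE]))
      rho <- cor(comm_d, env_d, method = "spearman")
      out[[length(out) + 1]] <- data.frame(
        variables = paste(vars, collapse = " + "), size = k, rho = rho)
    }
  }
  res <- do.call(rbind, out)
  res[order(res$rho, decreasing = TRUE), ]  # best subsets first
}

# Simulated data: only Temp drives community composition
set.seed(3)
n <- 20
env <- data.frame(Temp = runif(n, 10, 30), pH = runif(n, 5, 8),
                  Depth = runif(n, 0, 50))
t01 <- (env$Temp - 10) / 20
abund <- cbind(TaxonA = rpois(n, 5 + 40 * t01),
               TaxonB = rpois(n, 45 - 40 * t01),
               TaxonC = rpois(n, 10))
res <- bioenv_sketch(abund, env)
res
```

The search is exhaustive (2^p - 1 subsets for p variables), which is one reason to limit how many variables are considered.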
physeq
is a required phyloseq object containing taxa abundance and metadata.
grouping_column
is the metadata variable by which the data should be grouped.
pvalueCutoff
is the threshold p-value in the ANOVA of distance matrices; the default is 0.05.
env.variables
is a list of variables preferred to be shown on the CCA plot.
exclude.variables
is a list of variables to be excluded from the CCA plot.
num.env.variables
is an integer specifying the number of variables to show on the CCA plot, which helps avoid overcrowding the plot.
plot_cca(physeq = physeq, grouping_column = "Country", pvalueCutoff = 0.01, env.variables = NULL, num.env.variables = NULL, exclude.variables = "Country", draw_species = FALSE)
The plot_anova_env function performs an analysis of variance on selected environmental variables and plots their distributions, annotated with the significance of their variation across the specified groups.
physeq
is a required phyloseq object containing taxa abundance and metadata.
grouping_column
is a character string specifying the metadata variable by which the data should be grouped.
pValueCutoff
is the threshold p-value in the ANOVA of environmental variables; the default is 0.05.
select.variables
is a list of character strings naming the variables to be analysed.
In the first example, two variables, "Temp" and "pH", are selected and grouped by "Country".
p<-plot_anova_env(physeq,grouping_column="Country",pValueCutoff=0.05,select.variables=c("Temp","pH"))
print(p)
Selecting "Temp" and "pH" and grouping by "Depth".
p2<-plot_anova_env(physeq,grouping_column = "Depth",select.variables=c("Temp","pH"))
print(p2)
Selecting only "pH" and grouping by "Latrine".
p<-plot_anova_env(physeq,grouping_column = "Latrine",select.variables="pH")
print(p)
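The statistics behind plot_anova_env can be approximated with a one-way ANOVA per variable. This sketch assumes the metadata have already been extracted into a data.frame and computes only the p-values, not the plot. The star labels (***, **, *, NS) are a common convention and may differ from the package's own annotation. anova_env_sketch and the simulated data are hypothetical.

```r
# One-way ANOVA per environmental variable with significance labels.
anova_env_sketch <- function(env, groups, pValueCutoff = 0.05) {
  groups <- factor(groups)
  res <- data.frame(variable = names(env), p.value = NA_real_)
  for (k in seq_along(env)) {
    y <- env[[k]]
    fit <- aov(y ~ groups)
    # First row of the ANOVA table is the grouping term
    res$p.value[k] <- summary(fit)[[1]][["Pr(>F)"]][1]
  }
  res$significant <- res$p.value < pValueCutoff
  res$label <- as.character(cut(res$p.value,
                                breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
                                labels = c("***", "**", "*", "NS")))
  res
}

# Simulated data: Temp differs strongly between countries, pH does not
set.seed(11)
country <- rep(c("A", "B", "C"), each = 8)
env <- data.frame(
  Temp = rnorm(24, mean = c(15, 20, 25)[factor(country)], sd = 1),
  pH = rnorm(24, mean = 7, sd = 0.3))
res <- anova_env_sketch(env, country)
res
```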