lefse.R - juanravm/MicroSeqProfiler GitHub Wiki
Linear Discriminant Analysis Effect Size (lefse.R)
The lefse.R script performs a pairwise LEfSe analysis to identify taxa whose differential abundance distinguishes two groups. The analysis first applies a Kruskal-Wallis test and a Wilcoxon rank-sum test to find differentially abundant taxa. It then uses Linear Discriminant Analysis (LDA) to estimate how much each of those taxa contributes to separating the two groups. As a result, the analysis not only reveals differentially abundant microorganisms at different taxonomic levels, but also weights the effect of each taxon on the differentiation between the studied groups. The analysis is performed with the lefser R package.
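To illustrate the first, non-parametric step, the sketch below computes a tie-corrected Kruskal-Wallis H statistic for one taxon measured in two groups, plus its asymptotic p-value (chi-square with 1 degree of freedom). It is a stand-alone teaching example in plain Python, not the code that lefser runs internally; the function names and the abundance values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks of `values`; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(group_a, group_b):
    """Tie-corrected Kruskal-Wallis H and its chi-square (1 df) p-value."""
    pooled = list(group_a) + list(group_b)
    n, n_a, n_b = len(pooled), len(group_a), len(group_b)
    ranks = average_ranks(pooled)
    r_a = sum(ranks[:n_a])
    r_b = sum(ranks[n_a:])
    h = 12 / (n * (n + 1)) * (r_a ** 2 / n_a + r_b ** 2 / n_b) - 3 * (n + 1)
    # Tie correction: t is the size of each run of identical values.
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    if correction == 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    h = max(h / correction, 0.0)  # guard against tiny negative rounding error
    # With two groups H follows chi-square(1) = Z**2, so P(H > h) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(h / 2))
    return h, p_value


if __name__ == "__main__":
    # Hypothetical relative abundances of one taxon in each group.
    healthy = [0.01, 0.02, 0.03]
    case = [0.04, 0.05, 0.06]
    h, p = kruskal_wallis_two_groups(healthy, case)
    print(f"H = {h:.3f}, p = {p:.4f}")
```

With only two groups, the Kruskal-Wallis test is closely related to the Wilcoxon rank-sum test, which is why both appear in the pairwise workflow. Taxa that pass this filter are the ones passed on to the LDA step.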
As with the alpha and beta diversity plot scripts, the conda environment must be deactivated before running lefse.R:
conda deactivate
You can run this script as shown below:
Rscript file_path/lefse.R \
--species_fp file_path/species.tsv \
--metadata_fp file_path/metadata.tsv \
--output_dir directory_path/LEfSe \
--sampling 53000 \
--col Comparison \
--ref_group Healthy \
--group2 Case \
--minimum_LDA 2
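If you need several pairwise comparisons against the same reference group, the command above can be generated programmatically. The sketch below builds the argument list for one lefse.R call, using only the options documented on this page; the helper name, the extra group name "Treated" and the per-comparison subdirectories are hypothetical. Giving each comparison its own output directory is a precaution, assuming that every run writes a file with the same name (LEfSe_output.tsv).

```python
import shlex


def build_lefse_command(script, species_fp, metadata_fp, output_dir,
                        sampling, col, ref_group, group2, minimum_lda=2):
    """Build the argument list for one lefse.R comparison (hypothetical helper)."""
    return [
        "Rscript", script,
        "--species_fp", species_fp,
        "--metadata_fp", metadata_fp,
        "--output_dir", output_dir,
        "--sampling", str(sampling),
        "--col", col,
        "--ref_group", ref_group,
        "--group2", group2,
        "--minimum_LDA", str(minimum_lda),
    ]


if __name__ == "__main__":
    # Hypothetical second groups; one output directory per comparison so the
    # LEfSe_output.tsv files do not overwrite each other.
    for group2 in ["Case", "Treated"]:
        cmd = build_lefse_command(
            "file_path/lefse.R", "file_path/species.tsv", "file_path/metadata.tsv",
            f"directory_path/LEfSe/Healthy_vs_{group2}", 53000,
            "Comparison", "Healthy", group2)
        print(shlex.join(cmd))  # dry run: prints the shell command
```

Printing with `shlex.join` gives a dry run you can paste into a terminal. To execute instead, pass `cmd` to `subprocess.run(cmd, check=True)` from a shell in which conda has already been deactivated, since the child process inherits that environment. Whether lefse.R creates a missing output directory is not documented here, so you may need to create the subdirectories first.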
Specifying the variables:
--species_fp - Path to the species.tsv file generated by the OTU_decontam.sh script.
--metadata_fp - Path to the metadata.tsv file.
--output_dir - Output directory path. We recommend using the LEfSe directory created by the OTU_decontam.sh script.
--sampling - Minimum sampling depth for a sample to be included. Samples with a sampling depth lower than this value are removed from the analysis. Because sampling depth usually varies between sequencing sets, set your minimum sampling depth by checking the "Interactive Sample Detail" tab of the QC-table.qzv file in QIIME2 View (generated previously with the importQC.sh script).
--col - Metadata column that contains the two compared groups.
--ref_group - Name of the reference group for the LEfSe analysis.
--group2 - Name of the group compared against the reference group.
--minimum_LDA - Minimum LDA effect score for a taxon to appear in the output table. Default: 2
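The effect of --sampling, --col, --ref_group and --group2 on which samples enter the analysis can be sketched as follows. This is an illustration, not the script's own code: it assumes (hypothetically) that species.tsv has a header row of sample IDs after a first taxon column, with one row of counts per taxon, and that the first column of metadata.tsv holds sample IDs. Check your own files, since exported tables may include extra comment lines.

```python
import csv
import io


def samples_to_keep(species_tsv, metadata_tsv, sampling, col, ref_group, group2):
    """Sample IDs with depth >= `sampling` that belong to one of the two groups.

    Hypothetical layouts: species_tsv has taxa as rows and sample IDs as the
    header after the first column; metadata_tsv has sample IDs in column 1.
    """
    reader = csv.reader(species_tsv, delimiter="\t")
    sample_ids = next(reader)[1:]
    depth = dict.fromkeys(sample_ids, 0.0)
    for row in reader:
        for sample, count in zip(sample_ids, row[1:]):
            depth[sample] += float(count)  # sampling depth = total counts per sample
    meta = csv.DictReader(metadata_tsv, delimiter="\t")
    id_field = meta.fieldnames[0]
    group_of = {r[id_field]: r[col] for r in meta}
    # Samples below the threshold, or outside the two compared groups, are dropped.
    return [s for s in sample_ids
            if depth[s] >= sampling and group_of.get(s) in (ref_group, group2)]


if __name__ == "__main__":
    species = "taxon\tS1\tS2\tS3\nA\t100\t10\t60\nB\t0\t20\t50\n"
    metadata = "sample-id\tComparison\nS1\tHealthy\nS2\tCase\nS3\tCase\n"
    print(samples_to_keep(io.StringIO(species), io.StringIO(metadata),
                          50, "Comparison", "Healthy", "Case"))
```

In this toy example S2 (30 reads in total) falls below the threshold of 50 and is discarded, while S1 and S3 are kept.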
This script returns:
LEfSe_output.tsv - A file with the LDA scores of the significant microorganisms whose effect is higher than the threshold specified with --minimum_LDA. Negative LDA scores indicate microorganisms whose abundance is significantly increased in the reference group compared to the second group. Positive LDA scores indicate microorganisms whose abundance is significantly decreased in the reference group compared to the second group.
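Following the sign convention described above, the output table can be split into taxa enriched in the reference group and taxa enriched in the second group. This sketch assumes the file has a header with a "features" column (taxon) and a "scores" column (LDA score); those column names are an assumption, so adjust them to the header of your own LEfSe_output.tsv.

```python
import csv
import io


def split_by_enrichment(lefse_tsv, minimum_lda=2.0):
    """Split taxa by LDA sign: negative = reference group, positive = group2.

    Column names "features" and "scores" are assumptions about the file header.
    """
    ref_enriched, group2_enriched = [], []
    for row in csv.DictReader(lefse_tsv, delimiter="\t"):
        score = float(row["scores"])
        if abs(score) < minimum_lda:
            continue  # normally already filtered by --minimum_LDA; kept as a safeguard
        target = ref_enriched if score < 0 else group2_enriched
        target.append((row["features"], score))
    # Strongest effects first within each group.
    ref_enriched.sort(key=lambda item: abs(item[1]), reverse=True)
    group2_enriched.sort(key=lambda item: abs(item[1]), reverse=True)
    return ref_enriched, group2_enriched


if __name__ == "__main__":
    table = "features\tscores\ng__A\t-3.1\ng__B\t2.5\ng__C\t-2.2\ng__D\t4.0\n"
    ref, other = split_by_enrichment(io.StringIO(table))
    print("Increased in reference group:", ref)
    print("Increased in second group:", other)
```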