qiime2 to lefse - statonlab/BiGG2020_CrackNAg GitHub Wiki
LEfSe (Linear discriminant analysis Effect Size) determines the features (organisms, clades, operational taxonomic units, genes, or functions) most likely to explain differences between classes by coupling standard tests for statistical significance with additional tests encoding biological consistency and effect relevance.
The lefse package is installed in the conda environment on the server. The tutorial to run the program is https://bitbucket.org/biobakery/biobakery/wiki/lefse#rst-header-id5.
First of all, we need to generate the input file for LEfSe, using this tutorial
prepare qiime2 files for lefse
Collapse files to level 4. (I think you can choose the proper taxa level you like)
conda activate qiime2-2020.2
qiime taxa collapse \
--i-table table-no-mitochondria-no-chloroplast.qza \
--o-collapsed-table collapse.table.qza \
--p-level 4 \
--i-taxonomy taxonomy.qza
Calculate the relative abundance of taxa
qiime feature-table relative-frequency \
--i-table collapse.table.qza \
--o-relative-frequency-table collapse.frequency.table.qza
Export the collapse.frequency.table.qza into biom and biom convert to tsv format
# Export to biom
qiime tools export \
--input-path collapse.frequency.table.qza \
--output-path collapse.frequency/
# convert biom to tsv
biom convert \
-i collapse.frequency/feature-table.biom \
-o collapse.frequency/collapse.frequency.table.txt \
--to-tsv
Modify the collapse.frequency.table.txt
to match the format like this:
bodysite mucosal mucosal mucosal mucosal mucosal non_mucosal non_mucosal non_mucosal non_mucosal non_mucosal
subsite oral gut oral oral gut skin nasal skin ear nasal
id 1023 1023 1672 1876 1672 159005010 1023 1023 1023 1672
Bacteria 0.99999 0.99999 0.999993 0.999989 0.999997 0.999927 0.999977 0.999987 0.999997 0.999993
Bacteria|Actinobacteria 0.311037 0.000864363 0.00446132 0.0312045 0.000773642 0.359354 0.761108 0.603002 0.95913 0.753688
Bacteria|Bacteroidetes 0.0689602 0.804293 0.00983343 0.0303561 0.859838 0.0195298 0.0212741 0.145729 0.0115617 0.0114511
- Delete the
#
Constructed from biom file in the table.txt. - Delete the
#
before OTUID - The first two tows are the metadata of the samples, the third row is the sample ID.
I would use the transpose function in MS excel to convert the metadata, and edit the sampleID into the correct order to fit my sample ID order. Then copy the two metadata rows into a
meta.txt
and merge the meta file with the sample frequency file
cat meta.txt collapse.frequency.table.txt > collapse.frequency.table.with.meta.txt
Now, if you look at your collapse.frequency.table.with.meta.txt
, it should be the same format as the above table.
Removing the Blanks
Remove blanks from the table-no-mitochondria-no-chloroplast.qza. Repeating earlier steps with the new tables.
conda activate qiime2-2020.2
qiime feature-table filter-samples \
--i-table table-no-mitochondria-no-chloroplast.qza \
--m-metadata-file sample-metadata.tsv \
--p-where "[Compartment] IN ('Leaf', 'Root')" \
--o-filtered-table filtered-table-no-blank-no-control.qza
qiime taxa collapse \
--i-table filtered-table-no-blank-no-control.qza \
--o-collapsed-table collapse.table.noBlank.noControl.qza \
--p-level 4 \
--i-taxonomy taxonomy.qza
qiime feature-table relative-frequency \
--i-table collapse.table.noBlank.noControl.qza \
--o-relative-frequency-table collapse.frequency.table.noBlank.noControl.qza
# Export to biom
qiime tools export \
--input-path collapse.frequency.table.noBlank.noControl.qza \
--output-path collapse.frequency.noBlank.noControl/
# convert biom to tsv
biom convert \
-i collapse.frequency.noBlank.noControl/feature-table.biom \
-o collapse.frequency.noBlank.noControl/collapse.frequency.table.txt \
--to-tsv
Run lefse
To load the LEfSe package, I've installed it in a lefse conda environment, so run
conda activate lefse
Now, you see (lefse)
before your username, it means you are in the lefse environment and ready to use lefse program.
Use the last step table to run the formatting command, which will generate the input data for next step. You can put the analysis into a new directory (it is totally up to you).
mkdir lefse # a new directory to store the lefse analysis
format_input.py collapse.frequency.table.with.meta.txt lefse/collapse.frequency.table.with.meta.in -c 1 -s 2 -u 3 -o 1000000
-c [1..n_feats]
- set which feature use as class (default 1).
-s [1..n_feats]
- set which feature use as subclass (default -1 meaning no subclass)
-o float
- set the normalization value (default -1.0 meaning no normalization)
-u [1..n_feats]
- set which feature use as subject (default -1 meaning no subject)
Run the lefse analysis
run_lefse.py lefse/collapse.frequency.table.with.meta.in lefse/collapse.frequency.table.with.meta.res -b 200 -l 4
-b 200
: set the number of bootstrap iteration for LDA. This is from Fang's note.
It generates a result file called collapse.frequency.table.with.meta.res
in the lefse directory.
visualize lefse result
Bar plot
plot_res.py lefse/collapse.frequency.table.with.meta.res lefse/collapse.frequency.table.with.meta.bar.png
Cladogram
plot_cladogram.py lefse/collapse.frequency.table.with.meta.res lefse/collapse.frequency.table.with.meta.cladogram.png --format png
Then you can transfer the .png files to your own computer to look at them.