qiime2 to lefse - statonlab/BiGG2020_CrackNAg GitHub Wiki

LEfSe (Linear discriminant analysis Effect Size) determines the features (organisms, clades, operational taxonomic units, genes, or functions) most likely to explain differences between classes by coupling standard tests for statistical significance with additional tests encoding biological consistency and effect relevance.

The lefse package is installed in the conda environment on the server. The tutorial to run the program is https://bitbucket.org/biobakery/biobakery/wiki/lefse#rst-header-id5.

First of all, we need to generate the input file for LEfSe, using this tutorial

prepare qiime2 files for lefse

Collapse files to level 4. (I think you can choose the proper taxa level you like)

conda activate qiime2-2020.2

qiime taxa collapse \
--i-table table-no-mitochondria-no-chloroplast.qza \
--o-collapsed-table collapse.table.qza \
--p-level 4 \
--i-taxonomy taxonomy.qza

Calculate the relative abundance of taxa

qiime feature-table relative-frequency \
--i-table collapse.table.qza \
--o-relative-frequency-table collapse.frequency.table.qza 

Export the collapse.frequency.table.qza into biom and biom convert to tsv format

# Export to biom
qiime tools export \
--input-path collapse.frequency.table.qza \
--output-path collapse.frequency/
# convert biom to tsv
biom convert \
-i collapse.frequency/feature-table.biom \
-o collapse.frequency/collapse.frequency.table.txt \
--to-tsv

Modify the collapse.frequency.table.txt to match the format like this:

bodysite                                mucosal         mucosal         mucosal         mucosal         mucosal         non_mucosal     non_mucosal     non_mucosal     non_mucosal     non_mucosal
subsite                                 oral            gut             oral            oral            gut             skin            nasal           skin            ear             nasal
id                                      1023            1023            1672            1876            1672            159005010       1023            1023            1023            1672
Bacteria                                0.99999         0.99999         0.999993        0.999989        0.999997        0.999927        0.999977        0.999987        0.999997        0.999993
Bacteria|Actinobacteria                 0.311037        0.000864363     0.00446132      0.0312045       0.000773642     0.359354        0.761108        0.603002        0.95913         0.753688
Bacteria|Bacteroidetes                  0.0689602       0.804293        0.00983343      0.0303561       0.859838        0.0195298       0.0212741       0.145729        0.0115617       0.0114511
  1. Delete the # Constructed from biom file in the table.txt.
  2. Delete the # before OTUID
  3. The first two tows are the metadata of the samples, the third row is the sample ID. I would use the transpose function in MS excel to convert the metadata, and edit the sampleID into the correct order to fit my sample ID order. Then copy the two metadata rows into a meta.txt and merge the meta file with the sample frequency file
cat meta.txt collapse.frequency.table.txt > collapse.frequency.table.with.meta.txt

Now, if you look at your collapse.frequency.table.with.meta.txt, it should be the same format as the above table.

Removing the Blanks

Remove blanks from the table-no-mitochondria-no-chloroplast.qza. Repeating earlier steps with the new tables.

conda activate qiime2-2020.2

qiime feature-table filter-samples \
  --i-table table-no-mitochondria-no-chloroplast.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-where "[Compartment] IN ('Leaf', 'Root')" \
  --o-filtered-table filtered-table-no-blank-no-control.qza

qiime taxa collapse \
  --i-table filtered-table-no-blank-no-control.qza \
  --o-collapsed-table collapse.table.noBlank.noControl.qza \
  --p-level 4 \
  --i-taxonomy taxonomy.qza

qiime feature-table relative-frequency \
  --i-table collapse.table.noBlank.noControl.qza \
  --o-relative-frequency-table collapse.frequency.table.noBlank.noControl.qza

# Export to biom
qiime tools export \
--input-path collapse.frequency.table.noBlank.noControl.qza \
--output-path collapse.frequency.noBlank.noControl/
# convert biom to tsv
biom convert \
-i collapse.frequency.noBlank.noControl/feature-table.biom \
-o collapse.frequency.noBlank.noControl/collapse.frequency.table.txt \
--to-tsv

Run lefse

To load the LEfSe package, I've installed it in a lefse conda environment, so run

conda activate lefse

Now, you see (lefse) before your username, it means you are in the lefse environment and ready to use lefse program.

Use the last step table to run the formatting command, which will generate the input data for next step. You can put the analysis into a new directory (it is totally up to you).

mkdir lefse # a new directory to store the lefse analysis
format_input.py collapse.frequency.table.with.meta.txt lefse/collapse.frequency.table.with.meta.in -c 1 -s 2 -u 3 -o 1000000

-c [1..n_feats] - set which feature use as class (default 1).
-s [1..n_feats] - set which feature use as subclass (default -1 meaning no subclass) -o float - set the normalization value (default -1.0 meaning no normalization) -u [1..n_feats] - set which feature use as subject (default -1 meaning no subject) Run the lefse analysis

run_lefse.py lefse/collapse.frequency.table.with.meta.in lefse/collapse.frequency.table.with.meta.res -b 200 -l 4

-b 200: set the number of bootstrap iteration for LDA. This is from Fang's note.
It generates a result file called collapse.frequency.table.with.meta.res in the lefse directory.

visualize lefse result

Bar plot

plot_res.py lefse/collapse.frequency.table.with.meta.res lefse/collapse.frequency.table.with.meta.bar.png

Cladogram

plot_cladogram.py lefse/collapse.frequency.table.with.meta.res lefse/collapse.frequency.table.with.meta.cladogram.png --format png

Then you can transfer the .png files to your own computer to look at them.