phylo RPCA in Qiime2 - Michael-D-Preston/PrestonLab GitHub Wiki

By Angus Ball

Introduction

Welcome to the phylo-rpca tutorial, I've already explained the theory of this in the Beta Diversity page so go there for a refresher. This tutorial also assumes you have Qiime2 set up. Anyways without further ado...

quick reference to how I'll format this document

  • This is the step
This is the code you'll run, note the copy button --->

This is the output

  • Step 2
    • These are bonus facts that I want to say,
    • or sub steps

installation

  • start in Qiime2 and run
conda activate qiime2-amplicon-2024.2
  • and run
pip install gemelli

pip install empress
  • NOTE: Pip is a python command, you should have pip access already but if not go download python
  • Then lets load up our data

You need the following information from this tutorial: Exporting your data to QIIME2

  • Sample data (or the Key data from the phyloseq object) called sample-metadata.txt
  • Taxonomy table called tax.txt
  • OTU table called OTUbiomv210.biom
  • Sequence table called rep-seqs.fna

Once you have these loaded into qiime2 (see above tutorial) you can start running the gemelli process!

This first command generates the phylogenetic tree based on the sequence information we gave

qiime phylogeny align-to-tree-mafft-fasttree \
 --i-sequences rep-seqs.qza \
 --o-alignment aligned-rep-seqs.qza \
 --o-masked-alignment masked-aligned-rep-seqs.qza \
 --o-tree unrooted-tree.qza \
 --o-rooted-tree rooted-tree.qza \
 --p-n-threads 10

Saved FeatureData[AlignedSequence] to: aligned-rep-seqs.qza

Saved FeatureData[AlignedSequence] to: masked-aligned-rep-seqs.qza

Saved Phylogeny[Unrooted] to: unrooted-tree.qza

Saved Phylogeny[Rooted] to: rooted-tree.qza

Then we run gemelli and the phylo-rpca itself.

qiime gemelli phylogenetic-rpca-with-taxonomy \
    --i-table table.qza \
    --i-phylogeny rooted-tree.qza \
    --m-taxonomy-file taxonomy.qza \
    --p-min-feature-count 10 \
    --p-min-sample-count 500 \
    --o-biplot phylo-ordination.qza \
    --o-distance-matrix phylo-distance.qza \
    --o-counts-by-node-tree phylo-tree.qza \
    --o-counts-by-node phylo-table.qza \
    --o-t2t-taxonomy phylo-taxonomy.qza

But the phylo-rpca isnt viewable really, we need to use empress to make it pretty! So run the next empress command You'll also need to bring the phylo-ordination.qza file into your home computer

qiime empress community-plot\
    --i-tree phylo-tree.qza\
    --i-feature-table table.qza\
    --i-pcoa phylo-ordination.qza\
    --m-sample-metadata-file SAM.txt\
    --m-feature-metadata-file phylo-taxonomy.qza\
    --p-filter-missing-features\
    --p-number-of-features 50\
    --o-visualization phylo-rpca-empress.qzv

The important output here is the phylo-rpca-empress.qzv. (Take this and bring it onto your home computer)

Hey if you are getting weird sample not existing errors try --m-feature-metadata-file phylo-taxonomy.qza or --p-ignore-missing-samples

These next commands determine the beta group significance across different sample types i.e. it runs a permanova to statistically decide what metadata affects sample composition. Run it on whatever metadata columns you want by changing the command --m-metadata-column #name.

qiime diversity beta-group-significance \
    --i-distance-matrix phylo-distance.qza \
    --m-metadata-file SAM.txt \
    --m-metadata-column Amd \
    --p-method permanova \
    --o-visualization phyloRPCA_Amd_significance.qzv
qiime diversity beta-group-significance \
    --i-distance-matrix phylo-distance.qza \
    --m-metadata-file SAM.txt \
    --m-metadata-column Inoculum\
    --p-method permanova \
    --o-visualization phyloRPCA_Inoculum_significance.qzv

You'll want to bring the phyloRPCA_Inoculum_significance.qzv and phyloRPCA_Amd_significance.qzv objects onto your home computer.

Once you have all the objects go to (https://view.qiime2.org/)[https://view.qiime2.org/] and drag your objects into the viewer. You can now explore, particularly the phylo-rpca-empress.qzv to see what has happened!.

bits and bobbins I needed to write down somewhere for me, you can ignore this

/home/aball/anaconda/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/gemelli

preprocessing.py

np.seterr(divide = 'ignore', invalid='ignore')

pip install git+https://github.com/biocore/gemelli.git@jointrpca

/home/aball/anaconda/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages

You can also look at the log fold change of taxa across sample types with qurro but NOTE THIS CURRENTLY DOES NOT WORK. scripty stats wants me to die so don't install qurro and run this command unless you wanna reinstall qiime2

qiime qurro loading-plot \
    --i-ranks phylo-ordination.qza \
    --i-table phylo-table.qza \
    --m-sample-metadata-file sample-metadata.txt \
    --m-feature-metadata-file taxonomy.qza \
    --o-visualization phyloRPCA_qurro_plot.qzv
qiime diversity beta-group-significance \
    --i-distance-matrix phylo-distance.qza \
    --m-metadata-file SAM.txt \
    --m-metadata-column Species\
    --p-method permanova \
    --o-visualization phyloRPCA_Species_significance.qzv
qiime diversity beta-group-significance \
    --i-distance-matrix phylo-distance.qza \
    --m-metadata-file SAM.txt \
    --m-metadata-column Compaction\
    --p-method permanova \
    --o-visualization phyloRPCA_Compaction_significance.qzv
qiime diversity beta-group-significance \
    --i-distance-matrix phylo-distance.qza \
    --m-metadata-file SAM.txt \
    --m-metadata-column Matter\
    --p-method permanova \
    --o-visualization phyloRPCA_Matter_significance.qzv
get phyloRPCA_Matter_significance.qzv
get phyloRPCA_Compaction_significance.qzv
get phyloRPCA_Species_significance.qzv
get phylo-rpca-empress.qzv
get phylo-ordination.qza