phylo RPCA in Qiime2 - Michael-D-Preston/PrestonLab GitHub Wiki
By Angus Ball
Introduction
Welcome to the phylo-rpca tutorial, I've already explained the theory of this in the Beta Diversity page so go there for a refresher. This tutorial also assumes you have Qiime2 set up. Anyways without further ado...
quick reference to how I'll format this document
- This is the step
This is the code you'll run, note the copy button --->
This is the output
- Step 2
- These are bonus facts that I want to say,
- or sub steps
installation
- start in Qiime2 and run
conda activate qiime2-amplicon-2024.2
- and run
pip install gemelli
pip install empress
- NOTE: Pip is a python command, you should have pip access already but if not go download python
- Then lets load up our data
You need the following information from this tutorial: Exporting your data to QIIME2
- Sample data (or the Key data from the phyloseq object) called sample-metadata.txt
- Taxonomy table called tax.txt
- OTU table called OTUbiomv210.biom
- Sequence table called rep-seqs.fna
Once you have these loaded into qiime2 (see above tutorial) you can start running the gemelli process!
This first command generates the phylogenetic tree based on the sequence information we gave
qiime phylogeny align-to-tree-mafft-fasttree \
--i-sequences rep-seqs.qza \
--o-alignment aligned-rep-seqs.qza \
--o-masked-alignment masked-aligned-rep-seqs.qza \
--o-tree unrooted-tree.qza \
--o-rooted-tree rooted-tree.qza \
--p-n-threads 10
Saved FeatureData[AlignedSequence] to: aligned-rep-seqs.qza
Saved FeatureData[AlignedSequence] to: masked-aligned-rep-seqs.qza
Saved Phylogeny[Unrooted] to: unrooted-tree.qza
Saved Phylogeny[Rooted] to: rooted-tree.qza
Then we run gemelli and the phylo-rpca itself.
qiime gemelli phylogenetic-rpca-with-taxonomy \
--i-table table.qza \
--i-phylogeny rooted-tree.qza \
--m-taxonomy-file taxonomy.qza \
--p-min-feature-count 10 \
--p-min-sample-count 500 \
--o-biplot phylo-ordination.qza \
--o-distance-matrix phylo-distance.qza \
--o-counts-by-node-tree phylo-tree.qza \
--o-counts-by-node phylo-table.qza \
--o-t2t-taxonomy phylo-taxonomy.qza
But the phylo-rpca isnt viewable really, we need to use empress to make it pretty! So run the next empress command You'll also need to bring the phylo-ordination.qza file into your home computer
qiime empress community-plot\
--i-tree phylo-tree.qza\
--i-feature-table table.qza\
--i-pcoa phylo-ordination.qza\
--m-sample-metadata-file SAM.txt\
--m-feature-metadata-file phylo-taxonomy.qza\
--p-filter-missing-features\
--p-number-of-features 50\
--o-visualization phylo-rpca-empress.qzv
The important output here is the phylo-rpca-empress.qzv. (Take this and bring it onto your home computer)
Hey if you are getting weird sample not existing errors try --m-feature-metadata-file phylo-taxonomy.qza or --p-ignore-missing-samples
These next commands determine the beta group significance across different sample types i.e. it runs a permanova to statistically decide what metadata affects sample composition. Run it on whatever metadata columns you want by changing the command --m-metadata-column #name.
qiime diversity beta-group-significance \
--i-distance-matrix phylo-distance.qza \
--m-metadata-file SAM.txt \
--m-metadata-column Amd \
--p-method permanova \
--o-visualization phyloRPCA_Amd_significance.qzv
qiime diversity beta-group-significance \
--i-distance-matrix phylo-distance.qza \
--m-metadata-file SAM.txt \
--m-metadata-column Inoculum\
--p-method permanova \
--o-visualization phyloRPCA_Inoculum_significance.qzv
You'll want to bring the phyloRPCA_Inoculum_significance.qzv and phyloRPCA_Amd_significance.qzv objects onto your home computer.
Once you have all the objects go to (https://view.qiime2.org/)[https://view.qiime2.org/] and drag your objects into the viewer. You can now explore, particularly the phylo-rpca-empress.qzv to see what has happened!.
bits and bobbins I needed to write down somewhere for me, you can ignore this
/home/aball/anaconda/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/gemelli
preprocessing.py
np.seterr(divide = 'ignore', invalid='ignore')
pip install git+https://github.com/biocore/gemelli.git@jointrpca
/home/aball/anaconda/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages
You can also look at the log fold change of taxa across sample types with qurro but NOTE THIS CURRENTLY DOES NOT WORK. scripty stats wants me to die so don't install qurro and run this command unless you wanna reinstall qiime2
qiime qurro loading-plot \
--i-ranks phylo-ordination.qza \
--i-table phylo-table.qza \
--m-sample-metadata-file sample-metadata.txt \
--m-feature-metadata-file taxonomy.qza \
--o-visualization phyloRPCA_qurro_plot.qzv
qiime diversity beta-group-significance \
--i-distance-matrix phylo-distance.qza \
--m-metadata-file SAM.txt \
--m-metadata-column Species\
--p-method permanova \
--o-visualization phyloRPCA_Species_significance.qzv
qiime diversity beta-group-significance \
--i-distance-matrix phylo-distance.qza \
--m-metadata-file SAM.txt \
--m-metadata-column Compaction\
--p-method permanova \
--o-visualization phyloRPCA_Compaction_significance.qzv
qiime diversity beta-group-significance \
--i-distance-matrix phylo-distance.qza \
--m-metadata-file SAM.txt \
--m-metadata-column Matter\
--p-method permanova \
--o-visualization phyloRPCA_Matter_significance.qzv
get phyloRPCA_Matter_significance.qzv
get phyloRPCA_Compaction_significance.qzv
get phyloRPCA_Species_significance.qzv
get phylo-rpca-empress.qzv
get phylo-ordination.qza