Converting DADA2 output from R into QIIME 2 to run ANCOM - meyermicrobiolab/Meyer_Lab_Resources GitHub Wiki
Intro
The purpose of this tutorial is to show how to use QIIME 2 to perform ANCOM analysis on files (OTU table, Taxon table, and a metadata file) generated through the DADA2 pipeline in R. While you can install QIIME locally, I suggest doing this on HiPerGator because the ANCOM module takes a long time to run.
Steps
- Format the Data
- Convert OTU to BIOM file
- Load into QIIME2
- Transpose Table
- Run ANCOM
- Visualizing Results
- Join with Taxon Table
Format the data
The first thing we want to do is add headers and remove the quotes from our files. You can add the headers using any text editor, but I prefer to use vi _textfile.txt_ or vim _textfile.txt_ from the command line for efficiency. Starting with the OTU table, we need to add the header in the first column. After opening the file, press the i key on your keyboard to enter insert mode and type sample-id in between the first set of empty quotes. When you're done making your changes, press the esc key to exit insert mode and then type :wq to save and exit. The w writes to the file and the q exits the file. Repeat this step with the metadata.txt file, making sure that the column headers for the OTU and metadata files MATCH. Now we need to remove the quotes from the OTU and Taxon tables with sed.
sed -i 's/"//g' yourfile.txt
Repeat for the Taxon Table.
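If you would rather not open an editor, the header edit and the quote removal above can be sketched as a single sed call. This is only a sketch: the file names and values are made up for the demo, and it assumes the table is tab-delimited and that its first line starts with a pair of empty quotes, as described above. It writes to a new file instead of using -i, so the original stays untouched.

```shell
# Demo input: a tiny quoted table with an empty first header field
# (hypothetical file name and values).
printf '""\t"ASV1"\t"ASV2"\n"S1"\t10\t0\n"S2"\t3\t7\n' > demo-otu.txt

# First expression: on line 1 only, put sample-id inside the leading empty quotes.
# Second expression: strip every double quote in the file.
sed -e '1s/^""/"sample-id"/' -e 's/"//g' demo-otu.txt > demo-otu-clean.txt

# Show the cleaned header line.
head -n1 demo-otu-clean.txt
```

Check the first line of the cleaned file before moving on; if your table was written without the empty quotes in the header, the first expression will simply not match and the header still has to be added by hand.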
Create a BIOM File from the OTU Table
Since we are doing this on HiPerGator, we can simply load the QIIME module and convert the table:
module load qiime2
biom convert -i yourtable.txt -o table.from_txt_hdf5.biom --table-type="OTU table" --to-hdf5
Load into QIIME
qiime tools import \
--input-path yourfile.biom \
--type 'FeatureTable[Frequency]' \
--input-format BIOMV210Format \
--output-path your-feature-table.qza
Transpose the Table
If you look at metadata.txt and yourOTUtable.txt, you will see that the columns of the OTU table correspond to the rows of the metadata file. To correct this, we need to transpose our feature table.
qiime feature-table transpose --i-table your-feature-table.qza \
--o-transposed-feature-table your-feature-table-transposed.qza
ANCOM requires a QIIME artifact of type FeatureTable[Composition]. We can add a pseudocount, thereby transforming our object, with
qiime composition add-pseudocount \
--i-table your-feature-table-transposed.qza \
--o-composition-table comp-feature-table.qza
Run ANCOM
Running this QIIME plugin can take a very long time. This is why I recommend using HiPerGator and submitting it as a SLURM script. ANCOM also requires your metadata file and the metadata column you are interested in. It returns a .qzv (QIIME Visualization) file.
qiime composition ancom \
--i-table comp-feature-table.qza \
--m-metadata-file metadata.txt \
--m-metadata-column your-column \
--o-visualization your-ancom-viz.qzv
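A SLURM submission for the command above could be sketched like this. The script name ancom_job.sh and every #SBATCH value (job name, memory, time limit, log name) are placeholders I made up, not recommendations, so adjust them to your own allocation and data size.

```shell
# Write a hypothetical SLURM job script to ancom_job.sh.
# All #SBATCH values below are placeholders to adjust.
cat > ancom_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=ancom
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=8G
#SBATCH --time=24:00:00
#SBATCH --output=ancom_%j.log

module load qiime2

qiime composition ancom \
  --i-table comp-feature-table.qza \
  --m-metadata-file metadata.txt \
  --m-metadata-column your-column \
  --o-visualization your-ancom-viz.qzv
EOF

# Submit it on the cluster with:
#   sbatch ancom_job.sh
```

The %j in the log name is replaced by the job ID, which makes it easier to match a log file to a run when you submit more than once.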
Visualizing your Results
This link will take you to an online viewer for QIIME artifacts and visualizations. If your ANCOM results were meaningful, your page should look something like this.
We can save our image by saving it as a .svg file. If we scroll down we can see the ANCOM Results and the Percentile Abundances of Features by Group. We can download both of these text files from the page for later use.
Combining Results and Taxonomy
The ANCOM results and the Percentile Abundances don't include any taxonomic information about the OTUs present in the files. We can make these result files more meaningful by joining them with our taxon_table.txt.
Let's start with the Percentile Abundances. We will be using the join command, which requires our files to be sorted before we begin and writes its results to standard output, so we will be saving them into a new file. We can start by making that new file and adding the column names that we need. We can do that with
head -n2 percent-abundances.tsv > percent-abundances-w-tax.tsv
head -n1 taxa-table.txt >> percent-abundances-w-tax.tsv
Now go into a text editor and remove the newline so that the taxonomy names are on the same line as the group headers, with tabs between all the column names. Now repeat this process for the ANCOM statistic.
head -n1 ancom.tsv > ancom-w-tax.tsv
head -n1 taxa-table.txt >> ancom-w-tax.tsv
Then go in and remove the extra newline so that the taxonomic group names are on the same line as the ANCOM columns. Basically, we are just trying to make our columns nice.
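If you want to skip the manual editing, the same header line can be sketched with paste, which glues lines together with a tab. The demo file names and column names here are hypothetical. The sketch also assumes both files are tab-delimited and that the first field of the taxonomy header labels the ID column, which is dropped because join prints the ID only once.

```shell
# Demo inputs (hypothetical names and values).
printf 'id\tW\tReject null hypothesis\nASV1\t5\tTrue\n' > demo-ancom.tsv
printf 'id\tKingdom\tPhylum\nASV1\tBacteria\tProteobacteria\n' > demo-taxa.txt

# Header of the results file, kept whole.
head -n1 demo-ancom.tsv > header-left.tmp

# Header of the taxonomy file without its first (ID) field.
head -n1 demo-taxa.txt | cut -f2- > header-right.tmp

# paste joins the two header lines with a tab, so no newline has to be removed by hand.
paste header-left.tmp header-right.tmp > demo-ancom-w-tax.tsv
rm header-left.tmp header-right.tmp

cat demo-ancom-w-tax.tsv
```

The temporary files keep the sketch portable to plain sh; in bash the same thing can be done in one line with process substitution.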
Now that we have our column names taken care of, we need to delete them from the original files. This is because in order to join the files through the command line we need to sort the files first, and when those column names get rearranged they could throw some odd errors on the join. You can do this easily by opening the file in vim or vi and, without going into insert mode, pressing dd on the line you wish to delete. Now we are ready to join our files together.
Start with the abundances:
join -t $'\t' percent-abundances.tsv taxa-table.txt >> percent-abundances-w-tax.tsv
To add the taxonomic information to the ANCOM statistics we need to sort the data first.
sort ancom.tsv > ancom-sorted.tsv
sort taxa-table.txt > taxa-table-sorted.txt
join ancom-sorted.tsv taxa-table-sorted.txt -t $'\t' >> ancom-w-tax.tsv
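The whole sort-and-join step can also be sketched without deleting the header lines by hand. The demo files below are made up. The sketch assumes tab-delimited files with one header line each and the feature ID in the first column, and it sets LC_ALL=C so that sort and join agree on the ordering.

```shell
# A literal tab character, built in a way that also works in plain sh.
TAB=$(printf '\t')

# Use the same byte-wise ordering for sort and join.
export LC_ALL=C

# Demo inputs (hypothetical names and values), deliberately unsorted.
printf 'id\tW\nASV2\t1\nASV10\t4\nASV1\t5\n' > ancom-demo.tsv
printf 'id\tKingdom\nASV10\tArchaea\nASV1\tBacteria\nASV2\tBacteria\n' > taxa-demo.txt

# tail -n +2 skips the header line; sort orders the rest by the first field.
tail -n +2 ancom-demo.tsv | sort -t "$TAB" -k1,1 > ancom-demo-sorted.tsv
tail -n +2 taxa-demo.txt | sort -t "$TAB" -k1,1 > taxa-demo-sorted.txt

# Join on the first field of both files, keeping tabs as the separator.
join -t "$TAB" ancom-demo-sorted.tsv taxa-demo-sorted.txt > joined-demo.tsv

cat joined-demo.tsv
```

Note that join only prints IDs that appear in both files, so it is worth comparing the line count of the output with the line count of your results file to make sure nothing was silently dropped.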