Making a Phyloseq Object - Michael-D-Preston/PrestonLab GitHub Wiki

Introduction

Woohoo! You've successfully finished the DADA2 pipeline and your data is ready to be analyzed... or is it? The big names in R metabarcoding data analysis use what's called phyloseq to analyze their data. phyloseq is an R package built around a nifty little class that bundles all your data into one object: count tables, taxonomy information, metadata and all! When you're ready to start, follow the link below.

A note on who needs this tutorial

At the end of the DADA2 tutorial we created a phyloseq object, so you might think this tutorial won't be much help. However, coming out of the DADA2 pipeline, that phyloseq object only holds your taxonomy and count information. Go to the very end of this tutorial to see how to attach metadata. If you didn't go through DADA2 and someone has handed you a count table and a taxonomy table, you'll want to follow this tutorial in full.
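As a rough sketch of what attaching metadata looks like, the example below builds a tiny phyloseq object with only counts and taxonomy (like the one you get out of DADA2) and then adds a metadata table to it. All the data here is made up, and the names `counts`, `taxa`, `meta`, `site` and `depth_cm` are placeholders for this example only; the one real requirement is that the row names of your metadata match the sample names in the phyloseq object.

```r
library(phyloseq)

# Toy count table: 3 ASVs (rows) x 2 samples (columns). Made-up numbers.
counts <- matrix(c(10, 0, 5, 3, 7, 1), nrow = 3,
                 dimnames = list(c("ASV1", "ASV2", "ASV3"), c("S1", "S2")))

# Toy taxonomy table: one row per ASV, one column per rank.
taxa <- matrix(c("Bacteria", "Bacteria", "Archaea",
                 "Proteobacteria", "Firmicutes", "Euryarchaeota"),
               nrow = 3,
               dimnames = list(c("ASV1", "ASV2", "ASV3"),
                               c("Kingdom", "Phylum")))

# Counts + taxonomy only, no metadata yet.
ps <- phyloseq(otu_table(counts, taxa_are_rows = TRUE), tax_table(taxa))

# Hypothetical metadata; row names must match sample_names(ps).
meta <- data.frame(site = c("bog", "fen"), depth_cm = c(10, 30),
                   row.names = c("S1", "S2"))
stopifnot(setequal(rownames(meta), sample_names(ps)))

# Attach the metadata to the existing object.
ps_meta <- merge_phyloseq(ps, sample_data(meta))
ps_meta
```

If the sample names don't line up, phyloseq will drop or refuse the mismatched samples, so it's worth running the `setequal()` check before merging.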

A note on merging phyloseq objects

This tutorial also shows you how to merge phyloseq objects. Say you had two samples that were analyzed in separate runs. Normally you'd combine them in DADA2 as count tables using [mergeSequenceTables](https://rdrr.io/bioc/dada2/man/mergeSequenceTables.html), but if you didn't run DADA2 yourself, I show you how to combine two phyloseq objects here. You lose some taxonomy resolution using this method, so if you have the option, combine in DADA2.
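For a feel of what merging does, here is a minimal sketch using phyloseq's `merge_phyloseq()` on two toy objects. It is not necessarily the exact approach the linked tutorial takes. The data is invented, `make_ps` is a helper written just for this example, and the sketch assumes a taxon name (here `ASV2`) means the same thing in both runs, because the merge matches taxa by name.

```r
library(phyloseq)

# Helper for this example only: build a phyloseq object from two matrices.
make_ps <- function(counts, taxa) {
  phyloseq(otu_table(counts, taxa_are_rows = TRUE), tax_table(taxa))
}

# Run 1: samples S1 and S2, taxa ASV1 and ASV2.
run1 <- make_ps(
  matrix(c(10, 4, 8, 1), nrow = 2,
         dimnames = list(c("ASV1", "ASV2"), c("S1", "S2"))),
  matrix(c("Bacteria", "Bacteria"), nrow = 2,
         dimnames = list(c("ASV1", "ASV2"), "Kingdom"))
)

# Run 2: samples S3 and S4, taxa ASV2 and ASV3.
run2 <- make_ps(
  matrix(c(6, 2, 9, 5), nrow = 2,
         dimnames = list(c("ASV2", "ASV3"), c("S3", "S4"))),
  matrix(c("Bacteria", "Archaea"), nrow = 2,
         dimnames = list(c("ASV2", "ASV3"), "Kingdom"))
)

# Merge: the result holds the union of taxa and the union of samples.
# Taxa missing from a run get a count of 0 in that run's samples.
merged <- merge_phyloseq(run1, run2)
otu_table(merged)
```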

A note on importing data

This tutorial assumes someone has given you a text file containing the taxonomy and OTU tables associated with your project, but they could have given you a CSV file or an R object instead. It's all okay as long as your data looks like MetaG (the OTU table) or Tax (the taxonomy table).
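Whichever format you were handed, the goal is the same: get a table into R with the taxon names as row names. The sketch below uses only base R. It first writes a tiny, made-up taxonomy table out in three formats (so the example runs on its own) and then reads each one back in. The file names and the table contents are placeholders, and the tab-delimited layout of the text file is an assumption; adjust `sep` to match your own file.

```r
# A tiny, made-up taxonomy table standing in for the real Tax.
Tax <- data.frame(Kingdom = c("Bacteria", "Archaea"),
                  Phylum  = c("Firmicutes", "Euryarchaeota"),
                  row.names = c("ASV1", "ASV2"),
                  stringsAsFactors = FALSE)

# Write it out three ways; in practice these files come from a collaborator.
txt_file <- tempfile(fileext = ".txt")
csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")
write.table(Tax, txt_file, sep = "\t", quote = FALSE, col.names = NA)
write.csv(Tax, csv_file)
saveRDS(Tax, rds_file)

# Tab-delimited text file: the first column becomes the row names.
tax_txt <- read.table(txt_file, header = TRUE, sep = "\t",
                      row.names = 1, stringsAsFactors = FALSE)

# CSV file: same idea, comma-separated.
tax_csv <- read.csv(csv_file, row.names = 1, stringsAsFactors = FALSE)

# Saved R object: comes back exactly as it was saved.
tax_rds <- readRDS(rds_file)

# All three routes should give the same table.
str(tax_txt)
```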

Link

How to make a phyloseq object

DEPRECATED (i.e. don't use)

Citations

Citations in R are hard. Here are all the packages you use in this section and how to cite them. You don't necessarily need to cite all of them; that depends on where you land on the ethics of citing data management packages (always cite statistical packages). For this section please cite:

Phyloseq:

phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data. Paul J. McMurdie and Susan Holmes (2013) PLoS ONE 8(4):e61217.

ggplot2:

H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

janitor:

Firke S (2023). janitor: Simple Tools for Examining and Cleaning Dirty Data. R package version 2.2.0, https://CRAN.R-project.org/package=janitor.

dplyr:

Wickham H, François R, Henry L, Müller K, Vaughan D (2023). dplyr: A Grammar of Data Manipulation. R package version 1.1.3, https://CRAN.R-project.org/package=dplyr.

tidyverse:

Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” Journal of Open Source Software, 4(43), 1686. doi:10.21105/joss.01686 https://doi.org/10.21105/joss.01686.

ggpubr:

Kassambara A (2023). ggpubr: 'ggplot2' Based Publication Ready Plots. R package version 0.6.0, https://CRAN.R-project.org/package=ggpubr.

ape:

Paradis E, Schliep K (2019). “ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R.” Bioinformatics, 35, 526-528. doi:10.1093/bioinformatics/bty633 https://doi.org/10.1093/bioinformatics/bty633.

splitstackshape:

Mahto A (2019). splitstackshape: Stack and Reshape Datasets After Splitting Concatenated Values. R package version 1.4.8, https://CRAN.R-project.org/package=splitstackshape.
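Package versions change, so the entries above may not match what you have installed. As a small sketch, base R's `citation()` can print the recommended citation for whatever version is on your machine. The `pkgs` vector below simply lists the packages named in this section, and the loop skips any that aren't installed.

```r
# Packages named in this section.
pkgs <- c("phyloseq", "ggplot2", "janitor", "dplyr", "tidyverse",
          "ggpubr", "ape", "splitstackshape")

# Check which ones are installed without loading them.
is_installed <- vapply(pkgs,
                       function(p) nzchar(system.file(package = p)),
                       logical(1))

# Print a plain-text citation for each installed package.
for (p in pkgs[is_installed]) {
  print(citation(p), style = "text")
}

# Don't forget R itself.
r_cite <- citation()
print(r_cite, style = "text")

# BibTeX version, handy for reference managers.
toBibtex(r_cite)
```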