Microbiome Helper 2 PICRUSt2 - LangilleLab/microbiome_helper GitHub Wiki

Authors: Robyn Wright
Modifications by: NA
Based on initial versions by: NA

Please note: We think that everything here should work, but we are still testing/developing this so use with caution :)

Introduction

PICRUSt2 (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) is a tool that predicts the functional capacity of a microbial community based on the taxa that are present in amplicon sequencing data. The first iteration of PICRUSt was developed by Morgan (and many other collaborators) during his postdoc. Gavin Douglas then expanded and improved on it to make PICRUSt2, and we have recently updated the database used by PICRUSt2 to improve the functional predictions.

There are several other tools that have also been developed for this purpose, e.g., Tax4Fun2, Piphillin and PanFP. I don't personally have experience using these, so we're going to focus on PICRUSt2. However, it's always important to understand the strengths and weaknesses of different bioinformatic tools before choosing which one to use for your own research.

You can see full information on PICRUSt2 here, but it includes several key steps (the new database changes these slightly!):

  • Placement of sequences into reference phylogenies for bacteria and archaea
  • Hidden-state prediction to get 16S copy numbers and Nearest Sequenced Taxon Indices (NSTI) per genome
  • Determine the best domain for each sequence (whether it fits best in the bacterial or archaeal reference phylogeny)
  • Hidden-state prediction to get trait abundances (usually KOs or EC numbers) per predicted genome
  • Combine bacterial and archaeal files from hidden-state prediction
  • Predict metagenomes for each sample for each trait
  • Infer MetaCyc pathway abundances and coverages (using EC numbers)

These steps can be run separately, but here we'll run them all together using a single command that wraps them.

Requirements

It is assumed that you have already run through the Microbiome Helper 2 Amplicon workflow and have all output files from that analysis.

Running PICRUSt2

We have much more comprehensive documentation on using and running PICRUSt2 here, but if you would just like to run the standard pipeline then you can do so like this:

picrust2_pipeline.py -s exports/dna-sequences.fasta -i exports/feature-table.biom -o picrust2_out -p 1
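Before starting a long run, it can help to confirm that the two input files from the Amplicon workflow are in place and that the output folder name is free. The function below is a minimal sketch of such a check: `picrust2_preflight` is a hypothetical helper name and is not part of PICRUSt2, and the check on the output folder assumes that the pipeline will not write into a folder that already exists.

```shell
# Hypothetical helper (not part of PICRUSt2): check inputs before running the pipeline.
# Arguments: $1 = sequence FASTA, $2 = BIOM feature table, $3 = output folder
picrust2_preflight() {
    preflight_status=0

    # -s is true only if the file exists and is not empty
    if [ ! -s "$1" ]; then
        echo "Missing or empty sequence file: $1" >&2
        preflight_status=1
    fi
    if [ ! -s "$2" ]; then
        echo "Missing or empty feature table: $2" >&2
        preflight_status=1
    fi

    # Assumption: the pipeline stops if the output folder already exists
    if [ -e "$3" ]; then
        echo "Output folder already exists: $3" >&2
        preflight_status=1
    fi

    return "$preflight_status"
}

# Example usage (commented out so that defining the function does not start a run):
# picrust2_preflight exports/dna-sequences.fasta exports/feature-table.biom picrust2_out \
#     && picrust2_pipeline.py -s exports/dna-sequences.fasta -i exports/feature-table.biom -o picrust2_out -p 1
```

If the check fails, the messages printed to standard error say which file is missing or whether the output folder needs to be renamed or removed before running the pipeline.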