Home ‐ Microbiome Helper 2 - LangilleLab/microbiome_helper GitHub Wiki

We have been continually updating our SOPs and workflows, and they are now substantially different from what they were when Microbiome Helper was published and released (2017). We are therefore updating our scripts, SOPs, workflows and tutorials for a second release of Microbiome Helper (i.e., Microbiome Helper 2). The pages linked from this page should be fully functional, but we are still updating them, so please keep a record of the commands that you run if you follow these pages. Please let us know if you run into any issues.

The idea is that there will be many different workflows for the analyses that we often carry out. These cover both marker gene (e.g. 16S, 18S and ITS, among others) and metagenomic sequencing. Some parts are "core", while others will depend on what you want to get from your data and what your questions are. This is not a comprehensive list of all of the options that are out there! These are the tools that we use the most, but there are many others, and we encourage you to do your own research and decide which is best for your purposes.

We also plan to make available both an "image" that has all current versions of programs pre-installed and is more-or-less ready to use, as well as instructions on setting up a computing environment for yourself.

General notes on analysis

Things you need before starting:

If you already have these things, feel free to proceed. If you have no idea what most of that meant, see our beginner microbiome analysis page for more detailed information on all of these things.

If you do not yet have your own data but would like to learn these methods:

Amplicon/marker gene workflow

Metagenomics workflows

The workflows used for metagenomics are a lot less standardised than those for amplicon sequencing, and will depend a lot more on your data characteristics (sequencing depth, environment sampled, host contamination, etc.) as well as on what you are interested in (e.g. antimicrobial resistance vs an overview of the taxa in your microbial community). You can see how all of these fit together in the diagram at the top of the page, but our standard analysis pipeline typically involves:

  1. Initial pre-processing and QC
  2. Taxonomic annotation of reads with Kraken2 using our RefSeq Complete database (see our paper for more on choosing a database)
  3. Functional annotation of reads with MMseqs2 and the UniRef90 database
  4. Linking of taxonomic and functional annotations for each read
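
As a rough illustration of what step 4 means in practice, the sketch below joins per-read taxonomic and functional annotations on the read ID. It is not the Microbiome Helper script for this step. It assumes Kraken2's standard per-read output (run without `--use-names`) and a BLAST-style 12-column tabular file such as the default output of `mmseqs convertalis`. The function names and the example read and UniRef90 IDs are hypothetical.

```python
import io
from collections import Counter


def read_kraken2_taxids(lines):
    """Map read ID -> taxid for classified reads.

    Assumes Kraken2 standard output columns:
    status (C/U), read ID, taxid, read length, LCA k-mer mapping.
    """
    taxids = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        status, read_id, taxid = fields[0], fields[1], fields[2]
        if status == "C":
            taxids[read_id] = taxid
    return taxids


def read_best_hits(lines):
    """Map read ID -> (target ID, bit score), keeping the top-scoring hit.

    Assumes BLAST-tab columns: query, target, pident, alnlen, mismatch,
    gapopen, qstart, qend, tstart, tend, evalue, bits.
    """
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue
        read_id, target, bits = fields[0], fields[1], float(fields[11])
        if read_id not in best or bits > best[read_id][1]:
            best[read_id] = (target, bits)
    return best


def link_annotations(taxids, best_hits):
    """Count reads per (taxid, function) pair.

    Reads with a functional hit but no taxonomic classification are
    counted under the placeholder taxid "unclassified".
    """
    counts = Counter()
    for read_id, (target, _bits) in best_hits.items():
        counts[(taxids.get(read_id, "unclassified"), target)] += 1
    return counts


if __name__ == "__main__":
    # Tiny made-up inputs, only to show the expected file layouts.
    kraken = io.StringIO(
        "C\tread1\t562\t150\t562:116\n"
        "U\tread2\t0\t150\t0:116\n"
    )
    mmseqs = io.StringIO(
        "read1\tUniRef90_A\t98.0\t50\t1\t0\t1\t150\t1\t50\t1e-20\t100\n"
        "read2\tUniRef90_A\t95.0\t50\t2\t0\t1\t150\t1\t50\t1e-18\t90\n"
    )
    linked = link_annotations(read_kraken2_taxids(kraken), read_best_hits(mmseqs))
    for (taxid, function), n in sorted(linked.items()):
        print(taxid, function, n, sep="\t")
```

Keeping only the highest-scoring hit per read is one simple choice made here for illustration; a real pipeline may filter on identity or e-value, or handle ties differently.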

Read-based and assembly-based workflows

Read-based workflows

Assembly-based workflows

  • MAG assembly, binning, and curation with Anvi'o

    Note that once you have constructed contigs, you can apply many of the read-based methods to them for taxonomic/functional annotation, but be aware that not all reads from your initial samples can be assembled into contigs.

More advanced/custom analyses

Downstream/statistical analysis workflow

Full workflow:

  • Importing data:
    • 16S
    • Metagenomics taxonomic data
    • Functional data (predictions or metagenomic annotations)
    • Stratified data
  • Overview of samples:
    • Basic stacked bar plots
    • Heatmaps
    • Phylogenetic trees
    • Including fundamentals of adding plots together? (e.g. ordering by taxa so that a heatmap will appear in the same order as the tree and can be linked?)
  • Alpha/beta diversity:
    • Alpha diversity metrics and visualisation
    • Beta diversity metrics and visualisation - ordination vs others e.g. heatmap or dendrogram
    • Look at taxonomic contributions to function
  • Differential abundance:
    • ANCOM
    • ALDEx
    • MaAsLin
    • Corncob
    • radEmu
  • Longitudinal analysis
  • Combining taxonomic/functional visualisations with JarrVis
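
For readers new to the alpha and beta diversity steps listed above, the following minimal sketch shows one common metric of each kind: the Shannon index (within-sample diversity) and Bray-Curtis dissimilarity (between-sample difference). It is written from the standard definitions only and is not taken from the workflow pages, which may use different metrics and tools. The function names and example counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two count vectors over the same taxa.

    Returns 0.0 for identical samples and 1.0 for samples sharing no taxa.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must list counts for the same taxa in the same order")
    total = sum(sample_a) + sum(sample_b)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(sample_a, sample_b)) / total


if __name__ == "__main__":
    # Made-up counts for three taxa in two samples.
    gut = [10, 0, 5]
    soil = [0, 10, 5]
    print("Shannon (gut):", shannon_index(gut))
    print("Bray-Curtis (gut vs soil):", bray_curtis(gut, soil))
```

Both functions work on raw counts. Whether and how to normalise or rarefy before computing diversity is a separate decision that this sketch does not address.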

Other planned additions to this page

Other useful things

  • Downloading reads from SRA
  • Uploading reads to SRA (or ENA)