Home ‐ Microbiome Helper 2 - LangilleLab/microbiome_helper GitHub Wiki
We have been continually updating our SOPs and workflows. Because they are now substantially different from those at the time of the publication and release of Microbiome Helper (2017), we are working on updating our scripts, SOPs, workflows and tutorials for a second release of Microbiome Helper (i.e., Microbiome Helper 2). The pages linked from this page should be fully functional, but please be aware that we are still updating them, so keep a record of the commands you use if you follow these pages. Please let us know if you run into any issues.
The idea is that there will be many different workflows for analyses that we often carry out. These cover both marker gene (e.g., 16S, 18S, ITS and others) and metagenomic sequencing. Some parts of this are "core", but others will depend on what you want to get from your data and what your questions are. This is not a comprehensive list of all of the available options! These are the tools that we use most, but there are many others, and we encourage you to do your own research and decide which are best for your purposes.
We also plan to make available both an "image" that has current versions of all programs pre-installed and is more or less ready to use, and instructions for setting up a computing environment yourself.
General notes on analysis
Things you need before starting:
- A computer/server to do your analyses:
- See an overview of the computational resources required
- See information on setting up an AWS instance for your analyses
- See the instructions for setting up environments for analyses/installation of required programs
- A basic understanding of how to use the server/computer and run programs from the command line:
- Brief introduction to Linux servers and command line/server basics, like using tmux or screen, and scp to copy files between the server and your local computer
If you already have these things, feel free to proceed. If you have no idea what most of that meant, see our beginner microbiome analysis page for more detailed information on all of these things.
If you do not yet have your own data but would like to learn these methods:
- Download tutorial data, including 16S short-read, 16S long-read, metagenome short-read and metagenome long-read datasets
Amplicon/marker gene workflow
- Marker gene workflow in QIIME2
- Functional prediction with PICRUSt2
- Basic statistics and visualisation in QIIME2
Metagenomics workflows
The workflows used for metagenomics are much less standardised than those for amplicon sequencing, and depend much more on your data characteristics (sequencing depth, environment sampled, host contamination, etc.) and on what you are interested in (e.g., antimicrobial resistance vs. an overview of the taxa in your microbial community). You can see how all of these fit together in the diagram at the top of the page, but our standard analysis pipeline typically involves:
- Initial pre-processing and QC
- Taxonomic annotation of reads with Kraken2 using our RefSeq Complete database (see our paper for more on choosing a database)
- Functional annotation of reads with MMSeqs2 and the UniRef90 database
- Linking of taxonomic and functional annotations for each read
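As a rough illustration of the last step, the sketch below links the two annotations by read ID. It is a minimal sketch, not the code used in our workflow pages, and it rests on stated assumptions: Kraken2's default per-read output (run without --use-names, so column 3 is the NCBI taxonomy ID and unclassified reads are marked "U"), and MMseqs2's default BLAST-tab output from easy-search (query in column 1, target such as a UniRef90 ID in column 2, bit score in column 12). Only the highest-scoring MMseqs2 hit per read is kept. The helper names and the stripping of "/1" or "/2" mate suffixes are hypothetical choices for illustration.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, Tuple


def normalise_read_id(read_id: str) -> str:
    """Drop a trailing /1 or /2 mate suffix so IDs from both tools match (assumption)."""
    if read_id.endswith(("/1", "/2")):
        return read_id[:-2]
    return read_id


def parse_kraken2(lines: Iterable[str]) -> Dict[str, str]:
    """Map read ID -> taxonomy ID from Kraken2's default per-read output."""
    taxa = {}
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0] != "C":
            continue  # skip blank lines and unclassified ("U") reads
        taxa[normalise_read_id(row[1])] = row[2]
    return taxa


def parse_mmseqs_best_hits(lines: Iterable[str]) -> Dict[str, str]:
    """Map read ID -> best target by bit score from MMseqs2 BLAST-tab output."""
    best: Dict[str, Tuple[float, str]] = {}
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue
        read_id = normalise_read_id(row[0])
        target, bits = row[1], float(row[11])
        if read_id not in best or bits > best[read_id][0]:
            best[read_id] = (bits, target)
    return {read_id: target for read_id, (_, target) in best.items()}


def link_annotations(kraken_lines, mmseqs_lines) -> Counter:
    """Count (taxonomy ID, function) pairs for reads annotated by both tools."""
    taxa = parse_kraken2(kraken_lines)
    functions = parse_mmseqs_best_hits(mmseqs_lines)
    shared = taxa.keys() & functions.keys()  # reads with both annotations
    return Counter((taxa[r], functions[r]) for r in shared)


if __name__ == "__main__":
    # Made-up toy records, only to show the expected input layout.
    kraken = io.StringIO(
        "C\tread1\t562\t150\t562:116\n"
        "U\tread2\t0\t150\t0:116\n"
        "C\tread3\t1280\t150\t1280:116\n"
    )
    mmseqs = io.StringIO(
        "read1/1\tUniRef90_A\t0.95\t50\t2\t0\t1\t150\t1\t50\t1e-20\t95.0\n"
        "read1/1\tUniRef90_B\t0.80\t50\t10\t0\t1\t150\t1\t50\t1e-10\t60.0\n"
        "read2/1\tUniRef90_C\t0.90\t50\t5\t0\t1\t150\t1\t50\t1e-15\t80.0\n"
    )
    print(link_annotations(kraken, mmseqs))  # Counter({('562', 'UniRef90_A'): 1})
```

Reads that only one tool could annotate are dropped here; a real analysis might instead report them as "unclassified" on the missing side so that per-sample totals are preserved.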
Read-based and assembly-based workflows
- Initial pre-processing and QC (note that these steps need to be carried out regardless of the other analyses that you wish to carry out)
Read-based workflows
- Metagenomic taxonomic annotation:
- Metagenomic functional annotation:
Assembly-based workflows
- MAG assembly, binning, and curation with Anvi'o
Note that once you have constructed contigs, you can apply many of the read-based methods to them for taxonomic/functional annotation, but be aware that not all reads from your initial samples will be assembled into contigs.
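One way to see how much of your data an assembly actually captures is to map the reads back to the contigs (for example with Bowtie2) and count how many align. The minimal sketch below, assuming you already have such a SAM file, does this using only the standard SAM FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary) and counts each read once via its primary record. The function name is hypothetical, and for paired-end data each mate is counted as a separate read.

```python
import sys
from typing import Iterable, Tuple

# Standard SAM FLAG bits.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def assembled_read_fraction(sam_lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (mapped, total, fraction) over primary SAM records only."""
    mapped = total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        total += 1
        if not (flag & UNMAPPED):
            mapped += 1
    fraction = mapped / total if total else 0.0
    return mapped, total, fraction


if __name__ == "__main__":
    # Usage (hypothetical file name): python assembled_fraction.py reads_vs_contigs.sam
    with open(sys.argv[1]) as handle:
        mapped, total, fraction = assembled_read_fraction(handle)
    print(f"{mapped}/{total} reads ({fraction:.1%}) mapped back to contigs")
```

samtools flagstat reports comparable mapping numbers directly from a SAM/BAM file; the sketch just makes explicit which records are being counted.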
More advanced/custom analyses
Downstream/statistical analysis workflow
- Importing data:
- 16S
- Metagenomics taxonomic data
- Functional data (predictions or metagenomic annotations)
- Stratified data
- Overview of samples:
- Basic stacked bar plots
- Heatmaps
- Phylogenetic trees
- Possibly including the fundamentals of combining plots (e.g., ordering taxa so that a heatmap appears in the same order as the tree and can be aligned with it)
- Alpha/beta diversity:
- Alpha diversity metrics and visualisation
- Beta diversity metrics and visualisation: ordination vs. other approaches (e.g., heatmaps or dendrograms)
- Look at taxonomic contributions to function
- Differential abundance:
- ANCOM
- ALDEx2
- MaAsLin
- Corncob
- radEmu
- Longitudinal analysis
- Combining taxonomic/functional visualisations with JarrVis
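To make the alpha and beta diversity items above a little more concrete, here is a minimal pure-Python sketch of one metric of each kind, computed from raw count vectors: Shannon diversity (alpha), H' = -Σ pᵢ ln pᵢ, and Bray-Curtis dissimilarity (beta), Σ|xᵢ - yᵢ| / Σ(xᵢ + yᵢ). It is for intuition only: the linked workflows compute these with dedicated tools (e.g., QIIME2) and may rarefy or otherwise normalise counts first, which this sketch does not do. The function names are our own illustrative choices.

```python
import math
from typing import Dict, Sequence, Tuple


def shannon(counts: Sequence[int], base: float = math.e) -> float:
    """Shannon diversity H' = -sum(p_i * log(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no counts")
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def observed_features(counts: Sequence[int]) -> int:
    """Richness: number of taxa/features with at least one count."""
    return sum(1 for c in counts if c > 0)


def bray_curtis(a: Sequence[int], b: Sequence[int]) -> float:
    """Bray-Curtis dissimilarity between two count vectors over the same taxa."""
    if len(a) != len(b):
        raise ValueError("samples must list the same taxa in the same order")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


def bray_curtis_matrix(
    samples: Dict[str, Sequence[int]],
) -> Dict[Tuple[str, str], float]:
    """All pairwise dissimilarities, e.g. as input to an ordination such as PCoA."""
    names = list(samples)
    return {(x, y): bray_curtis(samples[x], samples[y]) for x in names for y in names}


if __name__ == "__main__":
    even = [10, 10, 10, 10]
    uneven = [37, 1, 1, 1]
    print(round(shannon(even), 3))  # 1.386, i.e. ln(4): maximal evenness for 4 taxa
    print(round(shannon(uneven), 3))  # lower: one taxon dominates
    print(round(bray_curtis(even, uneven), 3))  # 0.675
```

Both samples here have the same richness (four observed taxa), so the difference in Shannon diversity comes entirely from evenness, which is one reason it is worth looking at more than one alpha diversity metric.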
Other planned additions to this page
- Taxonomic profiling for metagenomic reads with Sylph
- Taxonomic annotation of long reads
Other useful things
- Downloading reads from SRA
- Uploading reads to SRA (or ENA)