diversity.sh - juanravm/MicroSeqProfiler GitHub Wiki

Microbiome alpha and beta diversity metrics analysis (diversity.sh)

Once quality control is complete, this script calculates alpha and beta diversity metrics with QIIME2. Because some of these metrics are based on the phylogenetic distance between species, a phylogenetic tree is first built with RAxML. Tree construction can take from hours to weeks depending on the number of samples, the number of species, and similar factors, so running it on a computational server is recommended. After the tree is built, the diversity analysis runs and returns .tsv files with the diversity metrics calculated for each sample.

You must activate the qiime2 environment again before running this script:

conda activate qiime2
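A quick way to confirm that the activation worked is to check that the `qiime` command is reachable before starting a long run. The sketch below is only an illustration: `qiime_ready` is a hypothetical helper name, not part of MicroSeqProfiler, and it assumes conda sets `CONDA_DEFAULT_ENV` on activation.

```shell
# Hypothetical helper (not part of MicroSeqProfiler): verify that the
# qiime command is available in the current shell.
qiime_ready() {
    if ! command -v qiime >/dev/null 2>&1; then
        echo "qiime not found on PATH; run 'conda activate qiime2' first" >&2
        return 1
    fi
    # CONDA_DEFAULT_ENV is assumed to hold the name of the active environment
    echo "Active conda environment: ${CONDA_DEFAULT_ENV:-none}"
    qiime --version
}
```

Calling `qiime_ready && echo "OK to run diversity.sh"` prints the confirmation only when the command is found.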

You can run this script as shown below:

bash /file_path/diversity.sh \
--OTU_filtered_seqs /file_path/OTU_filtered_seqs.qza \
--decontam_OTU_table /file_path/decontam_OTU_table.csv \
--metadata_fp /file_path/metadata.tsv \
--taxonomy /file_path/intermediate/taxonomy.qza \
--column Controls \
--pattern Yes \
--sampling 10000 \
--n_trees 5 \
--cores 6
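Since tree construction can run for a long time, it may be worth confirming that every input file exists before launching the command above. This is a minimal sketch under stated assumptions: `check_inputs` is a hypothetical helper, not something diversity.sh provides, and the paths in the usage comment are the placeholders from the example.

```shell
# Hypothetical pre-flight helper (not part of MicroSeqProfiler).
# Returns 0 if every argument is a readable regular file; otherwise
# reports each missing path on stderr and returns 1.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ] || [ ! -r "$f" ]; then
            echo "Missing or unreadable input: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (placeholder paths from the example above):
#   check_inputs /file_path/OTU_filtered_seqs.qza \
#                /file_path/decontam_OTU_table.csv \
#                /file_path/metadata.tsv \
#                /file_path/intermediate/taxonomy.qza \
#     && bash /file_path/diversity.sh ...
```

Chaining with `&&` means the script only starts when all four input files are present.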

In this command you must specify the following input arguments:

  • --OTU_filtered_seqs - OTU_rep_seqs.qza file path with the representative chimera-filtered sequence of each OTU in QIIME2 artifact format
  • --decontam_OTU_table - decontam_OTU_table.csv file path with the chimera-filtered and decontaminated OTU counts in .csv format
  • --metadata_fp - metadata.tsv file path
  • --taxonomy - taxonomy.qza file path
  • --column - Metadata column to indicate experimental controls
  • --pattern - Character string that marks controls in the metadata column. Samples whose value in --column matches this pattern are removed; in the example above, samples with "Yes" in the metadata column "Controls" are removed
  • --sampling - Minimum sampling depth for the diversity analysis. Samples with a sampling depth lower than this value are removed from the analysis. Sampling depth usually changes between sequencing sets, so you should choose your minimum sampling depth by checking the "Interactive Sample Detail" from the QC-table.qzv file in QIIME2 View (generated previously with the importQC.sh script).
  • --n_trees - Number of phylogenetic trees to calculate before selecting the best one for the diversity analysis. A higher number of trees requires more computing time. Default: 5
  • --cores - Number of processor cores available to run this script
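To see how a given --sampling value would affect your samples, the per-sample depths can be roughly cross-checked from the command line. The sketch below makes a loud assumption about the .csv layout (a header row of sample names, one OTU per row, OTU identifier in the first column, no quoted fields); the real decontam_OTU_table.csv may be organised differently. `depth_per_sample` and `samples_below` are hypothetical helper names.

```shell
# ASSUMED layout: header "OTU,S1,S2,...", then one row of counts per OTU.
# Prints "sample<TAB>total count" for every sample column.
depth_per_sample() {
    awk -F',' '
        NR == 1 { ncol = NF; for (i = 2; i <= NF; i++) name[i] = $i; next }
        { for (i = 2; i <= ncol; i++) total[i] += $i }
        END { for (i = 2; i <= ncol; i++) printf "%s\t%d\n", name[i], total[i] }
    ' "$1"
}

# Prints the samples whose total is lower than the given minimum depth,
# i.e. the ones that would be dropped under the rule described above.
samples_below() {
    depth_per_sample "$1" | awk -F'\t' -v min="$2" '$2 < min { print $1 }'
}
```

For example, `samples_below /file_path/decontam_OTU_table.csv 10000` would list the samples that fall under the depth used in the example command, assuming the layout above holds.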

This script returns:

  • diversity - A directory called diversity with two subdirectories, "alpha" and "beta", containing the alpha and beta diversity metrics respectively. Each subdirectory holds all the diversity metrics in .tsv and QIIME2 artifact (.qza) format, plus a subdirectory called Visualizations where the alpha_plot.R and beta_plot.R scripts will work
  • LEfSe - A directory for subsequent LEfSe analysis with the lefse.R script
  • species.tsv - A file with chimera-filtered and decontaminated taxonomic counts in .tsv format to use in lefse.R script
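After a run, the metric tables can be listed directly from the output layout described above. This sketch assumes you pass the path to the diversity directory yourself, since where the script writes it is not stated here; `list_metric_tables` is a hypothetical helper name.

```shell
# Hypothetical helper: list every .tsv metric table under the "alpha"
# and "beta" subdirectories of the given diversity directory.
list_metric_tables() {
    find "$1/alpha" "$1/beta" -type f -name '*.tsv' | sort
}
```

Running `list_metric_tables /file_path/diversity` (placeholder path) prints one table path per line, which is a quick way to confirm the analysis produced output for both alpha and beta metrics.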