CBW 2024 Advanced Module 1: Introduction to metagenomics and read‐based profiling - LangilleLab/microbiome_helper GitHub Wiki

This page will contain a tutorial

Bioinformatic Tool Citations

  • FastQC
  • Kneaddata
  • Bowtie2
  • Kraken2
  • Bracken
  • Kraken-biom
  • MetaPhlAn 3.1

Copy the files to your working directory.

cp -r ~/CourseData/MIC_data/AMB_data/raw_data/ .

First, make your desired output directory (if it doesn't already exist). Then, run FastQC as follows:

fastqc -t 4 raw_data/*fastq.gz -o fastqc_out
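FastQC expects the directory given to -o to exist already, so the directory step above can be sketched as follows. This is a minimal sketch that reuses the fastqc_out name from the command above; the guard around fastqc is an addition so that the snippet only creates the directory when the tool or the input files are not available.

```shell
out_dir="fastqc_out"
mkdir -p "$out_dir"   # -p: no error if the directory already exists

# Only run FastQC when both the tool and the input files are present.
if command -v fastqc >/dev/null 2>&1 && ls raw_data/*fastq.gz >/dev/null 2>&1; then
  fastqc -t 4 raw_data/*fastq.gz -o "$out_dir"
else
  echo "fastqc or raw_data/*fastq.gz not found; skipping" >&2
fi
```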

Go to http://##.uhn-hpc.ca/ (replacing ## with your student number) and navigate to your FastQC output directory. Click on the HTML files to view the results for each sample.

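Run Kneaddata to filter out reads that map to the reference database (here GRCh38_PhiX). The --bypass-trim option skips the trimming step, and --remove-intermediate-output deletes intermediate files once the run finishes.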

parallel -j 1 --eta --link 'kneaddata -i1 {1} -i2 {2} -o kneaddata_out -db ~/workspace/ben/GRCh38_PhiX --bypass-trim --remove-intermediate-output' ::: raw_data/*R1_subsampled.fastq.gz ::: raw_data/*R2_subsampled.fastq.gz
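With --link, GNU Parallel pairs the n-th file from the first ::: list with the n-th file from the second, so a missing R1 or R2 file could mis-pair samples. A minimal sketch of a check to run beforehand; check_pairs is a hypothetical helper, and the file-name suffixes are taken from the command above.

```shell
# Hypothetical helper: print each R1 file with its R2 mate (tab-separated);
# return non-zero if no R1 files exist or any mate is missing.
check_pairs() {
  cp_dir=$1
  cp_status=0
  for cp_r1 in "$cp_dir"/*R1_subsampled.fastq.gz; do
    if [ ! -e "$cp_r1" ]; then
      echo "no R1 files found in $cp_dir" >&2
      return 1
    fi
    # Derive the mate's name by swapping R1 for R2 in the file suffix.
    cp_r2=$(printf '%s\n' "$cp_r1" | sed 's/R1_subsampled\.fastq\.gz$/R2_subsampled.fastq.gz/')
    if [ -e "$cp_r2" ]; then
      printf '%s\t%s\n' "$cp_r1" "$cp_r2"
    else
      echo "missing R2 mate for $cp_r1" >&2
      cp_status=1
    fi
  done
  return "$cp_status"
}
```

Calling check_pairs raw_data before Kneaddata prints one pair per line and exits with a non-zero status if any sample lacks its mate.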

Concatenate the forward and reverse reads for each sample into a single file.

perl ~/CourseData/MIC_data/AMB_data/scripts/concat_paired_end.pl -p 4 --no_R_match -o cat_reads kneaddata_out/*_paired_contam*.fastq

If the above does not work, you may need to install Perl:

conda install conda-forge::perl

If it still does not work or you already have Perl installed, you may get an error saying you require Parallel::ForkManager. Fix by executing the following inside your conda environment:

conda install bioconda::perl-parallel-forkmanager

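Check the number of reads in the output. Note that wc -l counts lines, and each FASTQ record spans four lines, so divide each line count by 4 to get the number of reads: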

wc -l cat_reads/*
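Because a FASTQ record normally spans four lines, the line counts can be converted to read counts directly. A minimal sketch, assuming uncompressed FASTQ files with exactly four lines per record; count_reads is a hypothetical helper, not part of the course scripts.

```shell
# Hypothetical helper: report reads per FASTQ file (line count divided by 4).
count_reads() {
  for cr_file in "$@"; do
    # tr strips the padding some wc implementations add around the number.
    cr_lines=$(wc -l < "$cr_file" | tr -d ' ')
    printf '%s\t%s\n' "$cr_file" "$((cr_lines / 4))"
  done
}
```

For example, count_reads cat_reads/* prints each file name followed by its read count.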

Concatenate the raw data, then unzip it. The two commands below run one after the other; you could also write them on a single line separated by ";", which runs commands in series.

perl ~/CourseData/MIC_data/AMB_data/scripts/concat_paired_end.pl -p 4 -o cat_reads_full raw_data/*.fastq.gz
gunzip cat_reads_full/*.gz

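Run Kraken2 on the concatenated reads. The kraken2_outraw and kraken2_kreport directories must exist before you run this command.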

parallel -j 2 --eta 'kraken2 --db ~/CourseData/MIC_data/tools/k2_standard_08gb --output kraken2_outraw/{/.}.kraken --report kraken2_kreport/{/.}.kreport {}' ::: cat_reads/*.fastq
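In the command above, GNU Parallel's {/.} replacement string expands to the input file name with the directory and the last extension removed, which is how each sample gets its own output files. A small sketch of the same transformation in plain shell, together with creating the two output directories; strip_dir_ext and sampleA are hypothetical names used only for illustration.

```shell
# Create the directories used by --output and --report.
mkdir -p kraken2_outraw kraken2_kreport

# Hypothetical helper mimicking GNU Parallel's {/.}:
# drop the directory part and the last extension.
strip_dir_ext() {
  sde_base=${1##*/}
  printf '%s\n' "${sde_base%.*}"
}

strip_dir_ext cat_reads/sampleA.fastq   # prints: sampleA
```

Only the last extension is removed, so an input named sampleA.fastq.gz would produce sampleA.fastq.kraken and sampleA.fastq.kreport.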

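Run Bracken on the Kraken2 reports to re-estimate abundances at the species level (-l S), using a read length of 100 (-r 100) and a threshold of 1 read per taxon (-t 1). The -d option should point to the same database directory that Kraken2 used, so check that it matches your --db path.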

parallel -j 2 --eta 'bracken -d ~/CourseData/MIC_data/tools/kraken2_standard_08gb -i {} -o bracken_out/{/.}.species.bracken -r 100 -l S -t 1' ::: kraken2_kreport/*.kreport
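Bracken reads a k-mer distribution file from the database directory that corresponds to the read length given with -r. A hedged sketch of a check to run beforehand; the file name pattern database100mers.kmer_distrib follows Bracken's naming for -r 100, has_bracken_files is a hypothetical helper, and the mkdir line assumes you want the Bracken tables collected in a bracken_out directory.

```shell
# Hypothetical helper: succeed only if the database directory holds the
# k-mer distribution file Bracken needs for the given read length.
has_bracken_files() {
  hbf_db=$1
  hbf_len=$2
  [ -f "$hbf_db/database${hbf_len}mers.kmer_distrib" ]
}

# Assumed output directory for the .species.bracken tables.
mkdir -p bracken_out
```

Calling has_bracken_files with the path you pass to -d and the value you pass to -r returns a non-zero status when the distribution file is missing, which usually means the path is wrong or the database was built without Bracken files for that read length.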

Run kraken-biom to combine the Bracken species reports into a single BIOM table:

kraken-biom.py kraken2_kreport/*bracken_species.kreport -m mgs_metadata.tsv -o mgs.biom --fmt json