import demultiplexed fastq files - statonlab/BiGG2020_CrackNAg GitHub Wiki

This analysis is stored in /staton/projects/BiGG_CrackNAg/analysis_June2020

To import the demultiplexed samples into the QIIME 2 pipeline, we first need to create a manifest.tsv file. This file has three tab-separated columns: sample ID, R1 file path, and R2 file path.

manifest file

First, let's write the header row, using the column names from the QIIME 2 "Importing data" tutorial:

cd /staton/projects/BiGG_CrackNAg/analysis_June2020 # please edit the path to work in your own folder
echo -e "sample-id\tforward-absolute-filepath\treverse-absolute-filepath" > manifest.tsv
# -e here means to 'enable interpretation of backslash escapes' so we can output a tab in between

Next, let's use a script to output R1 and R2 file paths into the manifest.

for f1 in ../raw_data/*_R1_001.fastq.gz # loop over all the R1 files
do
	f2=$(echo "$f1" | sed 's/_R1_001\.fastq\.gz$/_R2_001.fastq.gz/') # swap R1 for R2 in the file name suffix only, not elsewhere in the path
	sample=$(basename "$f1" | sed 's/_R1_001\.fastq\.gz$//') # strip the suffix '_R1_001.fastq.gz' to leave only the sample name
	echo -e "$sample\t$PWD/$f1\t$PWD/$f2" # print sample name, R1 path and R2 path, separated by tabs
done

I saved this loop in a script file called make_manifest.sh. Run it and append its output to the manifest:

bash make_manifest.sh >> manifest.tsv

Now we have a complete manifest.tsv file in the correct format.
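Before importing, it can help to confirm that every row of the manifest really has three columns and that both paths exist. The function below is a minimal sketch of such a check; `check_manifest` is a hypothetical helper name, not a QIIME 2 command, and it assumes the first line of the file is the header row.

```shell
# check_manifest FILE
# Hypothetical helper (not part of QIIME 2). Reports rows that do not have
# exactly three tab-separated columns, paths that are not absolute, and
# files that do not exist. Returns 0 if no problems were found, 1 otherwise.
check_manifest() {
    tab=$(printf '\t')
    problems=$(
        # tail -n +2 skips the header row
        tail -n +2 "$1" |
        while IFS=$tab read -r sample r1 r2 extra; do
            if [ -z "$r2" ] || [ -n "$extra" ]; then
                echo "expected 3 columns: $sample"
                continue
            fi
            for f in "$r1" "$r2"; do
                case $f in
                    (/*) [ -f "$f" ] || echo "missing file: $f" ;;
                    (*)  echo "not an absolute path: $f" ;;
                esac
            done
        done
    )
    [ -z "$problems" ] && return 0
    printf '%s\n' "$problems" >&2
    return 1
}
```

After defining the function in the current shell, `check_manifest manifest.tsv && echo "manifest looks fine"` prints the message only when no problems are reported.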

qiime import data

# load qiime environment
conda activate qiime2-2020.2
# run the import command
qiime tools import \
	--type 'SampleData[PairedEndSequencesWithQuality]' \
	--input-path manifest.tsv \
	--output-path paired-end-demux.qza \
	--input-format PairedEndFastqManifestPhred33V2

The --input-format option must match the quality score encoding of the fastq files. I double-checked that recent Illumina sequencing data all use Phred33, which is why we pass 'PairedEndFastqManifestPhred33V2' as the input format.
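If you want to check the encoding on your own files, one rough heuristic is to look at the quality lines: characters below ASCII 59 (for example `#` or digits) cannot occur in Phred64 data, so finding one indicates Phred33. The function below is a sketch of that heuristic; `guess_phred_offset` is a hypothetical helper name, and a result of 64 is only a guess, since high-quality Phred33 reads may contain no low characters in a small sample.

```shell
# guess_phred_offset
# Hypothetical helper. Reads uncompressed FASTQ text on stdin and prints 33
# if any quality character falls in the ASCII range 33-58 ('!' to ':'),
# otherwise prints 64. Assumes four lines per record (no wrapped sequences).
guess_phred_offset() {
    # every 4th line of a FASTQ record is the quality string
    if awk 'NR % 4 == 0' | LC_ALL=C grep -q '[!-:]'; then
        echo 33
    else
        echo 64
    fi
}
```

To try it, decompress the start of one R1 file and pipe it in, for example `gzip -dc path/to/one_R1_file.fastq.gz | head -n 4000 | guess_phred_offset`, where the path is a placeholder for one of your own files.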

The command generated paired-end-demux.qza successfully!