import demultiplexed fastq files - statonlab/BiGG2020_CrackNAg GitHub Wiki
This analysis is stored in /staton/projects/BiGG_CrackNAg/analysis_June2020
To import the demultiplexed samples into the QIIME 2 pipeline, we first need to create a manifest.tsv
file. This file has three tab-separated columns: sample ID, R1 (forward) file path, and R2 (reverse) file path.
manifest file
First, let's generate the column names following the importing data tutorial
cd /staton/projects/BiGG_CrackNAg/analysis_June2020 # please edit the path to work in your own folder
echo -e "sample-id\tforward-absolute-filepath\treverse-absolute-filepath" > manifest.tsv
# -e here means to 'enable interpretation of backslash escapes' so we can output a tab in between
Next, let's use a script to output R1 and R2 file paths into the manifest.
for f1 in ../raw_data/*_R1_001.fastq.gz # loop over all the R1 files
do
    f2="${f1%_R1_001.fastq.gz}_R2_001.fastq.gz" # swap only the trailing R1 tag for R2, so an 'R1' elsewhere in the path is left alone
    sample=$(basename "$f1" _R1_001.fastq.gz) # strip the directory and the '_R1_001.fastq.gz' suffix, leaving only the sample name
    echo -e "$sample\t$PWD/$f1\t$PWD/$f2" # print the sample name, R1 path and R2 path, separated by tabs
done
I saved this loop in a script file, make_manifest.sh,
and then ran it, appending its output to the header line:
bash make_manifest.sh >> manifest.tsv
Now we have a complete manifest.tsv file in the correct format.
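Before importing, it can be worth checking the manifest. The sketch below is only an illustration: `check_manifest` is a hypothetical helper name, not part of QIIME 2. It assumes the layout described above (one header line, then three tab-separated columns) and reports rows with the wrong number of columns, paths that are not absolute, and files that do not exist.

```shell
# check_manifest: hypothetical helper (not a QIIME 2 command).
# Returns 0 if every data row has 3 columns and both files exist, 1 otherwise.
check_manifest() {
    local manifest=$1 status=0 line=0 sample fwd rev extra f
    while IFS=$'\t' read -r sample fwd rev extra; do
        line=$((line + 1))
        [ "$line" -eq 1 ] && continue   # skip the header line
        if [ -z "$rev" ] || [ -n "$extra" ]; then
            echo "line $line: expected exactly 3 tab-separated columns" >&2
            status=1
            continue
        fi
        for f in "$fwd" "$rev"; do
            case $f in
                /*) ;;                  # absolute path, fine
                *) echo "line $line: not an absolute path: $f" >&2; status=1 ;;
            esac
            [ -f "$f" ] || { echo "line $line: file not found: $f" >&2; status=1; }
        done
    done < "$manifest"
    return "$status"
}
```

For example, `check_manifest manifest.tsv && echo "manifest looks OK"` prints the message only when no problems were reported.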
qiime import data
# load qiime environment
conda activate qiime2-2020.2
# run the import command
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path manifest.tsv \
--output-path paired-end-demux.qza \
--input-format PairedEndFastqManifestPhred33V2
For --input-format, choose the format that matches the quality score encoding of your reads. I double-checked that recent Illumina sequencing data all use Phred33, which is why we put 'PairedEndFastqManifestPhred33V2' here.
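If you are unsure about the encoding of your own files, one rough check is to look at the lowest ASCII code that appears in the quality lines. The sketch below is a heuristic, not an official QIIME 2 check, and `min_quality_code` is a hypothetical helper name. It assumes gzipped four-line FASTQ records and only inspects the first 10000 reads.

```shell
# min_quality_code: hypothetical helper that prints the smallest ASCII code
# found in the quality lines of the first 10000 reads of a gzipped fastq file.
min_quality_code() {
    gzip -dc "$1" |
        awk 'NR % 4 == 0' |              # every 4th line is a quality string
        head -n 10000 |
        LC_ALL=C od -An -v -tu1 |        # dump each byte as a decimal number
        tr -s ' ' '\n' |                 # one number per line
        awk '$1 + 0 >= 33 {              # ignore newlines and blank lines
                 v = $1 + 0
                 if (!seen || v < min) { min = v; seen = 1 }
             }
             END { if (seen) print min }'
}
```

Run it on any one of the fastq.gz files. A result below 59 (for example 35, the character '#') is only possible with Phred33, because the Phred64 encodings start at 59 or higher. A higher result on a small sample is not conclusive either way.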
The command now generates paired-end-demux.qza successfully!