Step 1: Create QIIME2 manifest file - shenjean/diversity GitHub Wiki

QIIME2 requires absolute file paths for its manifest file and will throw an error if the file paths are not absolute.

Activate QIIME2 on the CIRCE server

ssh <username>@<circe-login-host>
module load apps/qiime2/2019.10
source activate qiime2-2019.10

Check if QIIME2 has been loaded correctly

qiime --help

Manifest file format

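The manifest is a tab-separated text file. The first row is a header with the column names sample-id, forward-absolute-filepath, and reverse-absolute-filepath. Each following row lists one sample ID with the paths to its forward (R1) and reverse (R2) read files: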

sample-id	forward-absolute-filepath	reverse-absolute-filepath
BCB11H	$PWD/BCB11H_S173_L001_R1_001.fastq.gz	$PWD/BCB11H_S173_L001_R2_001.fastq.gz
BCB11T	$PWD/BCB11T_S175_L001_R1_001.fastq.gz	$PWD/BCB11T_S175_L001_R2_001.fastq.gz
BCB16HL	$PWD/BCB16HL_S182_L001_R1_001.fastq.gz	$PWD/BCB16HL_S182_L001_R2_001.fastq.gz
BCB16SL	$PWD/BCB16SL_S180_L001_R1_001.fastq.gz	$PWD/BCB16SL_S180_L001_R2_001.fastq.gz
BCB16TL	$PWD/BCB16TL_S181_L001_R1_001.fastq.gz	$PWD/BCB16TL_S181_L001_R2_001.fastq.gz
BCB17HL	$PWD/BCB17HL_S221_L001_R1_001.fastq.gz	$PWD/BCB17HL_S221_L001_R2_001.fastq.gz
BCB17SL	$PWD/BCB17SL_S184_L001_R1_001.fastq.gz	$PWD/BCB17SL_S184_L001_R2_001.fastq.gz
BCB17SR	$PWD/BCB17SR_S183_L001_R1_001.fastq.gz	$PWD/BCB17SR_S183_L001_R2_001.fastq.gz
BCB17TL	$PWD/BCB17TL_S187_L001_R1_001.fastq.gz	$PWD/BCB17TL_S187_L001_R2_001.fastq.gz
BCB17TR	$PWD/BCB17TR_S186_L001_R1_001.fastq.gz	$PWD/BCB17TR_S186_L001_R2_001.fastq.gz
BCB9HR	$PWD/BCB9HR_S167_L001_R1_001.fastq.gz	$PWD/BCB9HR_S167_L001_R2_001.fastq.gz
BCB9TL	$PWD/BCB9TL_S169_L001_R1_001.fastq.gz	$PWD/BCB9TL_S169_L001_R2_001.fastq.gz

Generating the manifest file

First, go to the BocaCiegaBay folder by typing cd BocaCiegaBay.

Then, use the cut command to extract the sample IDs from the names of the files ending in "R1_001.fastq.gz", and save them to a file named SampleID. The delimiter is set to "_" (-d "_") and only the first field is kept (-f1). For example, BCB16HL is extracted from BCB16HL_S182_L001_R1_001.fastq.gz.

ls *R1_001.fastq.gz | cut -d "_" -f1 >SampleID
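The sketch below runs one example file name through the same cut command. The second file name is hypothetical and shows a limitation: cut keeps only the text before the first "_", so a sample ID that itself contained an underscore would be truncated.

```shell
# -d "_" splits the name on underscores; -f1 keeps the first field (the sample ID).
echo "BCB16HL_S182_L001_R1_001.fastq.gz" | cut -d "_" -f1
# prints: BCB16HL

# Hypothetical name whose sample ID contains "_": only "BCB" survives.
echo "BCB_16HL_S182_L001_R1_001.fastq.gz" | cut -d "_" -f1
# prints: BCB
```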

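Next, list the files ending in "R1_001.fastq.gz" and use sed to add "$PWD/" at the beginning (^) of each file name, which turns the name into a file path. Note: "$" and "/" are special characters and must be escaped with a backslash (\). Escaping "$" stops the shell from replacing $PWD with the current directory, so the literal text $PWD/ is written to the file and QIIME2 expands it when the data are imported. Escaping "/" is needed because "/" is the delimiter in sed's s/pattern/replacement/ command.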

ls *R1_001.fastq.gz | sed "s/^/\$PWD\//" >R1_fixed
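Here is what the escaping does to a single example file name. The second command, which uses single quotes and "|" as the sed delimiter, is an equivalent alternative that needs no backslashes. It is not the form used above.

```shell
# In double quotes the shell turns \$ into a literal $ and passes \/ on to sed,
# so sed prepends the text "$PWD/" rather than the current directory's path.
echo "BCB11H_S173_L001_R1_001.fastq.gz" | sed "s/^/\$PWD\//"
# prints: $PWD/BCB11H_S173_L001_R1_001.fastq.gz

# Alternative: single quotes keep the shell away from $PWD entirely, and with
# "|" as the sed delimiter the "/" no longer needs escaping.
echo "BCB11H_S173_L001_R1_001.fastq.gz" | sed 's|^|$PWD/|'
# prints: $PWD/BCB11H_S173_L001_R1_001.fastq.gz
```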

Repeat for the files ending in "R2_001.fastq.gz":

ls *R2_001.fastq.gz | sed "s/^/\$PWD\//" >R2_fixed
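R1_fixed and R2_fixed come from two separate ls listings. A missing R2 file would therefore shift every later row out of step. The sketch below is one way to catch this before building the manifest. check_pairs is a hypothetical helper name, and the check assumes the ..._R1_001.fastq.gz / ..._R2_001.fastq.gz naming used in this folder.

```shell
# check_pairs (hypothetical helper name): for every forward-read file in the
# current directory, confirm that the matching reverse-read file exists.
check_pairs() {
    status=0
    for r1 in *_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue          # skip the literal pattern if nothing matched
        # Swap the R1 suffix for the R2 suffix to get the expected mate.
        r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "Missing reverse read file for: $r1"
            status=1
        fi
    done
    return "$status"
}

# Usage, from inside the BocaCiegaBay folder:
#   check_pairs && echo "Every R1 file has an R2 mate"
```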

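If earlier attempts left other intermediate files in the folder (for example, files named R1 and R2), remove them with the rm command, e.g. rm R1 R2. Keep SampleID, R1_fixed, and R2_fixed, because they are combined into the manifest later.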

Check that the output files SampleID, R1_fixed, and R2_fixed have the same number of lines. File names are case-sensitive, so type SampleID exactly as it was created:

wc -l SampleID R1_fixed R2_fixed

Output should look like this:

  12 SampleID
  12 R1_fixed
  12 R2_fixed
  36 total
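You can also automate the comparison. In the sketch below, same_line_count is a hypothetical helper. It fails if any listed file is missing or has a different number of lines from the first file. The missing-file check also catches a mistyped name such as sampleID for SampleID.

```shell
# same_line_count (hypothetical helper name): succeed only if every file given
# as an argument exists and all of them have the same number of lines.
same_line_count() {
    expected=""
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "No such file: $f"
            return 1
        fi
        # Arithmetic expansion strips any padding that wc puts around the number.
        n=$(( $(wc -l < "$f") ))
        if [ -z "$expected" ]; then
            expected=$n
        elif [ "$n" -ne "$expected" ]; then
            echo "Line count mismatch: $f has $n lines, expected $expected"
            return 1
        fi
    done
    return 0
}

# Usage: same_line_count SampleID R1_fixed R2_fixed && echo "Line counts match"
```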

Check the contents of SampleID, R1_fixed, and R2_fixed with the more command, starting with SampleID:

more SampleID

Output should look like this:

BCB11H
BCB11T
BCB16HL
BCB16SL
BCB16TL
BCB17HL
BCB17SL
BCB17SR
BCB17TL
BCB17TR
BCB9HR
BCB9TL
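QIIME2 expects every sample ID in the manifest to be unique. However, cut -f1 can map two files to the same ID, for example when one sample was sequenced on two lanes (L001 and L002). Here is a minimal duplicate check:

```shell
# Print every sample ID that occurs more than once in SampleID.
# uniq -d only reports adjacent repeats, so the IDs are sorted first.
# No output means all sample IDs are unique.
sort SampleID | uniq -d
```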

View the contents of R1_fixed:

more R1_fixed

Output:

$PWD/BCB11H_S173_L001_R1_001.fastq.gz
$PWD/BCB11T_S175_L001_R1_001.fastq.gz
$PWD/BCB16HL_S182_L001_R1_001.fastq.gz
$PWD/BCB16SL_S180_L001_R1_001.fastq.gz
$PWD/BCB16TL_S181_L001_R1_001.fastq.gz
$PWD/BCB17HL_S221_L001_R1_001.fastq.gz
$PWD/BCB17SL_S184_L001_R1_001.fastq.gz
$PWD/BCB17SR_S183_L001_R1_001.fastq.gz
$PWD/BCB17TL_S187_L001_R1_001.fastq.gz
$PWD/BCB17TR_S186_L001_R1_001.fastq.gz
$PWD/BCB9HR_S167_L001_R1_001.fastq.gz
$PWD/BCB9TL_S169_L001_R1_001.fastq.gz

View the contents of R2_fixed:

more R2_fixed

Output:

$PWD/BCB11H_S173_L001_R2_001.fastq.gz
$PWD/BCB11T_S175_L001_R2_001.fastq.gz
$PWD/BCB16HL_S182_L001_R2_001.fastq.gz
$PWD/BCB16SL_S180_L001_R2_001.fastq.gz
$PWD/BCB16TL_S181_L001_R2_001.fastq.gz
$PWD/BCB17HL_S221_L001_R2_001.fastq.gz
$PWD/BCB17SL_S184_L001_R2_001.fastq.gz
$PWD/BCB17SR_S183_L001_R2_001.fastq.gz
$PWD/BCB17TL_S187_L001_R2_001.fastq.gz
$PWD/BCB17TR_S186_L001_R2_001.fastq.gz
$PWD/BCB9HR_S167_L001_R2_001.fastq.gz
$PWD/BCB9TL_S169_L001_R2_001.fastq.gz
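paste joins files line by line, so row N of R1_fixed and row N of R2_fixed must belong to the same sample. The sketch below uses a hypothetical helper name, check_row_pairs. It strips the _R1_001.fastq.gz and _R2_001.fastq.gz suffixes and reports any row where the remaining prefixes differ.

```shell
# check_row_pairs (hypothetical helper name): compare two path lists line by
# line and report rows whose forward and reverse paths name different samples.
check_row_pairs() {
    paste "$1" "$2" | awk -F '\t' '
        {
            fwd = $1; rev = $2
            # Remove the read-specific suffix so only the shared prefix remains.
            sub(/_R1_001\.fastq\.gz$/, "", fwd)
            sub(/_R2_001\.fastq\.gz$/, "", rev)
            if (fwd != rev) { print "Row " NR " mismatch: " $1 " vs " $2; bad = 1 }
        }
        END { exit bad }'
}

# Usage: check_row_pairs R1_fixed R2_fixed && echo "Rows are paired correctly"
```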

Use the paste command to combine SampleID, R1_fixed, and R2_fixed column by column into a tab-separated file named manifest:

paste -d "\t" SampleID R1_fixed R2_fixed >manifest
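Before you add the header, check that paste produced exactly three tab-separated columns on every row. Here is a minimal awk check:

```shell
# Print each distinct field count found in manifest (fields split on tabs).
# A correctly pasted file prints a single line: 3
awk -F '\t' '{ print NF }' manifest | sort -u
```

Alternatively, you can write the header without opening nano. printf's \t guarantees real tab characters between the column names. The sketch below rebuilds manifest from the three intermediate files in one go, so it replaces both the paste command and the nano edit rather than following them:

```shell
# Write the header row with real tab characters (\t), then append the
# three columns joined by paste (whose default delimiter is also a tab).
{
    printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
    paste SampleID R1_fixed R2_fixed
} > manifest
```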

Open the manifest file with nano manifest and add the following header line at the very top. Separate the column names with the Tab key rather than spaces, and check carefully for spelling errors:

sample-id	forward-absolute-filepath	reverse-absolute-filepath

To save your file in nano, type Ctrl+O. To exit nano, type Ctrl+X.

Check whether each column in the manifest file is formatted correctly. View the contents of column 1:

cut -f1 manifest

Output should be:

sample-id
BCB11H
BCB11T
BCB16HL
BCB16SL
BCB16TL
BCB17HL
BCB17SL
BCB17SR
BCB17TL
BCB17TR
BCB9HR
BCB9TL

View the contents of column 2 in the manifest file:

cut -f2 manifest

Expected output:

forward-absolute-filepath
$PWD/BCB11H_S173_L001_R1_001.fastq.gz
$PWD/BCB11T_S175_L001_R1_001.fastq.gz
$PWD/BCB16HL_S182_L001_R1_001.fastq.gz
$PWD/BCB16SL_S180_L001_R1_001.fastq.gz
$PWD/BCB16TL_S181_L001_R1_001.fastq.gz
$PWD/BCB17HL_S221_L001_R1_001.fastq.gz
$PWD/BCB17SL_S184_L001_R1_001.fastq.gz
$PWD/BCB17SR_S183_L001_R1_001.fastq.gz
$PWD/BCB17TL_S187_L001_R1_001.fastq.gz
$PWD/BCB17TR_S186_L001_R1_001.fastq.gz
$PWD/BCB9HR_S167_L001_R1_001.fastq.gz
$PWD/BCB9TL_S169_L001_R1_001.fastq.gz

View the contents of column 3 in the manifest file:

cut -f3 manifest

Expected output:

reverse-absolute-filepath
$PWD/BCB11H_S173_L001_R2_001.fastq.gz
$PWD/BCB11T_S175_L001_R2_001.fastq.gz
$PWD/BCB16HL_S182_L001_R2_001.fastq.gz
$PWD/BCB16SL_S180_L001_R2_001.fastq.gz
$PWD/BCB16TL_S181_L001_R2_001.fastq.gz
$PWD/BCB17HL_S221_L001_R2_001.fastq.gz
$PWD/BCB17SL_S184_L001_R2_001.fastq.gz
$PWD/BCB17SR_S183_L001_R2_001.fastq.gz
$PWD/BCB17TL_S187_L001_R2_001.fastq.gz
$PWD/BCB17TR_S186_L001_R2_001.fastq.gz
$PWD/BCB9HR_S167_L001_R2_001.fastq.gz
$PWD/BCB9TL_S169_L001_R2_001.fastq.gz
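You can bundle the column-by-column checks above into a single pre-check. The sketch below uses a hypothetical helper, validate_manifest, which checks four things:

- the header's spelling and tabs
- the number of fields per row
- duplicate sample IDs
- whether each path exists once a leading $PWD/ is resolved against the current directory

This is an illustrative check under these assumptions, not QIIME2's own validation, so run it from the BocaCiegaBay folder.

```shell
# validate_manifest (hypothetical helper name): pre-checks a manifest before
# import. Run it from the folder that "$PWD" in the paths should refer to.
validate_manifest() {
    status=0

    # 1. Header row: exact column names separated by real tab characters.
    header=$(printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath')
    if [ "$(head -n 1 "$1")" != "$header" ]; then
        echo "Header row is misspelled or not tab-separated"
        status=1
    fi

    # 2. Every row must have exactly three tab-separated fields.
    bad_rows=$(awk -F '\t' 'NF != 3 { print NR }' "$1")
    if [ -n "$bad_rows" ]; then
        echo "Rows without exactly 3 fields:" $bad_rows
        status=1
    fi

    # 3. Sample IDs (column 1, header excluded) must be unique.
    dups=$(tail -n +2 "$1" | cut -f1 | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "Duplicate sample IDs:" $dups
        status=1
    fi

    # 4. Each path must start with "$PWD/" or "/" and point at an existing
    #    file; a leading "$PWD/" is resolved against the current directory.
    missing=$(tail -n +2 "$1" | cut -f2,3 | tr '\t' '\n' | while IFS= read -r path; do
        case $path in
            ('$PWD/'*|/*)
                rel=${path#\$PWD/}
                [ -e "$rel" ] || echo "$path" ;;
            (*)
                echo "$path" ;;
        esac
    done)
    if [ -n "$missing" ]; then
        echo "Missing or non-absolute paths:" $missing
        status=1
    fi

    return "$status"
}

# Usage, inside BocaCiegaBay:
#   validate_manifest manifest && echo "manifest passed all checks"
```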