Step 1: Create QIIME2 manifest file - shenjean/diversity GitHub Wiki
QIIME2 requires absolute file paths for its manifest file and will throw an error if the file paths are not absolute.
Activate QIIME2 on the CIRCE server
ssh [email protected]
module load apps/qiime2/2019.10
source activate qiime2-2019.10
Check if QIIME2 has been loaded correctly
qiime --help
Manifest file format
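The manifest file is tab-separated and looks like this. $PWD stands for the current working directory; QIIME2 expands environment variables such as $PWD in the manifest file paths, which is how these entries count as absolute paths: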
sample-id forward-absolute-filepath reverse-absolute-filepath
BCB11H $PWD/BCB11H_S173_L001_R1_001.fastq.gz $PWD/BCB11H_S173_L001_R2_001.fastq.gz
BCB11T $PWD/BCB11T_S175_L001_R1_001.fastq.gz $PWD/BCB11T_S175_L001_R2_001.fastq.gz
BCB16HL $PWD/BCB16HL_S182_L001_R1_001.fastq.gz $PWD/BCB16HL_S182_L001_R2_001.fastq.gz
BCB16SL $PWD/BCB16SL_S180_L001_R1_001.fastq.gz $PWD/BCB16SL_S180_L001_R2_001.fastq.gz
BCB16TL $PWD/BCB16TL_S181_L001_R1_001.fastq.gz $PWD/BCB16TL_S181_L001_R2_001.fastq.gz
BCB17HL $PWD/BCB17HL_S221_L001_R1_001.fastq.gz $PWD/BCB17HL_S221_L001_R2_001.fastq.gz
BCB17SL $PWD/BCB17SL_S184_L001_R1_001.fastq.gz $PWD/BCB17SL_S184_L001_R2_001.fastq.gz
BCB17SR $PWD/BCB17SR_S183_L001_R1_001.fastq.gz $PWD/BCB17SR_S183_L001_R2_001.fastq.gz
BCB17TL $PWD/BCB17TL_S187_L001_R1_001.fastq.gz $PWD/BCB17TL_S187_L001_R2_001.fastq.gz
BCB17TR $PWD/BCB17TR_S186_L001_R1_001.fastq.gz $PWD/BCB17TR_S186_L001_R2_001.fastq.gz
BCB9HR $PWD/BCB9HR_S167_L001_R1_001.fastq.gz $PWD/BCB9HR_S167_L001_R2_001.fastq.gz
BCB9TL $PWD/BCB9TL_S169_L001_R1_001.fastq.gz $PWD/BCB9TL_S169_L001_R2_001.fastq.gz
Generating the manifest file
First, go to the BocaCiegaBay folder by typing cd BocaCiegaBay.
Then, extract the sample IDs from file names ending with "R1_001.fastq.gz" using the cut command. Here, the delimiter is set to "_" and we keep the first field (e.g. BCB16HL from the file BCB16HL_S182_L001_R1_001.fastq.gz):
ls *R1_001.fastq.gz | cut -d "_" -f1 >SampleID
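To see what the cut step does in isolation, the following minimal sketch runs it on a single file name taken from the listing above. No files are needed; the variable names fname and sample_id are only for illustration.

```shell
# Split one file name on "_" and keep the first field (the sample ID).
fname="BCB16HL_S182_L001_R1_001.fastq.gz"
sample_id=$(echo "$fname" | cut -d "_" -f1)
echo "$sample_id"
```

Because only the text before the first "_" is kept, this approach assumes that the sample IDs themselves contain no underscores.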
Extract the list of files with names ending in "R1_001.fastq.gz", then replace the beginning of each file name (^) with "$PWD/" to specify the file path. Note: "$" and "/" are special characters here and need to be escaped with a backslash ("\").
ls *R1_001.fastq.gz | sed "s/^/\$PWD\//" >R1_fixed
Repeat with file names ending with "R2_001.fastq.gz"
ls *R2_001.fastq.gz | sed "s/^/\$PWD\//" >R2_fixed
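The escaping in the sed commands above can be checked on a single file name. The sketch below is illustrative only: it shows that the double-quoted form writes the literal text $PWD/ (not the expanded directory), and that a single-quoted form with "|" as the sed delimiter gives the same result without any backslashes. The variable names line and same are made up for this example.

```shell
# Inside double quotes, \$ stops the shell from expanding $PWD, so sed
# inserts the literal text "$PWD/" at the start of the line.
line=$(echo "BCB11H_S173_L001_R1_001.fastq.gz" | sed "s/^/\$PWD\//")
echo "$line"

# Single quotes prevent shell expansion altogether, and using "|" as the
# sed delimiter means the slash does not need escaping either.
same=$(echo "BCB11H_S173_L001_R1_001.fastq.gz" | sed 's|^|$PWD/|')
echo "$same"
```

Either form can be used in the commands above; the single-quoted one is simply easier to read.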
Remove any intermediate files, if necessary, using the rm command. For example: rm R1 R2
Check whether the output files SampleID, R1_fixed, and R2_fixed have the same number of lines. File names are case-sensitive, so use SampleID exactly as it was written in the earlier step:
wc -l SampleID R1_fixed R2_fixed
Output should look like this:
12 SampleID
12 R1_fixed
12 R2_fixed
36 total
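Equal line counts do not by themselves show that every forward file has a matching reverse file. As an optional extra check, the sketch below derives the sample IDs separately from the R1 and R2 file names and compares the two lists. It runs in a scratch directory with empty stand-in files; the names A1, B2, ids_R1 and ids_R2 are hypothetical and not part of the original workflow.

```shell
# Work in a scratch directory with empty stand-in files (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir"
touch A1_S1_L001_R1_001.fastq.gz A1_S1_L001_R2_001.fastq.gz
touch B2_S2_L001_R1_001.fastq.gz B2_S2_L001_R2_001.fastq.gz

# Derive sample IDs separately from the forward and reverse file names.
ls *R1_001.fastq.gz | cut -d "_" -f1 >ids_R1
ls *R2_001.fastq.gz | cut -d "_" -f1 >ids_R2

# cmp -s prints nothing and succeeds only if the two lists are identical.
if cmp -s ids_R1 ids_R2; then
    pairing="ok"
else
    pairing="mismatch"
fi
echo "pairing: $pairing"
```

In the real folder, only the two ls lines and the cmp test are needed; a "mismatch" result would suggest a missing or misnamed R1 or R2 file.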
Check the contents of the files SampleID, R1_fixed, and R2_fixed using the more command:
more SampleID
Output should look like this:
BCB11H
BCB11T
BCB16HL
BCB16SL
BCB16TL
BCB17HL
BCB17SL
BCB17SR
BCB17TL
BCB17TR
BCB9HR
BCB9TL
View the contents of R1_fixed
more R1_fixed
Output:
$PWD/BCB11H_S173_L001_R1_001.fastq.gz
$PWD/BCB11T_S175_L001_R1_001.fastq.gz
$PWD/BCB16HL_S182_L001_R1_001.fastq.gz
$PWD/BCB16SL_S180_L001_R1_001.fastq.gz
$PWD/BCB16TL_S181_L001_R1_001.fastq.gz
$PWD/BCB17HL_S221_L001_R1_001.fastq.gz
$PWD/BCB17SL_S184_L001_R1_001.fastq.gz
$PWD/BCB17SR_S183_L001_R1_001.fastq.gz
$PWD/BCB17TL_S187_L001_R1_001.fastq.gz
$PWD/BCB17TR_S186_L001_R1_001.fastq.gz
$PWD/BCB9HR_S167_L001_R1_001.fastq.gz
$PWD/BCB9TL_S169_L001_R1_001.fastq.gz
View the contents of R2_fixed
more R2_fixed
Output:
$PWD/BCB11H_S173_L001_R2_001.fastq.gz
$PWD/BCB11T_S175_L001_R2_001.fastq.gz
$PWD/BCB16HL_S182_L001_R2_001.fastq.gz
$PWD/BCB16SL_S180_L001_R2_001.fastq.gz
$PWD/BCB16TL_S181_L001_R2_001.fastq.gz
$PWD/BCB17HL_S221_L001_R2_001.fastq.gz
$PWD/BCB17SL_S184_L001_R2_001.fastq.gz
$PWD/BCB17SR_S183_L001_R2_001.fastq.gz
$PWD/BCB17TL_S187_L001_R2_001.fastq.gz
$PWD/BCB17TR_S186_L001_R2_001.fastq.gz
$PWD/BCB9HR_S167_L001_R2_001.fastq.gz
$PWD/BCB9TL_S169_L001_R2_001.fastq.gz
Combine SampleID, R1_fixed, R2_fixed using the paste command:
paste -d "\t" SampleID R1_fixed R2_fixed >manifest
Edit the manifest file using nano manifest and add the following headers as the first line, separated by the Tab key. Check for spelling/typo errors:
sample-id forward-absolute-filepath reverse-absolute-filepath
To save your file in nano, type Ctrl+O. To exit nano, type Ctrl+X.
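As an alternative to typing the header by hand, the header line can be written with printf before the columns are appended, which avoids spelling and Tab mistakes. This is a sketch under assumptions, not the procedure described above: it uses a scratch directory and a single hypothetical sample (A1) as stand-in input.

```shell
# Scratch directory with stand-in input files (hypothetical sample A1).
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 'A1' >SampleID
printf '%s\n' '$PWD/A1_S1_L001_R1_001.fastq.gz' >R1_fixed
printf '%s\n' '$PWD/A1_S1_L001_R2_001.fastq.gz' >R2_fixed

# Write the tab-separated header first, then append the pasted columns.
# paste joins its input files with a Tab by default.
printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' >manifest
paste SampleID R1_fixed R2_fixed >>manifest
cat manifest
```

With this variant, the manifest is created in one go, so the earlier paste command and the nano edit are not both needed.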
Check whether each column in the manifest file is formatted correctly. View the contents of column 1:
cut -f1 manifest
Output should be:
sample-id
BCB11H
BCB11T
BCB16HL
BCB16SL
BCB16TL
BCB17HL
BCB17SL
BCB17SR
BCB17TL
BCB17TR
BCB9HR
BCB9TL
View the contents of column 2 in the manifest file:
cut -f2 manifest
Expected output:
forward-absolute-filepath
$PWD/BCB11H_S173_L001_R1_001.fastq.gz
$PWD/BCB11T_S175_L001_R1_001.fastq.gz
$PWD/BCB16HL_S182_L001_R1_001.fastq.gz
$PWD/BCB16SL_S180_L001_R1_001.fastq.gz
$PWD/BCB16TL_S181_L001_R1_001.fastq.gz
$PWD/BCB17HL_S221_L001_R1_001.fastq.gz
$PWD/BCB17SL_S184_L001_R1_001.fastq.gz
$PWD/BCB17SR_S183_L001_R1_001.fastq.gz
$PWD/BCB17TL_S187_L001_R1_001.fastq.gz
$PWD/BCB17TR_S186_L001_R1_001.fastq.gz
$PWD/BCB9HR_S167_L001_R1_001.fastq.gz
$PWD/BCB9TL_S169_L001_R1_001.fastq.gz
View the contents of column 3 in the manifest file:
cut -f3 manifest
Expected output:
reverse-absolute-filepath
$PWD/BCB11H_S173_L001_R2_001.fastq.gz
$PWD/BCB11T_S175_L001_R2_001.fastq.gz
$PWD/BCB16HL_S182_L001_R2_001.fastq.gz
$PWD/BCB16SL_S180_L001_R2_001.fastq.gz
$PWD/BCB16TL_S181_L001_R2_001.fastq.gz
$PWD/BCB17HL_S221_L001_R2_001.fastq.gz
$PWD/BCB17SL_S184_L001_R2_001.fastq.gz
$PWD/BCB17SR_S183_L001_R2_001.fastq.gz
$PWD/BCB17TL_S187_L001_R2_001.fastq.gz
$PWD/BCB17TR_S186_L001_R2_001.fastq.gz
$PWD/BCB9HR_S167_L001_R2_001.fastq.gz
$PWD/BCB9TL_S169_L001_R2_001.fastq.gz
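The three column checks above can be complemented with a few automated ones. The sketch below is a hedged example rather than part of the original procedure: it counts rows that do not have exactly three tab-separated fields, counts duplicated sample IDs, and checks that each listed file exists once the literal $PWD is replaced by the current directory. It builds a one-sample stand-in manifest in a scratch directory; the sample name A1 and the helper file paths are hypothetical, and the path check assumes every entry starts with $PWD.

```shell
# Scratch directory with a stand-in manifest and matching empty files.
workdir=$(mktemp -d)
cd "$workdir"
touch A1_S1_L001_R1_001.fastq.gz A1_S1_L001_R2_001.fastq.gz
printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' >manifest
printf '%s\t%s\t%s\n' 'A1' '$PWD/A1_S1_L001_R1_001.fastq.gz' \
    '$PWD/A1_S1_L001_R2_001.fastq.gz' >>manifest

# Count rows that do not have exactly three tab-separated fields.
bad_rows=$(awk -F '\t' 'NF != 3' manifest | wc -l | tr -d ' ')
echo "rows with wrong field count: $bad_rows"

# Count sample IDs that appear more than once (header row skipped).
dup_ids=$(tail -n +2 manifest | cut -f1 | sort | uniq -d | wc -l | tr -d ' ')
echo "duplicated sample IDs: $dup_ids"

# List every path from columns 2 and 3, one per line.
tail -n +2 manifest | cut -f2,3 | tr '\t' '\n' >paths

# Replace the literal leading "$PWD" with the current directory and
# count the paths that do not point to an existing file.
missing=0
while IFS= read -r p; do
    real="$PWD${p#\$PWD}"
    [ -f "$real" ] || missing=$((missing + 1))
done <paths
echo "missing files: $missing"
```

All three counts should be 0 for a well-formed manifest. Because $PWD is resolved relative to wherever the command runs, the path check (like any later use of the manifest) is meaningful only when run from the folder that holds the fastq.gz files.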