V. Running the ISSR-seq data processing pipeline - barrettlab/2021-Genomics-bootcamp GitHub Wiki

ISSR-seq data processing pipeline commands

Link to the official ISSR-seq GitHub wiki and Repository

Before you start:


# replace 'cbarrett_practice' with your data directory name

cd /data/2021_bootcamp/cbarrett_practice/


# make a reads directory, copy the reads into it, and make it readable, writable, and executable

mkdir reads
cp /data/2021_bootcamp/reads_practice/* reads # if access is denied, put 'sudo' before the command
sudo chmod 777 -R reads


# copy the files with the ISSR primers/adapters, negative plastome reference, and the 'samples' file

cp /data/2021_bootcamp/i5_i7_ISSR_primers.fasta .
cp /data/2021_bootcamp/striata_plastomes.fasta .
cp /data/2021_bootcamp/samples.txt .
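
Before moving on, it can help to confirm that the copied files actually arrived and are not empty. The sketch below is not part of the ISSR-seq pipeline; `check_inputs` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper (not part of ISSR-seq): report any expected input file
# that is missing or empty, and return non-zero if at least one is.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example, run from your data directory (file names are the ones copied above):
# check_inputs i5_i7_ISSR_primers.fasta striata_plastomes.fasta samples.txt && echo "inputs look OK"
```

If the function prints nothing and the final `echo` fires, all three files are in place.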

ISSR-seq pipeline

1. From your data directory, run the 'ISSRseq_AssembleReference.sh' script

This script trims your reads with BBDuk and assembles a pseudo-reference with SPAdes.

For larger jobs, put 'nohup' before the command and '&' after it so that it runs in the background and keeps running after you log out.

ISSRseq_AssembleReference.sh  -O craig_practice_issrseq -I /data/2021_bootcamp/cbarrett_practice/reads -S samples.txt -R 103a_NM -T 4 -M 50 -H 0 -P i5_i7_ISSR_primers.fasta -K 81 -L 100 -N striata_plastomes.fasta -X 8

# for larger jobs with more data:
nohup ISSRseq_AssembleReference.sh  -O craig_practice_issrseq -I /data/2021_bootcamp/cbarrett_practice/reads -S samples.txt -R 103a_NM -T 4 -M 50 -H 0 -P i5_i7_ISSR_primers.fasta -K 81 -L 100 -N striata_plastomes.fasta -X 8 &
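
When a job runs in the background, it is handy to send its output to a named log file and keep the process ID so you can check on it later. This is a minimal sketch, assuming you want a log per job; `run_in_background` and the log name `assemble.log` are hypothetical and not part of the pipeline.

```shell
# Hypothetical wrapper: run any command under nohup, write stdout and stderr
# to the given log file, and print the PID of the background job.
run_in_background() {
    log="$1"
    shift
    nohup "$@" > "$log" 2>&1 &
    echo "$!"
}

# Example with the command from above (log file name is an assumption):
# run_in_background assemble.log ISSRseq_AssembleReference.sh -O craig_practice_issrseq \
#     -I /data/2021_bootcamp/cbarrett_practice/reads -S samples.txt -R 103a_NM -T 4 -M 50 \
#     -H 0 -P i5_i7_ISSR_primers.fasta -K 81 -L 100 -N striata_plastomes.fasta -X 8
```

Afterwards, `tail -f assemble.log` follows the progress, and `kill -0 <PID>` (using the printed number) returns success while the job is still running.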

2. Create BAM files by mapping the reads to the reference with BBMap

  • The previous script automatically creates a directory named with the output name you supplied plus a timestamp.

  • Use that directory name for the remaining commands.

ISSRseq_CreateBAMs.sh -T 4 -O craig_practice_issrseq_2021_06_17_18_28
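
If you have run the first step more than once, several timestamped directories may exist. The sketch below picks the newest one by name. It assumes the timestamp is zero-padded in the year_month_day_hour_minute pattern seen in the example above, which is inferred from that one directory name rather than documented here; `latest_run_dir` is a hypothetical helper.

```shell
# Hypothetical helper: print the newest directory whose name starts with "<prefix>_".
# Assumes zero-padded timestamps, so alphabetical order equals chronological order.
latest_run_dir() {
    prefix="$1"
    for d in "$prefix"_*/; do
        # Skip the unexpanded pattern when nothing matches
        [ -d "$d" ] && printf '%s\n' "${d%/}"
    done | LC_ALL=C sort | tail -n 1
}

# Example, run from your data directory:
# outdir=$(latest_run_dir craig_practice_issrseq)
# ISSRseq_CreateBAMs.sh -T 4 -O "$outdir"
```

Check the printed name with `echo "$outdir"` before passing it to the later scripts.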

3. Call SNPs with GATK

ISSRseq_AnalyzeBAMs.sh -T 4 -P 2 -O craig_practice_issrseq_2021_06_17_18_28

4. Create matrices in phylip, nexus, and fasta formats

# 'S' = the minimum number of samples in which a SNP must be present. With 96 samples, S = 96 is the most stringent occupancy filter. Try several values between 1 and your total number of samples.

ISSRseq_CreateMatrices.sh -T 10 -S 8 -O craig_practice_issrseq_2021_06_17_18_28
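
To try several occupancy thresholds, one option is to print the command for each value first and then run them one at a time. This is a sketch only: `matrix_commands` is a hypothetical helper, and the values 2, 4, 6, and 8 are illustrative. This page does not say whether a later run overwrites the matrices from an earlier one, so inspect or copy the output between runs.

```shell
# Hypothetical helper: print one ISSRseq_CreateMatrices.sh command per -S value.
# Arguments: output directory, thread count, then any number of -S values.
matrix_commands() {
    outdir="$1"
    threads="$2"
    shift 2
    for s in "$@"; do
        echo "ISSRseq_CreateMatrices.sh -T $threads -S $s -O $outdir"
    done
}

# Print the commands for four illustrative thresholds (nothing is executed)
matrix_commands craig_practice_issrseq_2021_06_17_18_28 10 2 4 6 8
```

Copy and run each printed line in turn once the thresholds look right for your number of samples.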