Home - PEHGP/ssDripPipeline GitHub Wiki

Step-by-step protocols.

1.Quality Control

fastqc -t 10 *.fastq.gz
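By default FastQC writes each report next to its input and names it after the input file, replacing .fastq.gz with _fastqc.html (plus a matching _fastqc.zip). The helper below is a hypothetical sketch that relies on this default naming and assumes no -o option was given. It lists the inputs that have no report yet.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): print every *.fastq.gz
# in a directory that has no FastQC HTML report beside it.
# Assumes FastQC's default naming: sample.fastq.gz -> sample_fastqc.html
missing_fastqc_reports() {
  local dir=${1:-.}
  local fq report
  for fq in "$dir"/*.fastq.gz; do
    [ -e "$fq" ] || continue            # the glob matched nothing
    report="${fq%.fastq.gz}_fastqc.html"
    [ -e "$report" ] || echo "$fq"      # report missing: list the input
  done
}
```

Run it as `missing_fastqc_reports .` in the directory where FastQC was run. If it prints nothing, every input has a report.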

2.Adapter Cutting and Tail Trimming

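Trim Galore removes adapter sequence from the 3' end of the reads. In addition, 10 bp are clipped from the 5' end (--clip_R1/--clip_R2) and from the 3' end (--three_prime_clip_R1/--three_prime_clip_R2) of both reads. A pair is discarded if either read is shorter than 50 bp after trimming (--length 50) or contains more than 10 Ns (--max_n 10). The --stringency 10 option requires at least 10 bp of overlap with the adapter before anything is trimmed, and --fastqc reruns FastQC on the trimmed files.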

trim_galore --phred33 --fastqc --stringency 10 --gzip --length 50 --max_n 10 --clip_R1 10 --clip_R2 10 --three_prime_clip_R1 10 --three_prime_clip_R2 10 --paired test_1.fastq.gz test_2.fastq.gz
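With --paired and --gzip, Trim Galore writes the validated pairs to the current directory as <name>_val_1.fq.gz and <name>_val_2.fq.gz. Here <name> is the input file name without its .fastq.gz extension. These trimmed files are the ones a downstream aligner would normally read. The helper below is a hypothetical sketch that derives those names from the raw file names. It assumes Trim Galore's default naming, with no --basename or -o option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the file names Trim Galore is expected to
# produce for a gzipped read pair run with --paired --gzip
# (default naming assumed; output goes to the current directory).
trimmed_pair_names() {
  local r1=$1 r2=$2
  printf '%s_val_1.fq.gz\n' "$(basename "$r1" .fastq.gz)"
  printf '%s_val_2.fq.gz\n' "$(basename "$r2" .fastq.gz)"
}
```

For example, `trimmed_pair_names test_1.fastq.gz test_2.fastq.gz` prints test_1_val_1.fq.gz and test_2_val_2.fq.gz.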

3.Alignment

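Build the index once, then align the trimmed pairs produced by Trim Galore (test_1_val_1.fq.gz and test_2_val_2.fq.gz). The samtools sort -m value is a per-thread memory limit, so -@ 10 -m 5G can use roughly 50 GB of RAM. Lower -m if your machine has less.

bowtie2-build --threads 10 RefGenome.fasta RefGenome
bowtie2 --local --phred33 -p 10 -t -x RefGenome -1 test_1_val_1.fq.gz -2 test_2_val_2.fq.gz 2>test_align.info | samtools view -bS -1 - | samtools sort -@ 10 -m 5G -l 9 -o test.sort.bam -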
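bowtie2 writes its alignment summary to standard error, so the summary ends up in test_align.info together with the timing lines requested by -t. The last line of the summary has the form `NN.NN% overall alignment rate`. The minimal sketch below extracts that figure, for example to compare samples. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the overall alignment rate (without "%")
# from a bowtie2 log captured with 2>file.
overall_alignment_rate() {
  awk '/overall alignment rate/ { sub(/%$/, "", $1); print $1 }' "$1"
}
```

Usage: `overall_alignment_rate test_align.info`.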

4.Duplicate Removal

java -jar picard.jar MarkDuplicates REMOVE_DUPLICATES=true METRICS_FILE=test.matrix INPUT=test.sort.bam OUTPUT=test.sort.paird_dup.bam
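MarkDuplicates writes its statistics to METRICS_FILE (test.matrix here) as a tab-separated table that follows a `## METRICS CLASS` line. The PERCENT_DUPLICATION column holds the fraction (0 to 1) of reads marked as duplicates. The sketch below assumes that layout and looks the column up by name rather than by position. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print PERCENT_DUPLICATION (a fraction, 0-1) for
# each library row of a Picard MarkDuplicates metrics file.
percent_duplication() {
  awk -F'\t' '
    /^## METRICS CLASS/ { in_table = 1; col = 0; next }   # table header follows
    in_table && col == 0 {                                # header row: locate column
      for (i = 1; i <= NF; i++) if ($i == "PERCENT_DUPLICATION") col = i
      next
    }
    in_table && col > 0 {
      if ($0 == "") exit                                  # blank line ends the table
      print $col
    }
  ' "$1"
}
```

Usage: `percent_duplication test.matrix`.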

5.Strand Splitting

samtools view -b -f 128 -F 16 test.sort.paird_dup.bam > fwd1.bam
samtools view -b -f 80 test.sort.paird_dup.bam > fwd2.bam
samtools merge -f test_fwd.bam fwd1.bam fwd2.bam
samtools view -b -f 144 test.sort.paird_dup.bam > rev1.bam
samtools view -b -f 64 -F 16 test.sort.paird_dup.bam > rev2.bam
samtools merge -f test_rev.bam rev1.bam rev2.bam
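The four samtools filters sort reads by three SAM flag bits: 64 (read 1), 128 (read 2) and 16 (read mapped to the reverse strand). Since -f 80 is 64+16 and -f 144 is 128+16, test_fwd.bam collects read 2 on the forward strand plus read 1 on the reverse strand, and test_rev.bam collects the opposite combinations. Both mates of a properly paired fragment therefore land in the same file. The function below is an illustrative re-implementation of that logic for a single flag value. It is not used by the pipeline itself.

```shell
#!/usr/bin/env bash
# Illustrative only: reproduce the strand assignment made by the four
# samtools view filters for one SAM FLAG value.
#   fwd: read 2 forward (-f 128 -F 16) or read 1 reverse (-f 80)
#   rev: read 2 reverse (-f 144)       or read 1 forward (-f 64 -F 16)
strand_of_flag() {
  local flag=$1
  local read1=$(( flag & 64 )) read2=$(( flag & 128 )) reverse=$(( flag & 16 ))
  if { [ "$read2" -ne 0 ] && [ "$reverse" -eq 0 ]; } ||
     { [ "$read1" -ne 0 ] && [ "$reverse" -ne 0 ]; }; then
    echo fwd
  elif { [ "$read2" -ne 0 ] && [ "$reverse" -ne 0 ]; } ||
       { [ "$read1" -ne 0 ] && [ "$reverse" -eq 0 ]; }; then
    echo rev
  else
    echo none   # neither mate bit set, so no filter selects the read
  fi
}
```

The common proper-pair flags 99 and 147 both map to rev, and 83 and 163 both map to fwd. This keeps each pair together for MACS2's BAMPE mode.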

6.Peak Calling

ChrM and ChrC are the mitochondrial and chloroplast sequence names in the reference genome. Replace them with the names used in your own assembly. Likewise, -g 119300826 is the effective genome size and should be set for your organism. MACS2 reports 1-based start coordinates in its .xls output, so 1 is subtracted from the start when converting to 0-based BED.

macs2 callpeak -t test_rev.bam -f BAMPE -g 119300826 --keep-dup all -n test_rev
macs2 callpeak -t test_fwd.bam -f BAMPE -g 119300826 --keep-dup all -n test_fwd
awk -F'\t' '$0!~/name$/&&$0!~/^#/&&$0!=""&&$1!~/^ChrM/&&$1!~/^ChrC/{print $1"\t"($2-1)"\t"$3"\t"$NF}' test_fwd_peaks.xls >test_fwd_peaks.bed
awk -F'\t' '$0!~/name$/&&$0!~/^#/&&$0!=""&&$1!~/^ChrM/&&$1!~/^ChrC/{print $1"\t"($2-1)"\t"$3"\t"$NF}' test_rev_peaks.xls >test_rev_peaks.bed
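The awk filters keep only peak rows. They drop MACS2's `#` comment lines, the blank line, the column-header line (the only line ending in `name`) and any peak on the organelle sequences. For each remaining row they print the chromosome, start, end and peak name. The sketch below wraps the same filter in a function that takes the organelle prefixes as optional arguments. The function and its arguments are hypothetical. It also subtracts 1 from the start, because MACS2 documents its .xls coordinates as 1-based, unlike BED.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the peak-table filter.
# Usage: macs2_xls_to_bed peaks.xls [mito_prefix] [plastid_prefix] > peaks.bed
macs2_xls_to_bed() {
  awk -F'\t' -v OFS='\t' -v mito="^${2:-ChrM}" -v plastid="^${3:-ChrC}" '
    /^#/ || $0 == "" || /name$/ { next }   # comments, blank line, header
    $1 ~ mito || $1 ~ plastid   { next }   # organelle peaks
    { print $1, $2 - 1, $3, $NF }          # 1-based .xls start -> 0-based BED start
  ' "$1"
}
```

For example, `macs2_xls_to_bed test_fwd_peaks.xls ChrM ChrC > test_fwd_peaks.bed` matches the first awk command above.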

7.Bam to BigWig

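The --effectiveGenomeSize and --ignoreForNormalization values must be adjusted for your own genome. In this protocol, 119300826 is the effective genome size, and ChrM and ChrC are the organelle sequences excluded from normalization. See the deepTools documentation for details about effective genome size.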

samtools index test_fwd.bam
samtools index test_rev.bam
bamCoverage -v -p 10 -b test_fwd.bam -o test_fwd.bw --binSize 100 --effectiveGenomeSize 119300826  --normalizeUsing RPGC --ignoreForNormalization ChrM ChrC
bamCoverage -v -p 10 -b test_rev.bam -o test_rev.bw --binSize 100 --effectiveGenomeSize 119300826  --normalizeUsing RPGC --ignoreForNormalization ChrM ChrC
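With --normalizeUsing RPGC, deepTools scales coverage to 1x average depth. Its documentation gives the sequencing depth as (mapped reads × fragment length) / effective genome size, and each bin is divided by that depth. The sketch below works through this arithmetic by hand. The function name and inputs are hypothetical. bamCoverage computes the real read count and fragment length itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate RPGC scale factor (1 / sequencing depth)
#   scale = effective_genome_size / (mapped_reads * fragment_length)
rpgc_scale_factor() {
  awk -v n="$1" -v len="$2" -v g="$3" 'BEGIN { printf "%.6g\n", g / (n * len) }'
}
```

Take 10 million mapped reads with 200 bp fragments against the 119300826 bp used above. The depth is then about 16.8x, so each bin is scaled by roughly 0.06 (`rpgc_scale_factor 10000000 200 119300826`).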