Spike in based quantified normalization - PEHGP/ssDripPipeline GitHub Wiki

E. coli genome was used for spike-in quantitative analysis.

1.Reference Genome Preparing

The E. coli genome is used as a chromosome of the reference genome to participate in the alignment.

cat RefGenome.fasta EcoliGenome.fasta >RefGenome_EcoliGenome.fasta
bowtie2-build --threads 10 RefGenome_EcoliGenome.fasta RefGenome_EcoliGenome

Adapter Cutting and Tail Trimming, Alignment, Duplicates Removing, Strand Splitting, Peak Calling are the same as the normal protocols.

2.Spike-in Scale Factor Calculating

#E. coli total reads, BW25113 is the E. coli chromosome name, please assign this to fit needs.
samtools idxstats test.sort.paird_dup.bam|grep 'BW25113'|awk '{print $3}'  
#ref genome total reads, Chr is the prefix of the reference genome chromosome name, please assign this to fit needs.
samtools idxstats test.sort.paird_dup.bam|grep -P 'Chr\d'|awk '{sum+=$3}END{print sum}' 

Spike-in Scale Factor=(drip_ref_total_reads/input_ref_total_reads)/(drip_ecoli_total_reads/input_ecoli_total_reads)

3.Bam to BigWig

The $scale_factor is the spike-in scale factor.
These parameters --effectiveGenomeSize, --ignoreForNormalization, --scaleFactor need to be adjusted to fit your own data.

bamCoverage –p 10 -b test_fwd.bam -o test_fwd_norm.bw --binSize 100 --effectiveGenomeSize 119300826  --normalizeUsing RPGC --ignoreForNormalization ChrM ChrC --scaleFactor $scale_factor
bamCoverage –p 10 -b test_rev.bam -o test_rev_norm.bw --binSize 100 --effectiveGenomeSize 119300826  --normalizeUsing RPGC --ignoreForNormalization ChrM ChrC --scaleFactor $scale_factor

4.Peak Difference Analysis Using DESeq2

4.1 Peak merging of all samples

The peaks that need to remove E. coli in merge.bed

cat *_peaks.bed|bedtools sort -i -|bedtools merge -i -|bedtools sort -i - >merge.bed

4.2 DESeq2 Scale Factor Calculating

multiBamSummary BED-file -p 10 --BED merge.bed --bamfiles *.sort.paird_dup.bam -o results.npz --outRawCounts results.txt --scalingFactors deseq_scale.txt

4.3 DESeq2 using custom scale factor

The DEseq2 scale factor from step 4.2 of each sample is multiplied by the spike-in scale factor from step 2 of each sample. Then, divide these results by 1 to get the final custom scale factor.

dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition)
sizefactor <- c(2,2,3,3) #custom scale factor
names(sizefactor) <- c("A_1","A_2","B_1","B_2") #sample name
sizeFactors(dds) <- sizefactor
dds <- DESeq(dds)