Spike in based quantified normalization - PEHGP/ssDripPipeline GitHub Wiki
E. coli genome was used for spike-in quantitative analysis.
1.Reference Genome Preparing
The E. coli genome is used as a chromosome of the reference genome to participate in the alignment.
cat RefGenome.fasta EcoliGenome.fasta >RefGenome_EcoliGenome.fasta
bowtie2-build --threads 10 RefGenome_EcoliGenome.fasta RefGenome_EcoliGenome
Adapter Cutting and Tail Trimming, Alignment, Duplicates Removing, Strand Splitting, Peak Calling are the same as the normal protocols.
2.Spike-in Scale Factor Calculating
#E. coli total reads, BW25113 is the E. coli chromosome name, please assign this to fit needs.
samtools idxstats test.sort.paird_dup.bam|grep 'BW25113'|awk '{print $3}'
#ref genome total reads, Chr is the prefix of the reference genome chromosome name, please assign this to fit needs.
samtools idxstats test.sort.paird_dup.bam|grep -P 'Chr\d'|awk '{sum+=$3}END{print sum}'
Spike-in Scale Factor=(drip_ref_total_reads/input_ref_total_reads)/(drip_ecoli_total_reads/input_ecoli_total_reads)
3.Bam to BigWig
The $scale_factor is the spike-in scale factor.
These parameters --effectiveGenomeSize, --ignoreForNormalization, --scaleFactor need to be adjusted to fit your own data.
bamCoverage –p 10 -b test_fwd.bam -o test_fwd_norm.bw --binSize 100 --effectiveGenomeSize 119300826 --normalizeUsing RPGC --ignoreForNormalization ChrM ChrC --scaleFactor $scale_factor
bamCoverage –p 10 -b test_rev.bam -o test_rev_norm.bw --binSize 100 --effectiveGenomeSize 119300826 --normalizeUsing RPGC --ignoreForNormalization ChrM ChrC --scaleFactor $scale_factor
4.Peak Difference Analysis Using DESeq2
4.1 Peak merging of all samples
The peaks that need to remove E. coli in merge.bed
cat *_peaks.bed|bedtools sort -i -|bedtools merge -i -|bedtools sort -i - >merge.bed
4.2 DESeq2 Scale Factor Calculating
multiBamSummary BED-file -p 10 --BED merge.bed --bamfiles *.sort.paird_dup.bam -o results.npz --outRawCounts results.txt --scalingFactors deseq_scale.txt
4.3 DESeq2 using custom scale factor
The DEseq2 scale factor from step 4.2 of each sample is multiplied by the spike-in scale factor from step 2 of each sample. Then, divide these results by 1 to get the final custom scale factor.
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition)
sizefactor <- c(2,2,3,3) #custom scale factor
names(sizefactor) <- c("A_1","A_2","B_1","B_2") #sample name
sizeFactors(dds) <- sizefactor
dds <- DESeq(dds)