RNA‐Mapping - sellwe/Genome-Analysis GitHub Wiki

I ran BWA on the raw (untrimmed) RNA reads against my Canu assembly. I used the untrimmed reads in order to retain as many reads as possible, since trimming gave no quality benefit.

BWA maps the forward and reverse reads of each sample to the assembly, and the result is a .bam file and a .bam.bai index for each sample. Together with the .gff file from Prokka, the .bam files will be used for read counting in the next step.
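A minimal sketch of how such a mapping step could be scripted, not the exact commands used here. It assumes `bwa mem` as the BWA algorithm and `samtools sort` / `samtools index` for producing the .bam and .bam.bai files. The file names and the helper `build_mapping_commands` are hypothetical. The function only builds the command lines, so they can be inspected before anything is run.

```python
import shlex


def build_mapping_commands(reference, reads_fwd, reads_rev, out_bam, threads=4):
    """Return shell command lines for indexing, mapping, sorting and BAM indexing.

    Assumes bwa mem and samtools are the tools used; nothing is executed here.
    """
    ref, r1, r2, bam = (
        shlex.quote(p) for p in (reference, reads_fwd, reads_rev, out_bam)
    )
    return [
        # Build the BWA index for the assembly (needed once per reference).
        f"bwa index {ref}",
        # Map the paired reads and pipe the SAM output into a coordinate sort.
        f"bwa mem -t {threads} {ref} {r1} {r2} | samtools sort -@ {threads} -o {bam} -",
        # Index the sorted BAM; this writes the .bam.bai file next to it.
        f"samtools index {bam}",
    ]


# Hypothetical file names, for illustration only.
for cmd in build_mapping_commands(
    "assembly.fasta", "sample_1.fastq.gz", "sample_2.fastq.gz", "sample.bam"
):
    print(cmd)
```

Running the same function once per sample keeps the six replicates on identical settings, which makes the statistics below easier to compare.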

RNA-seq Mapping statistics

I ran samtools stats and samtools coverage on the bam files to look at the quality (screenshots found below) and compiled a quality table:

Sample	Total Reads	Mapped Reads	% Mapped	% Coverage	Mean Depth	Avg Q-Score	% Paired Reads	Error Rate
Serum 69	26,178,126	25,670,893	98.20%	95.47%	199.91X	37.0	96.0%	0.492%
Serum 70	29,385,498	28,887,560	98.94%	95.51%	194.98X	37.1	96.2%	0.472%
Serum 71	27,477,320	27,583,077	98.94%	95.17%	173.56X	37.1	96.3%	0.480%
BH 72	27,513,752	27,116,979	98.56%	94.92%	180.77X	37.1	96.7%	0.467%
BH 73	27,392,308	27,018,957	98.63%	95.65%	202.04X	37.1	96.8%	0.450%
BH 74	24,827,942	24,482,454	98.61%	95.81%	202.08X	37.1	96.8%	0.469%
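As a sketch of how a row of this table could be derived from the `samtools stats` output, the snippet below reads the summary-number (`SN`) lines and computes the mapped percentage. It assumes that "Total Reads" corresponds to `raw total sequences` and "Mapped Reads" to `reads mapped`, which is my guess at the mapping between table and tool output. The helper names are hypothetical, and the example text is a two-line excerpt built from the BH 72 counts rather than a real output file.

```python
def parse_samtools_stats(text):
    """Collect the 'SN' (summary numbers) lines of `samtools stats` output."""
    summary = {}
    for line in text.splitlines():
        if not line.startswith("SN\t"):
            continue  # skip comments and the other sections
        fields = line.split("\t")
        key = fields[1].rstrip(":")
        try:
            summary[key] = float(fields[2])
        except ValueError:
            continue  # ignore any non-numeric value
    return summary


def percent_mapped(summary):
    """Mapped reads as a percentage of all reads."""
    return 100.0 * summary["reads mapped"] / summary["raw total sequences"]


# Illustrative excerpt: the two counts are the BH 72 values from the table.
example = (
    "SN\traw total sequences:\t27513752\n"
    "SN\treads mapped:\t27116979\n"
)

stats = parse_samtools_stats(example)
print(f"{percent_mapped(stats):.2f}%")  # 98.56%
```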

Overall, the mapping was very successful. Nearly all reads mapped in all 6 replicates; on average, 98.65% of the RNA reads mapped to the genome. About 95% of the bases in the assembly were covered across the contigs. The ~5% that was not covered could correspond to genes with extremely low expression, while the few reads that did not map could come from individual-specific genes that are not present in my Canu reference assembly. The mean depth was about 174-202X, which should be enough to capture genes with low expression as well. Coverage and depth were higher for the chromosomal contig than for the contigs representing the plasmids, which dragged down the averages.
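The 98.65% average can be reproduced from the "% Mapped" column of the table, and the same approach gives the range of the mean depth. This is a small sketch with the values copied from the table; the variable names are my own, and the average is an unweighted mean over the six samples, not weighted by read count.

```python
from statistics import mean

# "% Mapped" column of the table, in row order.
pct_mapped_by_sample = {
    "Serum 69": 98.20,
    "Serum 70": 98.94,
    "Serum 71": 98.94,
    "BH 72": 98.56,
    "BH 73": 98.63,
    "BH 74": 98.61,
}

# "Mean Depth" column of the table (X), in row order.
mean_depth_by_sample = {
    "Serum 69": 199.91,
    "Serum 70": 194.98,
    "Serum 71": 173.56,
    "BH 72": 180.77,
    "BH 73": 202.04,
    "BH 74": 202.08,
}

average_pct_mapped = mean(pct_mapped_by_sample.values())
lowest_depth = min(mean_depth_by_sample.values())
highest_depth = max(mean_depth_by_sample.values())

print(f"Average mapped: {average_pct_mapped:.2f}%")  # Average mapped: 98.65%
print(f"Mean depth range: {lowest_depth}X to {highest_depth}X")
```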

The error rates were consistently low (below 0.5%), which suggests that misalignments are rare. The average Q-scores (Phred scores) are consistently very high and almost identical across the replicates (about 37), indicating a very low probability of incorrect base calls.
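To put the Q-score of 37 into numbers, the standard Phred definition Q = -10 * log10(P) can be inverted to get the probability P that a base call is wrong. The function name below is hypothetical, and applying the formula to an average Q-score only gives a rough figure, since averaging is not done on the probability scale.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


p = phred_to_error_probability(37.0)
print(f"{p:.6f}")  # 0.000200
print(f"about 1 error in {round(1 / p)} bases")
```

A Q-score of 37 therefore corresponds to roughly 1 wrong call in 5,000 bases. The error rate in the table is a different measure: `samtools stats` reports it as mismatches per mapped base, so it also reflects real differences between the reads and the assembly.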

BH 72: