RNA‐Mapping - sellwe/Genome-Analysis GitHub Wiki

I ran BWA on the raw, untrimmed RNA reads against my Canu assembly to retain as many reads as possible, since trimming gave no quality benefit.

BWA maps the forward and reverse reads of each sample to the assembly, and the alignments are stored as .bam files with matching .bam.bai index files. In the next step, these .bam files are used together with the .gff annotation from Prokka to count reads.
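The exact BWA command line is not recorded on this page. As a minimal sketch, assuming a typical paired-end BWA-MEM workflow (`bwa mem` piped into `samtools sort`, then `samtools index`, which writes the .bam.bai file), the commands for one sample could be generated like this. The helper `build_mapping_commands`, the file names and the thread count are hypothetical.

```python
import shlex


def build_mapping_commands(ref, r1, r2, out_prefix, threads=8):
    """Return shell commands to index the assembly, map one sample and index the BAM.

    File names and the thread count are placeholders, not this project's settings.
    """
    q = shlex.quote
    bam = f"{out_prefix}.sorted.bam"
    return [
        # Build the BWA index of the reference assembly (needed once).
        f"bwa index {q(ref)}",
        # Map forward/reverse reads with BWA-MEM and sort the SAM stream into a BAM.
        f"bwa mem -t {threads} {q(ref)} {q(r1)} {q(r2)} | samtools sort -@ {threads} -o {q(bam)} -",
        # Write the .bam.bai index next to the sorted BAM.
        f"samtools index {q(bam)}",
    ]


if __name__ == "__main__":
    # Hypothetical file names for one sample.
    for cmd in build_mapping_commands("canu_assembly.fasta", "BH72_1.fastq.gz", "BH72_2.fastq.gz", "BH72"):
        print(cmd)
```

Generating the commands as strings keeps the sketch testable without BWA installed; in practice they would be run in a shell or via `subprocess`.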

RNA-seq Mapping statistics

I ran samtools stats and samtools coverage on the .bam files to assess the mapping quality (screenshots below) and compiled the results in this table:

| Sample | Total Reads | Mapped Reads | % Mapped | % Bases Covered | Mean Depth | Avg Q-Score | % Properly Paired Reads | Error Rate |
|---|---|---|---|---|---|---|---|---|
| Serum 69 | 26,141,126 | 25,670,893 | 98.20% | 95.47% | 199.91X | 37.0 | 96.0% | 0.492% |
| Serum 70 | 29,385,498 | 28,887,560 | 98.31% | 95.51% | 194.98X | 37.1 | 96.2% | 0.472% |
| Serum 71 | 27,877,320 | 27,583,077 | 98.94% | 95.17% | 173.56X | 37.1 | 96.3% | 0.480% |
| BH 72 | 27,513,752 | 27,116,979 | 98.56% | 94.92% | 180.77X | 37.1 | 96.7% | 0.467% |
| BH 73 | 27,395,308 | 27,018,957 | 98.63% | 95.65% | 202.04X | 37.1 | 96.8% | 0.450% |
| BH 74 | 24,827,942 | 24,482,454 | 98.61% | 95.81% | 202.08X | 37.1 | 96.8% | 0.469% |
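Most columns of the table come from the `SN` (summary numbers) section of `samtools stats`. As a sketch, assuming the usual tab-separated `SN<TAB>key:<TAB>value` layout, the relevant fields can be pulled out and turned into the table's percentages. `parse_samtools_stats` and `mapping_summary` are hypothetical helpers, and the excerpt below is rebuilt from the BH 72 row rather than copied from the real output file.

```python
def parse_samtools_stats(text):
    """Return the 'SN' summary section of `samtools stats` output as {key: float}."""
    summary = {}
    for line in text.splitlines():
        if not line.startswith("SN\t"):
            continue  # skip comment lines and the other sections (FFQ, COV, IS, ...)
        fields = line.split("\t")
        key = fields[1].rstrip(":")  # e.g. "reads mapped:" -> "reads mapped"
        summary[key] = float(fields[2])  # a trailing "# comment" field is ignored
    return summary


def mapping_summary(summary):
    """Derive the table columns from the parsed SN values."""
    mapped = summary["reads mapped"]
    unmapped = summary["reads unmapped"]
    total = mapped + unmapped
    return {
        "total_reads": int(total),
        "pct_mapped": 100 * mapped / total,
        "pct_error": 100 * summary["error rate"],  # mismatches / bases mapped
        "avg_quality": summary["average quality"],
    }


# Excerpt rebuilt from the BH 72 row of the table (not the original file).
bh72 = (
    "SN\traw total sequences:\t27513752\n"
    "SN\treads mapped:\t27116979\n"
    "SN\treads unmapped:\t396773\n"
    "SN\terror rate:\t4.670000e-03\t# mismatches / bases mapped (cigar)\n"
    "SN\taverage quality:\t37.1\n"
)

if __name__ == "__main__":
    print(mapping_summary(parse_samtools_stats(bh72)))
```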

Overall, the mapping was very successful. Nearly all reads mapped in all six replicates: on average 98.54% of the RNA reads mapped to the assembly, and the small fraction that did not map could come from sequences that are not present in my Canu reference assembly. About 95% of the assembly positions were covered by at least one read. The roughly 5% of positions without coverage could belong to genes with extremely low (or no) expression under these conditions. The mean depth of about 174-202X should be enough to capture lowly expressed genes as well. Coverage and depth were higher for the chromosomal contig than for the contigs representing the plasmids, which pulled the averages down.
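Whether small plasmid contigs pull the genome-wide averages down depends on how the per-contig rows of `samtools coverage` are combined. The sketch below assumes the default tab-separated output with a `#rname`, `startpos`, `endpos`, `numreads`, `covbases`, `coverage`, `meandepth`, ... header, and reports both a plain mean over contigs and a length-weighted mean. The function name `summarize_coverage` is hypothetical.

```python
import csv
import io


def summarize_coverage(tsv_text):
    """Combine per-contig rows of `samtools coverage` output into genome-wide values.

    Returns both a plain mean over contigs and a length-weighted mean, so the
    effect of small contigs (e.g. plasmids) on the average is visible.
    """
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    lengths = [int(r["endpos"]) - int(r["startpos"]) + 1 for r in rows]
    total_len = sum(lengths)
    covered = sum(int(r["covbases"]) for r in rows)
    cov_pct = [float(r["coverage"]) for r in rows]
    depth = [float(r["meandepth"]) for r in rows]
    return {
        # Each contig counts equally, regardless of its length.
        "mean_coverage_per_contig": sum(cov_pct) / len(rows),
        "mean_depth_per_contig": sum(depth) / len(rows),
        # Each contig counts in proportion to its length.
        "coverage_weighted": 100 * covered / total_len,
        "depth_weighted": sum(d * n for d, n in zip(depth, lengths)) / total_len,
    }
```

With a plain mean, a short plasmid counts as much as the whole chromosome; with length weighting, it barely moves the genome-wide value.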

The error rates were consistently low (0.45-0.49%), indicating few mismatches between the mapped reads and the assembly. The average Q-scores (Phred scores) were very high and almost identical across replicates (37), which corresponds to a probability of roughly 0.02% that a base call is wrong.
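Phred scores are logarithmic, Q = -10 * log10(P), so the error probability is P = 10^(-Q/10). A small sketch of the conversion (function names hypothetical) shows that Q37 means roughly a 2 in 10,000 chance of a wrong base call. The samtools `error rate` is measured differently, as mismatches per aligned base, so it also includes genuine differences between the reads and the assembly.

```python
import math


def phred_to_error_prob(q):
    """Probability that a base call with Phred quality q is wrong: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Inverse conversion: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Average base quality reported for the replicates.
    print(f"Q37 -> P(error) = {phred_to_error_prob(37):.2e}")
```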

BH 72:

[screenshot: samtools output]

% Reads mapped: 27116979 / (27116979 + 396773) * 100 = 98.56%

[screenshot: samtools output]

BH 73:

[screenshot: samtools output]

% Reads mapped: 27018957 / (27018957 + 376351) * 100 = 98.63%

[screenshot: samtools output]

BH 74:

[screenshot: samtools output]

% Reads mapped: 24482454 / (24482454 + 345488) * 100 = 98.61%

[screenshot: samtools output]

Serum 69:

[screenshot: samtools output]

% Reads mapped: 25670893 / (25670893 + 470233) * 100 = 98.20%

[screenshot: samtools output]

Serum 70:

[screenshot: samtools output]

% Reads mapped: 28887560 / (28887560 + 497938) * 100 = 98.31%

[screenshot: samtools output]

Serum 71:

[screenshot: samtools output]

% Reads mapped: 27583077 / (27583077 + 294243) * 100 = 98.94%

[screenshot: samtools output]
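The per-sample percentages above can be recomputed, and averaged, directly from the mapped and unmapped counts. This sketch uses the counts exactly as listed in the calculations; the mean is an unweighted mean over the six samples, and the variable names are illustrative.

```python
# Mapped and unmapped read counts copied from the per-sample calculations above.
counts = {
    "Serum 69": (25_670_893, 470_233),
    "Serum 70": (28_887_560, 497_938),
    "Serum 71": (27_583_077, 294_243),
    "BH 72": (27_116_979, 396_773),
    "BH 73": (27_018_957, 376_351),
    "BH 74": (24_482_454, 345_488),
}


def percent_mapped(mapped, unmapped):
    """Share of all reads (mapped + unmapped) that mapped, in percent."""
    return 100 * mapped / (mapped + unmapped)


pct = {name: percent_mapped(m, u) for name, (m, u) in counts.items()}
mean_pct = sum(pct.values()) / len(pct)  # unweighted mean over samples

if __name__ == "__main__":
    for name, value in pct.items():
        print(f"{name}: {value:.2f}%")
    print(f"Mean over samples: {mean_pct:.2f}%")
```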