RNA‐Mapping - sellwe/Genome-Analysis GitHub Wiki
I ran BWA on the raw "pre-trimmed" RNA data and my Canu assembly in order to retain as many reads as possible as there was no quality benefits in trimming.
BWA will map the forward and reverse reads for each sample to the assembly and produce .bam and .bam.bai files. The .bam files will together with the .gff file from Prokka be used for read counts in the next step.
RNA-seq Mapping statistics
I ran samtools stats and samtools coverage on the bam files to look at the quality (screenshots found below) and compiled a quality table:
Sample | Total Reads | Mapped Reads | % Mapped | % Coverage | Mean Depth | Avg Q-Score | % Paired Reads | Error Rate |
---|---|---|---|---|---|---|---|---|
Serum 69 | 26,178,126 | 25,670,893 | 98.20% | 95.47% | 199.91X | 37.0 | 96.0% | 0.492% |
Serum 70 | 29,385,498 | 28,887,560 | 98.94% | 95.51% | 194.98X | 37.1 | 96.2% | 0.472% |
Serum 71 | 27,477,320 | 27,583,077 | 98.94% | 95.17% | 173.56X | 37.1 | 96.3% | 0.480% |
BH 72 | 27,513,752 | 27,116,979 | 98.56% | 94.92% | 180.77X | 37.1 | 96.7% | 0.467% |
BH 73 | 27,392,308 | 27,018,957 | 98.63% | 95.65% | 202.04X | 37.1 | 96.8% | 0.450% |
BH 74 | 24,827,942 | 24,482,454 | 98.61% | 95.81% | 202.08X | 37.1 | 96.8% | 0.469% |
Overall the mapping was very succesful. Nearly all of the reads were mapped in the 6 replicates, on average 98.65% of all RNA-reads were mapped to the genome. ~95% of the genes were covered across the contigs. The 5% of genes that were not mapped could be due to genes with extremely low expression numbers, or individual specific genes that were not present in my Canu reference assembly. The average depth was also around 175-202X which should be able to capture genes with low expression as well. The coverage and depth was higher for the chromosomal contig than the contigs representing the plasmids, which drug down the averages.
The error rates were consistently low so there are no misalignments. The average Q-scores (Phred-scores) are consistently very high is almost the same for all replicates (37), indicating very low probability of the base calling being wrong.
BH 72:
% Reads mapped: 27116979 / (27116979 + 396773) * 100 = 98.56%
BH 73:
% Reads mapped: 27018957 / (27018957 + 376351) * 100 = 98.63%
BH 74:
% Reads mapped: 24482454 / (24482454 + 345488) * 100 = 98.61%
Serum 69:
% Reads mapped: 25670893 / (25670893 + 470233) * 100 = 98.20%
Serum 70:
% Reads mapped: 28887560 / (28887560 + 497938) * 100 = 98.94%
Serum 71:
% Reads mapped: 27583077 / (27583077 + 294243) * 100 = 98.94%