Comparative Genomics
To do comparative genomics and evaluate synteny, I compared my Canu-assembled genome of E. faecium strain E745 against the following reference genome from E. faecium strain NCTC7171, using MUMmerplot. This also serves as a step in the assembly evaluation.
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_900447735.1/
As the reference genome stems from the same species, I expect the overall synteny to be conserved, provided that the assembly was successful.
MUMmerplot:
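A comparison like this is typically a two-step MUMmer run: nucmer aligns the assembly against the reference, and mummerplot draws the dot plot from the resulting delta file. The sketch below only builds and prints the two command lines. The file names and the prefix are hypothetical placeholders, and the chosen options (`--png`, `--layout`, `--filter`) are an assumption about a reasonable setup, not necessarily the options used for the plot on this page.

```python
import shlex


def build_mummer_commands(reference, assembly, prefix="e745_vs_nctc7171"):
    """Return the nucmer and mummerplot command lines as argument lists.

    The reference is given first so that it ends up on the x-axis and the
    assembly (the query) on the y-axis. All file names are placeholders.
    """
    # Step 1: whole-genome alignment, writes <prefix>.delta
    nucmer = ["nucmer", f"--prefix={prefix}", reference, assembly]

    # Step 2: dot plot from the delta file
    plot = [
        "mummerplot",
        "--png",       # write a PNG image
        "--layout",    # order and orient sequences to follow the diagonal
        "--filter",    # keep only the best alignments for plotting
        "-R", reference,
        "-Q", assembly,
        "-p", prefix,
        f"{prefix}.delta",
    ]
    return [nucmer, plot]


if __name__ == "__main__":
    # Hypothetical file names, used for illustration only.
    for cmd in build_mummer_commands("NCTC7171.fna", "E745_canu.contigs.fasta"):
        print(shlex.join(cmd))
```

The printed lines can be pasted into a shell where MUMmer is installed, or passed to `subprocess.run` one at a time.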
The x-axis is the reference genome; the y-axis is my assembly.
The red diagonals indicate good, long alignments in the forward orientation. The blue diagonals indicate inversions. I'm not sure how distantly related strain E745 is to strain NCTC7171, but overall the alignment is very good. A long red diagonal runs through the large part of the graph, which represents the alignment between my chromosome (tig 1) and the longest sequence in the reference. There is a break in the diagonal where we likely have an inversion in the chromosome, perhaps due to transposable elements. Apart from this inversion, the two chromosomes show very high synteny. A few scattered dots might indicate some fragmentation, mismatches or low-quality alignments. The plot indicates that the assembly of the chromosome was successful, as it aligns very well to the reference genome.
In the upper left part of the graph we see the alignments of the suspected plasmids (much shorter). It's hard to tell whether these are just scattered dots or small proper diagonals, so I can't really tell from this plot how well the small contigs aligned.
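The red/blue distinction described above can be sketched with alignment coordinates instead of colours. In nucmer-style coordinates, a reverse-strand match is reported with the query start larger than the query end, which is what shows up as a blue segment in the plot. The example below is a minimal illustration on made-up coordinates; the `Alignment` type and both helper functions are hypothetical names, and nothing here is read from my actual delta file.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """One alignment block; coordinates are 1-based and inclusive."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int


def orientation(aln):
    """Return 'forward' if reference and query run the same way, else 'reverse'."""
    ref_ascending = aln.ref_end >= aln.ref_start
    qry_ascending = aln.qry_end >= aln.qry_start
    return "forward" if ref_ascending == qry_ascending else "reverse"


def reverse_fraction(alignments):
    """Fraction of aligned reference bases that sit in reverse-oriented blocks."""
    total = sum(abs(a.ref_end - a.ref_start) + 1 for a in alignments)
    reverse = sum(
        abs(a.ref_end - a.ref_start) + 1
        for a in alignments
        if orientation(a) == "reverse"
    )
    return reverse / total if total else 0.0


if __name__ == "__main__":
    # Toy coordinates: a forward block, an inverted block, another forward block.
    toy = [
        Alignment(1, 1000, 1, 1000),
        Alignment(1001, 1500, 1500, 1001),
        Alignment(1501, 2000, 1501, 2000),
    ]
    for aln in toy:
        print(aln, orientation(aln))
    print("reverse fraction:", reverse_fraction(toy))
```

Applied to real coordinates, a large reverse-oriented block flanked by forward blocks would correspond to the kind of inversion seen as a break in the red diagonal.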
If we look back to the tigInfo file from the assembly:
It indicates that only contigs 5, 7 and 8 are predicted to be circular. Because the other contigs are not circularized, the assembly may start at a different position than the reference genome (that is, not with dnaA as the first gene). This might explain the second break in the long red diagonal (the one that is not the inversion). In my original project plan I wanted to use Artemis ACT for a deeper synteny evaluation, where it would have been interesting to look at the chromosomal inversions. I tried to use the Circlator software to circularize my contigs for this analysis, but it did not work. I also tried to use Artemis anyway, but the software has a lot of issues with different Java versions. Canu 2.2 is known to struggle with predicting circular genomes, so as a last resort I tried to redo my assembly using Canu 2.0, but that unfortunately did not work either.
But since the alignment still looks very good in the MUMmerplot, even if the assembly might have a different starting point, and my downstream analyses will not be affected by this, I decided that this was enough to draw conclusions about the synteny and quality of the assembly.
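To illustrate why a different starting point only shifts the diagonal rather than indicating a real rearrangement, here is a toy sketch of rotating a circular sequence so that it begins at a chosen anchor. The sequences and the anchor are made up, the function names are hypothetical, and the sketch ignores the reverse strand, so it is not a replacement for a dedicated circularization tool.

```python
def rotate_circular(seq, new_start):
    """Rotate a circular sequence so that 0-based index new_start becomes position 0."""
    if not seq:
        return seq
    new_start %= len(seq)
    return seq[new_start:] + seq[:new_start]


def rotate_to_anchor(seq, anchor):
    """Rotate a circular sequence so that it starts with the given anchor.

    The sequence is extended by len(anchor) - 1 bases from its own start, so
    an anchor that spans the artificial break point is still found.
    Only the forward strand is searched (a simplification).
    """
    if not anchor:
        raise ValueError("anchor must not be empty")
    extended = seq + seq[: len(anchor) - 1]
    index = extended.find(anchor)
    if index == -1:
        raise ValueError("anchor not found on the forward strand")
    return rotate_circular(seq, index)


if __name__ == "__main__":
    reference = "ATGGCCTTAAGG"                 # toy circular "chromosome"
    assembly = rotate_circular(reference, 5)   # same circle, different break point
    print("reference:", reference)
    print("assembly: ", assembly)
    # Use the first bases of the reference as a stand-in for the dnaA start.
    fixed = rotate_to_anchor(assembly, "ATGGCC")
    print("re-rotated assembly equals reference:", fixed == reference)
```

In the toy case the two linear strings look different but describe the same circle, which is the situation that could produce a shifted diagonal in the dot plot.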