10 Genome assembly using Spades - saltpinna/Genome_analysis_project GitHub Wiki

As an extra analysis, a second assembler was used to assemble the genome from Illumina short reads and Nanopore long reads. Spades was used for this, since it takes both short and long reads as input and combines them in a hybrid assembly. The script used for this can be found under code/Spades_script.sh and the resulting files under results/spades. This assembly was evaluated using the same tools as the Canu assembly.
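As a rough illustration of what a hybrid Spades call looks like, the sketch below builds the command line in Python. It is not the content of code/Spades_script.sh. The input file names and the thread count are hypothetical, and it assumes the Illumina reads are paired-end. The options `-1`, `-2`, `--nanopore`, `-t` and `-o` are standard `spades.py` options.

```python
import shlex


def build_spades_command(short_r1, short_r2, nanopore, out_dir, threads=2):
    """Return the argument list for a hybrid Spades run.

    short_r1 / short_r2: paired-end Illumina reads (assumed paired here)
    nanopore:            Nanopore long reads
    out_dir:             directory where Spades writes its results
    """
    return [
        "spades.py",
        "-1", short_r1,          # forward Illumina reads
        "-2", short_r2,          # reverse Illumina reads
        "--nanopore", nanopore,  # long reads used to resolve repeats and gaps
        "-t", str(threads),      # number of threads
        "-o", out_dir,           # output directory
    ]


# Hypothetical file names, for illustration only.
cmd = build_spades_command(
    "illumina_R1.fastq.gz",
    "illumina_R2.fastq.gz",
    "nanopore.fastq.gz",
    "results/spades",
)
print(shlex.join(cmd))
# To actually run the assembly (requires Spades to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list and only printing it keeps the sketch safe to run on a machine without Spades installed.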

Quast

Quality evaluation using Quast produced this report: The genome fraction and NGA50 are lower than for the assembly produced by Canu from the PacBio reads. The largest contig is also much smaller than in the Canu assembly (only 522 651 bp long), which means that, unlike Canu, Spades did not manage to assemble the whole chromosome.
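To make the contiguity statistics easier to interpret, here is a minimal sketch of how an N50-type value is computed from contig lengths. It is not how Quast is implemented. It also does not compute NGA50 itself, which additionally requires aligning the contigs to the reference and splitting them at misassemblies. The function name `nx50` and the contig lengths are invented for the example.

```python
def nx50(lengths, total=None):
    """Return the length L of the contig at which the running sum of
    contig lengths (sorted longest first) reaches half of `total`.

    total=None        -> half of the assembly size is used (N50)
    total=genome size -> half of the reference size is used (NG50)
    Returns None if the contigs cover less than half of `total`.
    """
    if total is None:
        total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return None


# Invented toy contig lengths, not taken from the Spades assembly.
toy_contigs = [100, 80, 60, 40, 20]
print(nx50(toy_contigs))             # N50 relative to the assembly size
print(nx50(toy_contigs, total=500))  # NG50 relative to a 500 bp "genome"
```

A fragmented assembly with many short contigs reaches the halfway point only after adding many contigs, which is why its N50-type values are low.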

Mummer

Quality evaluation using Mummer resulted in this Mummerplot:

The Mummerplot also suggests that this assembly is not as good as the one produced by Canu. Only short regions form lines, and most of them have a slope of -1, which indicates that these sequences align in the reverse-complement orientation relative to the reference.
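The meaning of the slope in such a dot plot can be sketched in a few lines of Python. This assumes each match is described by its start and end coordinates in the reference and in the assembly, where a reverse-complement match is reported with a query start larger than its query end. The function name and the coordinates are invented for the example.

```python
def match_slope(ref_start, ref_end, qry_start, qry_end):
    """Return +1 for a forward match and -1 for a reverse-complement match.

    A match runs "uphill" in the dot plot when reference and query
    coordinates increase together, and "downhill" when one increases
    while the other decreases.
    """
    ref_dir = 1 if ref_end >= ref_start else -1
    qry_dir = 1 if qry_end >= qry_start else -1
    return ref_dir * qry_dir


# Invented example matches: (ref_start, ref_end, qry_start, qry_end)
matches = [
    (1, 5000, 200, 5199),      # both increase: forward match
    (6000, 9000, 8000, 5000),  # query decreases: reverse-complement match
]
slopes = [match_slope(*m) for m in matches]
print(slopes)
```

Note that a slope of -1 on its own only shows the orientation of a contig relative to the reference. A contig that the assembler happened to output on the opposite strand also gives a slope of -1 without being a true inversion in the genome.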

So, on the whole, this assembly is not as good as the one produced by Canu. The reason for this could be the algorithms used in Spades or the quality of the reads given to the assembler. Since we use both short Illumina reads and long Nanopore reads, I expected this assembly to be better than the one from Canu, because the short reads provide high coverage and the long reads can span gaps. So, most likely, the poor quality of the Spades assembly is due to poor quality of the input reads.