Checkpoint: genome assembly answers - aechchiki/SIB_LongReadsWorkshop_Zurich17 GitHub Wiki
Checkpoint Genome assembly: Answers
- What are the three main steps of long read assembly? [ OLC - Overlap, Layout and Consensus ]
- What is the computational bottleneck? [ Overlap - this step requires an all-versus-all search for overlaps between reads. There are computational tricks that make this procedure faster, but it still remains the slowest step. ]
- What aspects of the genome should help you decide which algorithm to use for assembly? [ Genome size - for small genomes there are programs that do not scale to large ones ( HGAP, Unicycler ) but are very good at assembling small genomes. Heterozygosity, structural variation between haplotypes, and ploidy - Canu is not designed for dangerous levels of heterozygosity (> 0.5%, but < 3%): some haplotype sequences would get collapsed into one sequence while others would be assembled separately, resulting in something in between a haploid and a diploid assembly. An assembler suited for highly heterozygous and di/polyploid data is Falcon, with its module for unzipping haplotypes, Falcon-unzip. ]
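To make the overlap bottleneck concrete, here is a minimal Python sketch that compares the number of read pairs examined by a naive all-versus-all search with the number examined when candidates are first shortlisted by shared k-mers. This is only an illustration of the general kind of trick mentioned above, not the method of any particular assembler. The function names, the toy reads and `k=5` are all made up for the example, and real overlappers also have to deal with sequencing errors and reverse-complement strands.

```python
from collections import defaultdict
from itertools import combinations


def naive_candidate_pairs(reads):
    """Every unordered pair of reads: n * (n - 1) / 2 comparisons."""
    return set(combinations(range(len(reads)), 2))


def kmer_candidate_pairs(reads, k=5):
    """Only pairs of reads that share at least one exact k-mer."""
    index = defaultdict(set)  # k-mer -> ids of the reads that contain it
    for read_id, seq in enumerate(reads):
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(read_id)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical toy reads: the first three tile one sequence, the last is unrelated.
reads = [
    "ACGTTGCATGCA",
    "GCATGCAGGTCA",
    "AGGTCATTTACG",
    "CCCCCCGGGGGG",
]

all_pairs = naive_candidate_pairs(reads)
seeded_pairs = kmer_candidate_pairs(reads, k=5)
print(len(all_pairs), "pairs to align without an index")
print(sorted(seeded_pairs), "pairs to align after k-mer seeding")
```

The naive pair count grows quadratically with the number of reads, while the k-mer index only proposes pairs that have a chance of overlapping, so far fewer expensive alignments are needed.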
Canu and Miniasm
- What is the extra step in Canu besides OLC? [ Correction of reads: the corrected reads then enter the classical OLC assembly. ]
- What is the omitted step in Miniasm assembly? Are you aware of a tool to calculate the omitted step? [ It omits the Consensus step. A tool that calculates the consensus, built with Miniasm in mind, is called Racon. ]
- Which of the assemblers was faster? Can you guess why? [ Miniasm is way faster because it omits the correction step and the consensus step, but also because the algorithm it uses to search for overlaps is significantly faster than the algorithm used by Canu. ]
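To show what the consensus step contributes (and what is missing from an assembly that skips it), here is a minimal Python sketch of a per-column majority vote over reads that are assumed to be already aligned to each other. This is a deliberate simplification for illustration and is not how Racon or Canu compute consensus. The function name `column_consensus` and the toy reads are hypothetical.

```python
from collections import Counter


def column_consensus(aligned_reads, gap="-"):
    """Majority base per column of equal-length, pre-aligned reads."""
    if len({len(r) for r in aligned_reads}) != 1:
        raise ValueError("aligned reads must all have the same length")
    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != gap:  # a column where the gap wins is dropped
            consensus.append(base)
    return "".join(consensus)


# Hypothetical pre-aligned reads, two of them carrying one error each.
aligned = [
    "ACGTAC-GT",
    "ACGTACAGT",  # spurious inserted A
    "ACCTAC-GT",  # substitution G -> C
    "ACGTAC-GT",
]

print(column_consensus(aligned))
```

Each individual read may carry errors, but as long as the errors fall in different places the vote recovers the underlying sequence. Without a consensus step the contigs keep roughly the error rate of the raw reads.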
Next
Go to next section Transcriptome assembly
Back to Checkpoint.
Go back to Table of content.