GenomeIntro - aechchiki/SIB_LongReadsWorkshop_Zurich18 GitHub Wiki
Introduction
Section: Genome assembly assessment [1/5].
We have assembled a genome, visualized the assembly graph, and polished it. Now we would like to assess the quality of the assemblies quantitatively.
If a high-quality reference exists, the assembly can be compared to it directly to find discrepancies. However, not all of us are lucky enough to have one.
Reference-free metrics rely on assumptions. Comparing the contiguity of assemblies (N50 or NG50) is meaningful only if the assemblies have the same, or at least a comparable, number of misassemblies. Completeness metrics (BUSCO or CEGMA scores) assume a gene content for your genome based on its phylogeny. Read-mapping likelihood assumes a Poisson distribution of reads over the genome and that the reads are mapped correctly to the assembly. However, in the absence of a reference, the only option is to combine several of these metrics.
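To make the contiguity metrics concrete, here is a minimal sketch of how N50 and NG50 can be computed from a list of contig lengths. The function names (`nx`, `n50`, `ng50`) and the example lengths are hypothetical and chosen only for illustration; the genome size passed to `ng50` is assumed to be an independent estimate, not something derived from the assembly.

```python
def nx(lengths, total, fraction=0.5):
    """Return the length L such that contigs of length >= L cover
    at least `fraction` of `total`; None if the contigs never reach it."""
    threshold = total * fraction
    running = 0
    # Walk the contigs from longest to shortest, accumulating their lengths.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return None  # the assembly is too small to cover the requested fraction


def n50(lengths):
    # N50 uses the total assembly length as the denominator.
    return nx(lengths, sum(lengths))


def ng50(lengths, genome_size):
    # NG50 uses an (assumed) estimated genome size instead,
    # which makes assemblies of different total length comparable.
    return nx(lengths, genome_size)


# Made-up contig lengths, for illustration only.
contigs = [80, 70, 50, 40, 30, 20]
print("N50:", n50(contigs))
print("NG50 (genome size 400):", ng50(contigs, 400))
```

Note that neither number says anything about correctness: joining contigs wrongly raises N50 just as well as joining them correctly, which is why these values should only be compared between assemblies with a similar number of misassemblies.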