Assessing quality of genome assembly - aechchiki/SIB_LongReadsWorkshop_Zurich17 GitHub Wiki

We have assembled the genome several times; now we would like to find out the quality of each assembly.

If a high-quality reference exists, the assembly can be compared to it directly and the discrepancies counted. However, not all of us are lucky enough to have one.

Reference-free metrics rest on assumptions. Comparing the contiguity of assemblies (N50 or NG50) is meaningful only if the assemblies contain the same, or at least a comparable, number of misassemblies. Completeness metrics (BUSCO or CEGMA scores) assume a gene content for your genome based on its phylogeny. Mapping likelihood metrics assume that reads are Poisson-distributed over the genome and that they are mapped correctly to the assembly. Still, in the absence of a reference, the only option is to combine several of these metrics.
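To make the contiguity metrics concrete, here is a minimal sketch of how N50 and NG50 can be computed from a list of contig lengths. The function name `nx`, the toy contig lengths, and the genome size of 400 are illustrative assumptions, not part of the workshop material. N50 is the length of the contig at which the cumulative length of the contigs, sorted from longest to shortest, first reaches half of the total assembly length; NG50 uses half of the (estimated) genome size instead.

```python
def nx(lengths, fraction=0.5, genome_size=None):
    """Return the Nx (or NGx, if genome_size is given) of contig lengths.

    lengths     -- iterable of contig lengths
    fraction    -- 0.5 gives N50/NG50, 0.9 gives N90/NG90, etc.
    genome_size -- estimated genome size; if None, the assembly size is used
    """
    lengths = sorted(lengths, reverse=True)
    if not lengths:
        raise ValueError("no contig lengths given")

    # N50 is relative to the assembly size, NG50 to the genome size.
    total = genome_size if genome_size is not None else sum(lengths)
    threshold = total * fraction

    running = 0
    for length in lengths:
        running += length
        if running >= threshold:
            return length

    # The assembly is too short to cover the requested fraction of the genome.
    return 0


# Toy example (hypothetical contig lengths, total assembly size 300).
contigs = [80, 70, 50, 40, 30, 20, 10]
print("N50 :", nx(contigs))                    # relative to assembly size
print("NG50:", nx(contigs, genome_size=400))   # relative to assumed genome size
```

Note that neither number says anything about correctness: an assembler that joins contigs wrongly will report a higher N50, which is why the comparison only makes sense at a comparable number of misassemblies.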

Next

Go to tutorial Reference-based metrics.

Go back to Table of contents.