continuity metrics - aechchiki/SIB_LongReadsWorkshop_Zurich17 GitHub Wiki
A very popular metric for evaluating assemblies is N50, a weighted median of contig sizes. It is the size of the smallest contig in the smallest set of largest contigs that together cover at least half of the total assembly size:
Calculation of N50
Let the set of assembled contigs be sorted from the longest to the shortest. We then sum the contig sizes in that order until the running sum reaches at least half of the total assembly size (i.e. the sum of all contig sizes in the assembly). The size of the last contig added to the sum is the N50.
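The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any particular assembly tool; the function name `n50` and the example contig lengths are made up for this sketch.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    # Sort from the longest to the shortest contig.
    lengths = sorted(contig_lengths, reverse=True)
    # Half of the total assembly size (sum of all contig sizes).
    half_total = sum(lengths) / 2
    running_sum = 0
    for length in lengths:
        running_sum += length
        # The contig that brings the sum to at least half is the N50 contig.
        if running_sum >= half_total:
            return length
    return 0  # only reached when the list is empty


# Hypothetical contig lengths: total is 300, so half is 150.
example_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_contigs))  # 80 + 70 = 150 reaches half, so N50 is 70
```

Sorting first keeps the sketch close to the description; for very large assemblies the same result can be obtained without changing the logic.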
Note that this metric is only meaningful if the assembly is correct. If all the reads were simply concatenated into one huge (completely wrong) contig, N50 would be huge even though the assembly is meaningless. N50 should never be the only metric used to evaluate assemblies.
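To make the warning concrete, the sketch below compares a fragmented set of contigs with the same sequence glued into a single contig. All lengths are invented for illustration, and `n50` is a small helper defined here, not a function from an existing tool.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running_sum = 0
    for length in lengths:
        running_sum += length
        if running_sum >= half_total:
            return length
    return 0


# Hypothetical fragmented assembly: 50 contigs of 1000 bp and 100 of 500 bp.
fragmented = [1000] * 50 + [500] * 100
# The same bases concatenated into one (biologically meaningless) contig.
concatenated = [sum(fragmented)]

print(n50(fragmented))    # 1000
print(n50(concatenated))  # 100000
```

The total assembly size is identical in both cases, yet N50 jumps a hundredfold, which is why it says nothing about correctness on its own.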
NG50
A baby step towards biological reality is to use the known genome size instead of the total sum of contig sizes when calculating N50. The resulting metric is called NG50.
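A minimal Python sketch of NG50 follows. The only change from the N50 calculation is the threshold, which is half of the genome size rather than half of the assembly size. The function name `ng50`, the example lengths, and the genome sizes are assumptions for illustration; returning `None` when the contigs cover less than half of the genome is also a choice made for this sketch, and other tools may report such cases differently.

```python
def ng50(contig_lengths, genome_size):
    """Return NG50, or None if the contigs cover less than half the genome."""
    # Sort from the longest to the shortest contig.
    lengths = sorted(contig_lengths, reverse=True)
    # The threshold is based on the known genome size, not the assembly size.
    half_genome = genome_size / 2
    running_sum = 0
    for length in lengths:
        running_sum += length
        if running_sum >= half_genome:
            return length
    return None  # assumption of this sketch: NG50 is undefined here


# Hypothetical contig lengths summing to 300.
example_contigs = [80, 70, 50, 40, 30, 20, 10]

print(ng50(example_contigs, genome_size=400))   # 80 + 70 + 50 = 200, so 50
print(ng50(example_contigs, genome_size=1000))  # 300 < 500, so None
```

With a genome size equal to the assembly size (300 here), `ng50` gives the same value as N50.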
Next
Read about Completeness metrics.
Read about Contamination.
Finish this section, go to Checkpoint
Go back to Table of content.