Assembler differences - rrwick/Bandage GitHub Wiki

Velvet

Velvet writes its graph to a file that is usually named 'LastGraph', and the graph is built with a specified k-mer size. The sequence in a node consists of the final base of each of the k-mers in that node.

This figure (adapted from Zerbino and Birney, Genome Research, 2008) shows 3 nodes in a de Bruijn graph with a k-mer size of 5. The node sequences are in the blue rectangles and the k-mer sequences are shown next to the nodes. It illustrates that the k-mers in each node are reverse complements of the k-mers in the opposite node. However, the node sequences are not exact reverse complements, but are shifted by a distance of k-1 (4 in this case).

Because each node in a Velvet graph only contains one base per k-mer, some nodes may have very short sequences, sometimes just 1 or 2 bases.
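
The node construction described above can be sketched in a few lines of Python. This is only an illustration, not Velvet's or Bandage's code: the example sequence and the function names are made up, and each node is modelled as an ordered list of k-mers taken from one stretch of sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """List every k-mer of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def velvet_node_sequence(node_kmers):
    """Velvet-style node sequence: the final base of each k-mer."""
    return "".join(kmer[-1] for kmer in node_kmers)


k = 5
stretch = "ATGCCGTAGA"  # hypothetical sequence covered by one node

# The opposite node holds the reverse complements of the same k-mers.
forward_node = velvet_node_sequence(kmers(stretch, k))
reverse_node = velvet_node_sequence(kmers(reverse_complement(stretch), k))

print(forward_node)                      # CGTAGA
print(reverse_node)                      # CGGCAT
print(reverse_complement(forward_node))  # TCTACG

# A node that holds a single k-mer has a one-base sequence.
print(velvet_node_sequence(["ATGCC"]))   # C
```

In this toy case the opposite node's sequence (CGGCAT) is not the reverse complement of the forward node's sequence (TCTACG). The two strings only agree once one is shifted by k-1 = 4 bases, leaving the shared "CG".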

SPAdes

SPAdes uses a de Bruijn graph like Velvet, but it generates the node sequences differently. Each node sequence includes the entirety of the first k-mer in that node, so the sequences of connected nodes overlap.

When Bandage loads a SPAdes graph, it automatically detects these overlaps so they can be removed in path sequences (see graph paths). Overlaps can also become apparent when viewing BLAST hits. If a hit extends to the end of one node of a SPAdes graph, the connected nodes may also show small hits in the overlap region.
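
As a rough illustration of why these overlaps arise and how removing them yields a clean path sequence, here is a minimal Python sketch. It is not Bandage's detection logic, which is not described here. The function names and example sequence are hypothetical, and the toy model produces an overlap of k-1 bases; the overlap length in a real graph depends on the assembler and its settings.

```python
def kmers(seq, k):
    """List every k-mer of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def full_first_kmer_node_sequence(node_kmers):
    """Node sequence that keeps the whole first k-mer, then one base per later k-mer."""
    first, rest = node_kmers[0], node_kmers[1:]
    return first + "".join(kmer[-1] for kmer in rest)


def suffix_prefix_overlap(left, right):
    """Length of the longest suffix of left that is also a prefix of right."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def path_sequence(node_seqs, overlap):
    """Join node sequences along a path, dropping the overlap from each later node."""
    return node_seqs[0] + "".join(seq[overlap:] for seq in node_seqs[1:])


k = 5
stretch = "ATGCCGTAGA"          # hypothetical sequence spanning two nodes
all_kmers = kmers(stretch, k)

# Assumption for the example: the first three k-mers form node A, the rest node B.
node_a = full_first_kmer_node_sequence(all_kmers[:3])
node_b = full_first_kmer_node_sequence(all_kmers[3:])

overlap = suffix_prefix_overlap(node_a, node_b)
merged = path_sequence([node_a, node_b], overlap)

print(node_a, node_b)  # ATGCCGT CCGTAGA
print(overlap)         # 4
print(merged)          # ATGCCGTAGA
```

Simply concatenating the two node sequences would repeat "CCGT". Trimming the overlap from the second node recovers the original stretch.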

SPAdes graphs are saved in a FASTG format, which differs somewhat from the official FASTG spec. Bandage currently supports this SPAdes-flavour of FASTG, not the official FASTG format. Also note that prior to version 3.5.0, SPAdes had a bug which resulted in missing graph edges. Therefore, when using Bandage, be sure to use SPAdes v3.5.0 or later.
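
To make the file layout more concrete, the following Python sketch parses one header line of the kind SPAdes writes to its FASTG files. The header layout assumed here (EDGE_<number>_length_<length>_cov_<coverage>, an optional trailing apostrophe for the reverse-complement strand, then a colon and a comma-separated list of connected edges ending in a semicolon) is an assumption of this example and is not taken from the text above. The function name and the '+'/'-' labels are likewise illustrative.

```python
import re

# Assumed edge-name layout; a trailing apostrophe marks the reverse-complement strand.
EDGE_NAME = re.compile(
    r"EDGE_(?P<id>\d+)_length_(?P<length>\d+)_cov_(?P<cov>[\d.]+)(?P<rc>')?"
)


def parse_edge(text):
    """Turn one edge name into a short signed label such as '3-'."""
    match = EDGE_NAME.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised edge name: {text!r}")
    return match["id"] + ("-" if match["rc"] else "+")


def parse_header(line):
    """Return (node label, list of connected node labels) for one header line."""
    body = line.strip().lstrip(">").rstrip(";")
    name, _, neighbours = body.partition(":")
    return parse_edge(name), [parse_edge(n) for n in neighbours.split(",") if n]


# Hypothetical header line, written for this example.
header = ">EDGE_1_length_120_cov_15.5:EDGE_2_length_80_cov_14.2,EDGE_3_length_60_cov_9.1';"
print(parse_header(header))  # ('1+', ['2+', '3-'])
```

A header without a colon simply yields an empty list of connections.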

MEGAHIT

MEGAHIT uses the same graph format as SPAdes. MEGAHIT graphs also have node overlaps, so the above notes regarding SPAdes graphs apply to MEGAHIT graphs as well.

MEGAHIT has had the ability to generate graph files since version 0.3.0. They are not made automatically but must be generated by running megahit_toolkit contig2fastg. See visualizing MEGAHIT's contig graph for more information.

Trinity

Trinity graphs are unique in their node naming scheme. Assembled Trinity sequences can be grouped at multiple levels: transcript, component, gene and isoform. For this reason, node names in Bandage for a Trinity graph have prefixes that mirror the headers in the Trinity.fasta file.

Trinity graphs do not have the overlap present in SPAdes/MEGAHIT graphs.