Quick start

There are two main approaches you can take to build a tree using Verticall:

  1. Distance tree workflow: use all pairwise comparisons in a set of assemblies to produce a distance matrix, which can be used to build a distance tree. This is appropriate for more diverse datasets, e.g. genomes which span multiple clonal groups or even multiple species.
  2. Alignment tree workflow: compare each assembly to a reference genome to mask horizontally acquired regions from a whole-genome pseudo-alignment, which can be used to build an ML tree. This is appropriate for larger datasets, e.g. thousands of closely related genomes.

Take a look at their wiki pages for a more detailed discussion of each approach. Short versions of each method are below.

Sample data

If you want a quick and easy dataset to try Verticall on, you can find one in the sample_data directory of this repo, packaged as sample_data.tar.gz.

Distance tree

Requirements:

  • a directory containing an assembly for each of your genomes in FASTA format.

Step 1: perform all pairwise comparisons with Verticall pairwise:

verticall pairwise -i assemblies -o verticall.tsv

Step 2: produce a PHYLIP distance matrix with Verticall matrix:

verticall matrix -i verticall.tsv -o verticall.phylip
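
Before building a tree, it can help to confirm that the matrix file is well formed. The following is a minimal sketch, not part of Verticall: check_phylip_matrix is a hypothetical helper, and it assumes a full square PHYLIP matrix (a first line giving the sample count N, then N rows of a name followed by N distances) with no whitespace inside sample names.

```shell
# Hypothetical helper (not a Verticall command): sanity-check a square
# PHYLIP distance matrix. Assumes sample names contain no whitespace.
check_phylip_matrix() {
    awk '
        BEGIN { bad = 0; rows = 0; n = -1 }
        NR == 1 { n = $1 + 0; next }      # header line: number of samples
        NF == 0 { next }                  # ignore blank lines
        {
            rows++
            if (NF != n + 1) {
                printf "row %d (%s): expected %d distances, found %d\n", rows, $1, n, NF - 1
                bad = 1
            }
        }
        END {
            if (rows != n) {
                printf "header says %d samples but found %d rows\n", n, rows
                bad = 1
            }
            exit bad
        }
    ' "$1"
}
```

Running check_phylip_matrix verticall.phylip returns a zero exit status when the row and column counts agree with the header, and prints a message for each mismatch otherwise.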

Step 3: use a distance-based algorithm to build a tree using the matrix:

fastme --method B --nni B --spr -i verticall.phylip -o verticall.newick
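
The three steps above can be chained in a small shell function so that a failure in one step stops the rest. This is only a sketch: distance_tree and its output-prefix argument are illustrative names, not part of Verticall, and the commands and options are exactly those shown in the steps.

```shell
# Hypothetical wrapper: run the distance tree steps in order and stop at
# the first failure.
# Usage: distance_tree <assembly_dir> <output_prefix>
distance_tree() {
    local assemblies="$1"
    local prefix="$2"

    # Step 1: all pairwise comparisons
    verticall pairwise -i "$assemblies" -o "$prefix.tsv" || return 1
    # Step 2: PHYLIP distance matrix
    verticall matrix -i "$prefix.tsv" -o "$prefix.phylip" || return 1
    # Step 3: distance tree
    fastme --method B --nni B --spr -i "$prefix.phylip" -o "$prefix.newick"
}
```

Calling distance_tree assemblies verticall would write verticall.tsv, verticall.phylip and verticall.newick, matching the file names used in the steps.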

Alignment tree

Requirements:

  • a directory containing an assembly for each of your genomes in FASTA format
  • a reference sequence in FASTA format
  • a whole-genome pseudo-alignment in FASTA format
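
Since the third input is an alignment, every sequence in it should have the same length. A quick way to check this before masking is sketched below. check_alignment_lengths is a hypothetical helper, not a Verticall command, and it only verifies that all records are equally long (multi-line FASTA records are handled).

```shell
# Hypothetical helper: fail if the sequences in a FASTA alignment differ
# in length. Each record is compared against the first one.
check_alignment_lengths() {
    awk '
        function finish_record() {
            if (!in_record) return
            count++
            if (count == 1) expected = len
            if (len != expected) {
                printf "%s: length %d, expected %d\n", name, len, expected
                bad = 1
            }
        }
        BEGIN { bad = 0; count = 0; in_record = 0 }
        /^>/ { finish_record(); name = substr($1, 2); len = 0; in_record = 1; next }
        { sub(/\r$/, ""); len += length($0) }   # tolerate Windows line endings
        END {
            finish_record()
            if (count == 0) bad = 1             # no sequences found
            exit bad
        }
    ' "$1"
}
```

For example, check_alignment_lengths alignment.fasta exits with a non-zero status and names the offending sequences if any record is longer or shorter than the first.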

Step 1: perform pairwise comparisons of each assembly to the reference with Verticall pairwise:

verticall pairwise -i assemblies -o verticall.tsv -r reference.fasta

Step 2: mask horizontal regions and unaligned regions in the pseudo-alignment using Verticall mask:

verticall mask -i verticall.tsv -a alignment.fasta -o masked_alignment.fasta

Step 3: build a tree using an ML program such as IQ-TREE:

iqtree2 -s masked_alignment.fasta
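
As a sketch of how the alignment tree steps fit together, the function below checks that the three required inputs exist and then runs the commands in order. alignment_tree and the output file naming are assumptions made for illustration; the verticall and iqtree2 options are the ones shown in the steps above.

```shell
# Hypothetical wrapper: verify the inputs, then run the alignment tree
# steps in order, stopping at the first failure.
# Usage: alignment_tree <assembly_dir> <reference.fasta> <alignment.fasta> <output_prefix>
alignment_tree() {
    local assemblies="$1"
    local reference="$2"
    local alignment="$3"
    local prefix="$4"

    # Fail early if a required input is missing or empty.
    [ -d "$assemblies" ] || { echo "missing assembly directory: $assemblies" >&2; return 1; }
    [ -s "$reference" ] || { echo "missing or empty reference: $reference" >&2; return 1; }
    [ -s "$alignment" ] || { echo "missing or empty alignment: $alignment" >&2; return 1; }

    # Step 1: compare each assembly to the reference
    verticall pairwise -i "$assemblies" -o "$prefix.tsv" -r "$reference" || return 1
    # Step 2: mask horizontal and unaligned regions
    verticall mask -i "$prefix.tsv" -a "$alignment" -o "$prefix.masked.fasta" || return 1
    # Step 3: ML tree
    iqtree2 -s "$prefix.masked.fasta"
}
```

A call such as alignment_tree assemblies reference.fasta alignment.fasta verticall would write verticall.tsv and verticall.masked.fasta, then pass the masked alignment to IQ-TREE.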