Quick start - rrwick/Verticall GitHub Wiki
There are two main approaches you can take to build a tree using Verticall:
- Distance tree workflow: use all pairwise comparisons in a set of assemblies to produce a distance matrix, which can be used to build a distance tree. This is appropriate for more diverse datasets, e.g. genomes which span multiple clonal groups or even multiple species.
- Alignment tree workflow: compare each assembly to a reference genome to mask horizontally acquired regions from a whole-genome pseudo-alignment, which can be used to build an ML tree. This is appropriate for larger datasets, e.g. thousands of closely related genomes.
Take a look at their wiki pages for a more detailed discussion of each approach. Short versions of each method are below.
Sample data
If you want a quick and easy dataset to try Verticall, you can find one in the sample_data directory of this repo. There you will find sample_data.tar.gz, which contains:
- assemblies: a directory with five synthetic genomes (required for both the distance tree workflow and the alignment tree workflow)
- reference.fasta: a reference genome for the assemblies (required for the alignment tree workflow)
- alignment.fasta: a whole-genome alignment of the five genomes plus the reference (required for the alignment tree workflow)
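Before running either workflow on the sample data, it can help to confirm that the archive unpacked as expected. The following is a minimal sketch, not part of Verticall: the function names check_sample_data and count_fasta_records are hypothetical, and it assumes you pass the directory that holds the three items listed above (the layout inside the archive is not specified here, so adjust the path to wherever they end up).

```shell
# check_sample_data DIR (hypothetical helper, not a Verticall command)
# Confirms that DIR holds the assemblies directory, reference.fasta and
# alignment.fasta. Reports each missing item and returns non-zero if any
# item is absent.
check_sample_data() {
    dir=$1
    missing=0
    [ -d "$dir/assemblies" ] || { echo "missing directory: $dir/assemblies" >&2; missing=1; }
    for f in reference.fasta alignment.fasta; do
        [ -f "$dir/$f" ] || { echo "missing file: $dir/$f" >&2; missing=1; }
    done
    return "$missing"
}

# count_fasta_records FILE (hypothetical helper)
# Prints the number of sequences in an uncompressed FASTA file by
# counting header lines, i.e. lines that start with ">".
count_fasta_records() {
    grep -c '^>' "$1"
}
```

After extracting with tar -xzf sample_data.tar.gz, you could call check_sample_data on the extracted location. Since the alignment is described as holding the five genomes plus the reference, count_fasta_records on alignment.fasta should report six records, assuming one record per genome.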
Distance tree
Requirements:
- a directory containing an assembly for each of your genomes in FASTA format.
Step 1: perform all pairwise comparisons with Verticall pairwise:
verticall pairwise -i assemblies -o verticall.tsv
Step 2: produce a PHYLIP distance matrix with Verticall matrix:
verticall matrix -i verticall.tsv -o verticall.phylip
Step 3: use a distance-based algorithm to build a tree using the matrix:
fastme --method B --nni B --spr -i verticall.phylip -o verticall.newick
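The three steps above can be chained so that a failure in one step stops the rest. This is a sketch under stated assumptions rather than an official wrapper: the function name run_distance_tree and the output prefix argument are hypothetical, while the commands and flags are the ones shown in the steps above. It assumes verticall and fastme are on your PATH.

```shell
# run_distance_tree ASSEMBLY_DIR PREFIX (hypothetical wrapper)
# Runs the three distance tree steps in order and returns non-zero as
# soon as one of them fails. Outputs are PREFIX.tsv, PREFIX.phylip and
# PREFIX.newick.
run_distance_tree() {
    assemblies=$1
    prefix=$2

    # Step 1: all pairwise comparisons between the assemblies
    verticall pairwise -i "$assemblies" -o "$prefix.tsv" || return 1

    # Step 2: convert the comparisons into a PHYLIP distance matrix
    verticall matrix -i "$prefix.tsv" -o "$prefix.phylip" || return 1

    # Step 3: build a distance tree from the matrix with FastME
    fastme --method B --nni B --spr -i "$prefix.phylip" -o "$prefix.newick" || return 1
}
```

For example, run_distance_tree assemblies verticall would reproduce the file names used in the steps above.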
Alignment tree
Requirements:
- a directory containing an assembly for each of your genomes in FASTA format
- a reference sequence in FASTA format
- a whole-genome pseudo-alignment in FASTA format
Step 1: perform pairwise comparisons of each assembly to the reference with Verticall pairwise:
verticall pairwise -i assemblies -o verticall.tsv -r reference.fasta
Step 2: mask horizontal regions and unaligned regions in the pseudo-alignment using Verticall mask:
verticall mask -i verticall.tsv -a alignment.fasta -o masked_alignment.fasta
Step 3: build a tree using an ML algorithm such as IQ-TREE:
iqtree2 -s masked_alignment.fasta
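As with the distance tree, the alignment tree steps can be wrapped so that each step runs only if the previous one succeeded. This is a sketch, not an official script: the function name run_alignment_tree, its argument order and the output naming are hypothetical, and the commands and flags are taken from the steps above. It assumes verticall and iqtree2 are on your PATH.

```shell
# run_alignment_tree ASSEMBLY_DIR REFERENCE ALIGNMENT PREFIX (hypothetical wrapper)
# Runs the three alignment tree steps in order and returns non-zero as
# soon as one of them fails. Intermediate outputs are PREFIX.tsv and
# PREFIX.masked.fasta.
run_alignment_tree() {
    assemblies=$1
    reference=$2
    alignment=$3
    prefix=$4

    # Step 1: compare each assembly to the reference
    verticall pairwise -i "$assemblies" -o "$prefix.tsv" -r "$reference" || return 1

    # Step 2: mask horizontal and unaligned regions in the pseudo-alignment
    verticall mask -i "$prefix.tsv" -a "$alignment" -o "$prefix.masked.fasta" || return 1

    # Step 3: build an ML tree from the masked alignment with IQ-TREE
    iqtree2 -s "$prefix.masked.fasta" || return 1
}
```

With the sample data, a call might look like run_alignment_tree assemblies reference.fasta alignment.fasta verticall. IQ-TREE names its output files after the alignment by default, so the tree would be written alongside the masked alignment.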