Distance based tree methods - rrwick/Verticall GitHub Wiki

This page describes some approaches for turning a PHYLIP distance matrix into a Newick-format tree.
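For orientation, here is a minimal sketch of what such a matrix looks like and how it loads in R. The sample names and distances are invented for illustration, and the file is written to a temporary path rather than the `verticall.phylip` used in the rest of this page.

```r
library(phangorn)

# A hypothetical 4-sample PHYLIP distance matrix: the first line gives the
# number of samples, then each row holds a sample name and its distances.
phylip_lines <- c(
  "4",
  "sample_A 0.000 0.010 0.030 0.040",
  "sample_B 0.010 0.000 0.032 0.042",
  "sample_C 0.030 0.032 0.000 0.020",
  "sample_D 0.040 0.042 0.020 0.000"
)
phylip_file <- tempfile(fileext = ".phylip")
writeLines(phylip_lines, phylip_file)

distances <- readDist(phylip_file)  # load the matrix as a 'dist' object
print(distances)
```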

BIONJ (CLI)

BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data

BIONJ is implemented as a command-line tool, which you can compile like this:

gcc BIONJ.c -O3 -o bionj

Then build a tree like this:

bionj verticall.phylip verticall.newick

After building the tree, I like to do some final tweaks in R:

library(ape); library(phangorn)
tree <- read.tree("verticall.newick")            # load in the tree
tree$edge.length <- pmax(tree$edge.length, 0.0)  # set any negative branch lengths to zero
tree <- midpoint(tree)                           # midpoint-root the tree
write.tree(tree, "verticall.newick")             # save the tree to file in Newick format
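Neighbour-joining-style methods such as BIONJ can return slightly negative branch lengths, which is why the tweaks clamp them to zero. Here is a small sketch of that one step on a hand-made tree. The tree and the negative value are invented for illustration.

```r
library(ape)

# Hypothetical unrooted tree; B's branch is then made negative on purpose.
tree <- read.tree(text = "((A:0.10,B:0.02):0.05,C:0.20,D:0.30);")
b_edge <- which(tree$edge[, 2] == which(tree$tip.label == "B"))
tree$edge.length[b_edge] <- -0.002

n_negative_before <- sum(tree$edge.length < 0)   # count negative branches
tree$edge.length <- pmax(tree$edge.length, 0.0)  # clamp negatives to zero
n_negative_after <- sum(tree$edge.length < 0)    # count again after clamping

c(before = n_negative_before, after = n_negative_after)
```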

BIONJ (ape)

BIONJ is also implemented in ape, so you can do the whole process (tree building and final tweaks) in R:

library(ape); library(phangorn)
distances <- readDist("verticall.phylip")        # load in the distance matrix
tree <- bionj(distances)                         # build a tree using BIONJ
tree$edge.length <- pmax(tree$edge.length, 0.0)  # set any negative branch lengths to zero
tree <- midpoint(tree)                           # midpoint-root the tree
write.tree(tree, "verticall.newick")             # save the tree to file in Newick format

BIONJ in ape produces a nearly identical tree to the CLI tool above, the main difference being that ape gives more numerical precision in its branch lengths. For example, the BIONJ CLI tool might give a branch length of 0.000050 where BIONJ in ape would give 4.960667866e-05.
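To get a feel for how little that precision difference matters, here is a sketch that rounds ape's branch lengths to six decimal places and compares the result with the unrounded tree. It assumes the CLI tool writes six decimals (as the example above suggests) and uses distances derived from an invented tree rather than real CLI output.

```r
library(ape)

# Additive distances from a hypothetical tree, so the example is self-contained.
true_tree <- read.tree(
  text = "((A:0.00004960667866,B:0.0003):0.0002,(C:0.0001,D:0.0004):0.0005,E:0.0006);"
)
distances <- as.dist(cophenetic(true_tree))

tree_ape <- bionj(distances)

# Mimic six-decimal output by rounding the branch lengths.
tree_rounded <- tree_ape
tree_rounded$edge.length <- round(tree_ape$edge.length, 6)

topo_diff <- dist.topo(tree_ape, tree_rounded)  # topological distance
max_diff <- max(abs(tree_ape$edge.length - tree_rounded$edge.length))
six_decimals <- sprintf("%.6f", 4.960667866e-05)  # "0.000050"

topo_diff
max_diff
```

Rounding leaves the topology untouched and can shift each branch length by at most half a unit in the sixth decimal place.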

FastME v1 (ape)

Fast and Accurate Phylogeny Reconstruction Algorithms Based on the Minimum-Evolution Principle

Like BIONJ, this method is implemented in ape, so it can be run in R:

library(ape); library(phangorn)
distances <- readDist("verticall.phylip")        # load in the distance matrix
tree <- fastme.bal(distances)                    # build a tree using FastME v1
tree$edge.length <- pmax(tree$edge.length, 0.0)  # set any negative branch lengths to zero
tree <- midpoint(tree)                           # midpoint-root the tree
write.tree(tree, "verticall.newick")             # save the tree to file in Newick format
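ape's `fastme.bal` runs NNI and SPR tree rearrangements by default, via its `nni` and `spr` arguments. If you want to check whether it agrees with BIONJ on a given matrix, you could compare the two topologies. This is a sketch using distances derived from an invented tree, not Verticall output.

```r
library(ape)

# Hypothetical tree used only to generate a small distance matrix.
true_tree <- read.tree(
  text = "((A:0.01,B:0.03):0.02,(C:0.01,D:0.04):0.05,(E:0.02,F:0.03):0.01);"
)
distances <- as.dist(cophenetic(true_tree))

tree_bionj  <- bionj(distances)
tree_fastme <- fastme.bal(distances, nni = TRUE, spr = TRUE)  # defaults written out

# Topological distance between the two trees (0 means identical topologies).
topo_diff <- dist.topo(tree_bionj, tree_fastme)
topo_diff
```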

FastME v2 (CLI)

FastME 2.0: A Comprehensive, Accurate, and Fast Distance-Based Phylogeny Inference Program

FastME v2 adds more bells and whistles to the FastME algorithm. It is implemented as a command-line tool, which I run like this (balanced tree construction, followed by balanced NNI and SPR tree rearrangements):

fastme --method B --nni B --spr -i verticall.phylip -o verticall.newick -I verticall.info

As before, you can follow up with tweaks in R:

library(ape); library(phangorn)
tree <- read.tree("verticall.newick")            # load in the tree
tree$edge.length <- pmax(tree$edge.length, 0.0)  # set any negative branch lengths to zero
tree <- midpoint(tree)                           # midpoint-root the tree
write.tree(tree, "verticall.newick")             # save the tree to file in Newick format

Performance

To test computational performance, I ran the above tree-building approaches (exactly as they are written above) on my MacBook (M1 Pro), measuring time and memory.

Figure 1: Time vs genome count for distance-based tree-building algorithms.

Figure 2: Memory vs genome count for distance-based tree-building algorithms.
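The benchmarking commands themselves are not shown on this page. As one possible way to time the ape-based methods on your own machine, here is a sketch that uses a randomly generated tree as a stand-in for real Verticall distances. The genome count is an arbitrary choice.

```r
library(ape)

set.seed(1)
n_genomes <- 200                                    # hypothetical genome count
distances <- as.dist(cophenetic(rtree(n_genomes)))  # stand-in distance matrix

# system.time() reports user, system and elapsed (wall-clock) seconds.
time_bionj  <- system.time(tree_bionj  <- bionj(distances))
time_fastme <- system.time(tree_fastme <- fastme.bal(distances))

elapsed <- c(bionj = time_bionj[["elapsed"]], fastme = time_fastme[["elapsed"]])
elapsed
```

This only covers time. For the command-line tools, running them under `/usr/bin/time -l` on macOS is one way to get both elapsed time and peak memory (maximum resident set size).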

Rooting

In the above examples I use midpoint rooting, which is implemented in phangorn and works well in many scenarios. If your tree has an outgroup, then midpoint rooting should place the root between the outgroup and your other samples, which is probably what you want.
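If you want to confirm that midpoint rooting put the root where you expect, you can check whether the non-outgroup samples form a clade, or root on the outgroup explicitly with ape's `root`. This is a sketch on an invented tree where `OUT` is a hypothetical outgroup on a long branch.

```r
library(ape); library(phangorn)

# Hypothetical unrooted tree in which OUT sits on a long branch.
tree <- read.tree(text = "((A:0.10,B:0.12):0.05,(C:0.10,D:0.08):0.04,OUT:1.00);")
ingroup <- c("A", "B", "C", "D")

rooted_mid <- midpoint(tree)                                     # midpoint rooting
rooted_out <- root(tree, outgroup = "OUT", resolve.root = TRUE)  # explicit outgroup rooting

# TRUE when the root separates OUT from all other samples.
mid_ok <- is.monophyletic(rooted_mid, ingroup)
out_ok <- is.monophyletic(rooted_out, ingroup)
c(midpoint = mid_ok, outgroup = out_ok)
```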

If your tree doesn't have an outgroup, then you might want to consider minimum variance rooting, which aims to minimise the variance of root-to-tip distances (paper, tool). Or if you have dates for your isolates, rooting via root-to-tip regression (implemented in ape) might also be a good option.
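For the root-to-tip option, ape provides the `rtt` function. Here is a minimal sketch, assuming a small tree and sampling years that are both invented for illustration. With real data you would load your own tree and dates.

```r
library(ape)

# Hypothetical unrooted tree and sampling years.
tree <- read.tree(text = "(((A:0.010,B:0.012):0.004,C:0.013):0.003,D:0.006,E:0.009);")
years <- c(A = 2015, B = 2017, C = 2014, D = 2005, E = 2008)

# rtt() expects the dates in the same order as the tip labels.
tip_dates <- unname(years[tree$tip.label])

# Choose the root position that maximises the correlation between
# sampling date and root-to-tip distance.
rooted <- rtt(tree, tip_dates, objective = "correlation")
is.rooted(rooted)
```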
