Distance based tree methods - rrwick/Verticall GitHub Wiki
This page describes some approaches for turning a PHYLIP distance matrix into a Newick-format tree.
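If you have not worked with PHYLIP distance matrices before, the following minimal sketch shows the general layout: a first line with the number of samples, then one row per sample containing its name followed by its distances to every sample. The sample names and distances here are made up for illustration, and the file is written by phangorn's `writeDist` rather than by Verticall, so details such as spacing may differ from your own matrix.

```r
library(ape); library(phangorn)

# A made-up symmetric distance matrix for four hypothetical samples
samples <- c("sample_A", "sample_B", "sample_C", "sample_D")
m <- matrix(c(0.00, 0.02, 0.05, 0.06,
              0.02, 0.00, 0.05, 0.06,
              0.05, 0.05, 0.00, 0.03,
              0.06, 0.06, 0.03, 0.00),
            nrow = 4, byrow = TRUE, dimnames = list(samples, samples))

# Write the matrix in PHYLIP format to a temporary file and print the file
phylip_file <- tempfile(fileext = ".phylip")
writeDist(as.dist(m), phylip_file)
cat(readLines(phylip_file), sep = "\n")

# Read it back in the same way the examples on this page do
distances <- readDist(phylip_file)
```

The resulting `distances` object is an ordinary R `dist` object, which is what the ape tree-building functions on this page take as input.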
BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data
BIONJ is implemented in a command line tool which you can compile like this:
```bash
gcc BIONJ.c -O3 -o bionj
```
Then build a tree like this:
```bash
bionj verticall.phylip verticall.newick
```
After building the tree, I like to do some final tweaks in R:
```r
library(ape); library(phangorn)
tree <- read.tree("verticall.newick") # load in the tree
tree$edge.length <- pmax(tree$edge.length, 0.0) # set any negative branch lengths to zero
tree <- midpoint(tree) # midpoint-root the tree
write.tree(tree, "verticall.newick") # save the tree to file in Newick format
```
BIONJ is also implemented in ape, so you can do the whole process (tree building and final tweaks) in R:
```r
library(ape); library(phangorn)
distances <- readDist("verticall.phylip") # load in the distance matrix
tree <- bionj(distances) # build a tree using BIONJ
tree$edge.length <- pmax(tree$edge.length, 0.0) # set any negative branch lengths to zero
tree <- midpoint(tree) # midpoint-root the tree
write.tree(tree, "verticall.newick") # save the tree to file in Newick format
```
BIONJ in ape produces a nearly identical tree to the CLI tool above. The main difference is that ape reports branch lengths with more numerical precision. For example, the BIONJ CLI tool might give a branch length of 0.000050 where BIONJ in ape would give 4.960667866e-05.
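If you want to check this on your own data, one option is to compare the two trees with the Robinson-Foulds distance from phangorn, which is 0 when two topologies are identical. The sketch below is self-contained, so it uses a simulated distance matrix and mimics the CLI tool's fixed-precision output by rounding branch lengths to six decimal places. That rounding is an assumption for illustration, not the CLI tool's actual code.

```r
library(ape); library(phangorn)

set.seed(1)
# Simulated stand-in for a real distance matrix: path lengths on a random tree
true_tree <- rtree(10)
distances <- as.dist(cophenetic(true_tree))

# Build the tree with ape's BIONJ at full precision
tree_full <- bionj(distances)

# Mimic fixed-precision output by rounding branch lengths to six decimals
tree_rounded <- tree_full
tree_rounded$edge.length <- round(tree_rounded$edge.length, 6)

# Robinson-Foulds distance: 0 means the two topologies are identical
rf <- RF.dist(tree_full, tree_rounded)
print(rf)

# Largest branch-length difference introduced by the rounding
max_diff <- max(abs(tree_full$edge.length - tree_rounded$edge.length))
print(max_diff)
```

With real output, you would instead load the two Newick files with `read.tree` (saved under different names, for example the hypothetical `bionj_cli.newick` and `bionj_ape.newick`) and pass them to `RF.dist`.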
Fast and Accurate Phylogeny Reconstruction Algorithms Based on the Minimum-Evolution Principle
Like BIONJ, this method is implemented in ape so can be run in R:
```r
library(ape); library(phangorn)
distances <- readDist("verticall.phylip") # load in the distance matrix
tree <- fastme.bal(distances) # build a tree using FastME v1
tree$edge.length <- pmax(tree$edge.length, 0.0) # set any negative branch lengths to zero
tree <- midpoint(tree) # midpoint-root the tree
write.tree(tree, "verticall.newick") # save the tree to file in Newick format
```
FastME 2.0: A Comprehensive, Accurate, and Fast Distance-Based Phylogeny Inference Program
FastME v2 adds further options to the FastME algorithm. It is implemented in a command-line tool, which I run with the balanced method (--method B) followed by balanced NNI (--nni B) and SPR (--spr) topology improvement:
```bash
fastme --method B --nni B --spr -i verticall.phylip -o verticall.newick -I verticall.info
```
As before, you can follow up with tweaks in R:
```r
library(ape); library(phangorn)
tree <- read.tree("verticall.newick") # load in the tree
tree$edge.length <- pmax(tree$edge.length, 0.0) # set any negative branch lengths to zero
tree <- midpoint(tree) # midpoint-root the tree
write.tree(tree, "verticall.newick") # save the tree to file in Newick format
```
To test computational performance, I ran each of the tree-building approaches above (exactly as written) on my MacBook (M1 Pro), measuring time and memory.
In the above examples I use midpoint rooting, which is implemented in phangorn and works well in many scenarios. If your tree has an outgroup, then midpoint rooting should place the root between the outgroup and your other samples, which is probably what you want.
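If you want to confirm that midpoint rooting put the root where you expect, a quick check is whether all non-outgroup samples form a single clade in the rooted tree. The sketch below uses a toy tree with a hypothetical tip named `outgroup`. Substitute your own tree and outgroup name. If the check fails, ape's `root` function can root on the outgroup explicitly.

```r
library(ape); library(phangorn)

# Toy unrooted tree: three close samples plus one distant, hypothetical outgroup
tree <- read.tree(text = "((A:0.01,B:0.01):0.01,C:0.02,outgroup:0.5);")
tree <- midpoint(tree) # midpoint-root the tree

# TRUE if the root separates the outgroup from all other samples
ingroup <- setdiff(tree$tip.label, "outgroup")
ingroup_is_clade <- is.monophyletic(tree, ingroup)
print(ingroup_is_clade)

# Alternative: root explicitly on the outgroup instead of at the midpoint
tree_outgroup_rooted <- root(tree, outgroup = "outgroup", resolve.root = TRUE)
```

Note that explicit outgroup rooting only sets where the root attaches. How the outgroup branch length is split on either side of the root differs from midpoint rooting.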
If your tree doesn't have an outgroup, then you might want to consider minimum variance rooting, which aims to minimise the variance of root-to-tip distances (paper, tool). Or if you have dates for your isolates, rooting via root-to-tip regression (implemented in ape) might also be a good option.
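As a rough sketch of the root-to-tip regression option, ape provides the `rtt` function, which takes an unrooted tree and a vector of tip dates in the same order as the tree's tip labels. The tree and sampling years below are made up so the example runs on its own. With real data you would load your tree with `read.tree` and your dates from your own metadata.

```r
library(ape)

set.seed(1)
# Toy unrooted tree and made-up sampling years, for illustration only
tree <- rtree(8, rooted = FALSE)
dates <- c(2010, 2012, 2013, 2015, 2016, 2018, 2019, 2021)
names(dates) <- tree$tip.label

# rtt() expects the dates in the same order as tree$tip.label,
# so index the named vector by tip label before passing it in
tree_rtt <- rtt(tree, dates[tree$tip.label])

print(is.rooted(tree_rtt))
```

Indexing the dates by `tree$tip.label` is the important step: if the dates are in a different order from the tips, the regression will be fitted against the wrong samples.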