Comparing trees - arklumpus/TreeViewer GitHub Wiki
TreeViewer can be used to create tree comparisons. This can mean either comparing two individual trees with each other, or comparing one tree with a set of trees.
Comparing two trees
The file Samples.trees contains 500 trees from a Bayesian analysis of a simulated dataset, which have also been used in the example about tree statistics. Using a tree space plot built according to the weighted Robinson-Foulds metric, that example showed that the trees can be divided into two clusters. The medoid for the first cluster is the 415th tree in the file, while the medoid for the second cluster is the 139th tree.
Having established this, it might be interesting to find out what the actual differences are between the consensus of the whole sample of trees, tree #415, and tree #139.
Opening and drawing the tree
To do this, first of all open the tree file in TreeViewer. The program will automatically compute the consensus tree and show it as an unrooted tree:
To make the tree look nicer, you can click on the Circular button in the Actions tab; this will reshape the tree in a circular style. This will also issue a warning message (which can be seen as the blue i icon in the status bar at the bottom of the window); this is because the tree is unrooted, therefore the program warns you that it might be misleading to draw a root branch. To remove the root branch, click on the Plot actions button in the Modules tab, then expand the settings for the Branches module (note the warning here), and uncheck the Root branch check box. The warning should now disappear. To make it even clearer that the tree is unrooted, you can click on the Coordinates module button, expand the parameters, set the Inner radius to 0, and finally click Apply. The tree will now have a trifurcation at the root:
Adding support values
Before doing the tree comparison, it is useful to show support values on the tree. Another tutorial contains more details about how to do this; but basically you should:
- In the Plot elements, add a new instance of the Node shapes module.
- Change the following settings:
  - Show on to Internal nodes
  - Shape to Circle
  - Uncheck Auto fill colour by node
  - Click on the wrench icon next to Fill colour, and set the Attribute name to Support and the Attribute type to Number. Then, click on the gradient and add/edit gradient stops to get the following:
    - At 0.0, black
    - At 0.949, black (again)
    - At 0.95, grey
    - At 0.99, grey (again)
    - At 0.991, transparent
    - At 1.0, transparent (again)
After this, the plot should update to show nodes with support lower than 95% in black and nodes with support between 95% and 99% in grey:
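The paired gradient stops are what turn a continuous gradient into three nearly discrete colour bands. The following is a minimal Python sketch of that idea, assuming the colour between two stops is obtained by linear interpolation; the RGBA values chosen for black, grey and transparent, and the function name `colour_for_support`, are illustrative assumptions and not taken from TreeViewer.

```python
from bisect import bisect_right

# Gradient stops from the list above, as (position, RGBA) pairs.
# The exact RGBA values are assumptions made for this sketch.
STOPS = [
    (0.0,   (0, 0, 0, 1.0)),        # black
    (0.949, (0, 0, 0, 1.0)),        # black (again)
    (0.95,  (128, 128, 128, 1.0)),  # grey
    (0.99,  (128, 128, 128, 1.0)),  # grey (again)
    (0.991, (128, 128, 128, 0.0)),  # transparent
    (1.0,   (128, 128, 128, 0.0)),  # transparent (again)
]


def colour_for_support(support):
    """Linearly interpolate an RGBA colour for a support value in [0, 1]."""
    support = min(max(support, 0.0), 1.0)
    positions = [position for position, _ in STOPS]
    i = bisect_right(positions, support)
    if i >= len(STOPS):
        return STOPS[-1][1]
    (p0, c0), (p1, c1) = STOPS[i - 1], STOPS[i]
    t = (support - p0) / (p1 - p0)
    return tuple(a + (b - a) * t for a, b in zip(c0, c1))


print(colour_for_support(0.50))   # opaque black
print(colour_for_support(0.97))   # opaque grey
print(colour_for_support(0.995))  # fully transparent (alpha 0)
```

Because each colour is repeated at two stops, the interpolation only produces intermediate colours in the very narrow intervals between 0.949 and 0.95 and between 0.99 and 0.991.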
Performing the tree comparison
To perform the tree comparison, click on the Further transformations button in the Modules tab, then add a new instance of the module Compare trees. This module will compare the current tree with one or more trees from an attachment or one of the loaded trees (which is what we want to do now).
Expand the parameters for this module, and change the Tree source to Loaded tree. Then, change the Tree index to 415 and click Apply. Nothing should change on the plot, but if you click on a branch and inspect its attributes, you should notice that there are some new attributes in addition to Name, Length and Support:
- The Tree415_Present attribute tells you whether the split induced by the selected branch is present in the other tree we are comparing (Yes) or not (No).
- The Tree415_Compatible attribute tells you whether the split induced by the selected branch is compatible with the other tree (Yes) or not (No). Note that if both trees are fully bifurcating and contain exactly the same leaves, these two attributes will always be equal; you could have different values if the trees have different leaves or multifurcating nodes.
- The Tree415_Length attribute gives you the length of the equivalent branch in tree #415 (naturally, this is only shown where the branch is present in both trees).
- The Tree415_Name attribute (only present for tip nodes) gives you the name of the node in tree #415 (which is going to be the same as in the current tree).
Essentially, the *_Present and *_Compatible attributes describe the results of the comparison, while the other attributes are the values of the same attributes on the other tree that have been copied over to the current tree.
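To make the difference between a split being present and being compatible more concrete, here is a minimal Python sketch. It assumes that both trees have the same leaves and that each tree is described only by its non-trivial splits (one side of each bipartition); the function names and the toy trees are hypothetical and do not reflect how TreeViewer stores trees internally.

```python
def normalise(side, leaves):
    """Return a canonical side for the split side | (leaves - side)."""
    side = frozenset(side)
    other = frozenset(leaves) - side
    # Always keep the side that does NOT contain the alphabetically first leaf.
    return other if min(leaves) in side else side


def splits_compatible(split, other_split, leaves):
    """Splits A|B and C|D are compatible if one of A∩C, A∩D, B∩C, B∩D is empty."""
    a, c = frozenset(split), frozenset(other_split)
    b, d = frozenset(leaves) - a, frozenset(leaves) - c
    return not (a & c) or not (a & d) or not (b & c) or not (b & d)


def compare(split, other_tree_splits, leaves):
    """Return (present, compatible) for one split against another tree."""
    canonical = {normalise(s, leaves) for s in other_tree_splits}
    present = normalise(split, leaves) in canonical
    compatible = all(splits_compatible(split, s, leaves) for s in other_tree_splits)
    return present, compatible


leaves = {"A", "B", "C", "D", "E"}
resolved = [{"A", "B"}, {"D", "E"}]     # ((A,B),C,(D,E)), fully bifurcating
star_like = [{"A", "B"}]                # ((A,B),C,D,E), multifurcating
conflicting = [{"A", "B"}, {"C", "D"}]  # ((A,B),E,(C,D))

print(compare({"D", "E"}, resolved, leaves))     # (True, True)
print(compare({"D", "E"}, star_like, leaves))    # (False, True)
print(compare({"D", "E"}, conflicting, leaves))  # (False, False)
```

The second case shows how the two attributes can differ: the multifurcating tree does not contain the split, but it does not contradict it either.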
You can now follow the same steps to add another instance of the Compare trees module, this time setting the Tree index to 139. Again, this will add a few new attributes to the tree.
To see which branches from the consensus tree are not present in the other two trees, you can use the search function: press CTRL+F (CMD+F on macOS) or click on the Search button in the Actions tab; this will show the search bar above the plot. Expand the Advanced section and change the Search attribute to Tree415_Present. Then, enter No in the Search box. You should see that no nodes are highlighted in the plot: this means that the topology of tree #415 is identical to the consensus tree we are looking at.
If you now change the Search attribute to Tree139_Present, you will notice that three nodes get highlighted in yellow: these are the nodes that are not present in tree #139.
To highlight these nodes more clearly, we can use another Node shapes module. First of all, go to the Further transformations, and add a new instance of the Replace attribute module. Expand the options for this module and, in the Search attribute section, set the Attribute to Tree139_Present and the Value to No. In the Replace attribute section, set the Attribute to something like ShapeSize139, the Attribute type to Number and the Value to 4. The plot should not change, but if you now click on one of the nodes that were highlighted earlier, you should see that it has a new attribute called ShapeSize139 with value 4.
Now, go to the Plot elements tab, and add a new instance of the Node shapes module. Set Show on to Internal nodes, the Shape to Circle, uncheck the Auto fill colour by node check box and set the Fill colour to a light orange. Finally, click on the little wrench icon next to the Size parameter and, in the new window, set the Attribute name to ShapeSize139 and the Default value to 0. This will ensure that the orange circles only appear for nodes that have the ShapeSize139 attribute, which we created with the Replace attribute module.
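The combination of the Replace attribute module and the Default value of the Size parameter can be pictured with the following Python sketch. Nodes are represented here as plain dictionaries, and `replace_attribute` and `shape_size` are hypothetical helper names used only for illustration; this is not TreeViewer's own code.

```python
# Hypothetical attribute tables for three nodes of the tree.
nodes = [
    {"Support": 1.00, "Tree139_Present": "Yes"},
    {"Support": 0.62, "Tree139_Present": "No"},
    {"Support": 0.98, "Tree139_Present": "Yes"},
]


def replace_attribute(nodes, search_attr, search_value, new_attr, new_value):
    """Set new_attr on every node whose search_attr equals search_value."""
    for node in nodes:
        if node.get(search_attr) == search_value:
            node[new_attr] = new_value


def shape_size(node, attribute="ShapeSize139", default=0):
    """Size of the node shape: the attribute value, or the default if it is missing."""
    return node.get(attribute, default)


replace_attribute(nodes, "Tree139_Present", "No", "ShapeSize139", 4)
print([shape_size(node) for node in nodes])  # [0, 4, 0]
```

Nodes without the ShapeSize139 attribute fall back to a size of 0, so no circle is visible for them.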
The plot should now look like this:
The three nodes that are not present in tree #139 are clearly shown; you can see that they have low support (as expected, since they are missing from some of the trees) and that their branches are quite short. This reflects parts of the (simulated) phylogeny that were hard to reconstruct with the available sequence data.
Tidying up the plot
An issue with the plot as it currently stands is that the tip labels are not all at the same distance from the centre. While this does not really affect the issue at hand very much, it would still be nice to make the plot look better. To do this, you can first of all expand the options of the Labels module in the plot actions, and change the Anchor to Origin. This will cause all the tip labels to end up in the middle of the tree. Change the X value of the Position to 165 to get them back to their rightful position outside the tree; you will notice that, this time, they are well aligned:
However, the space between the end of the branches and the tip labels is a bit annoying. We can fill this space by using the Branch extensions module: add a new instance of this module in the plot actions, then open the options and set the Branch reference to Circular, the End anchor to Origin, and the X value for the End to 160. The plot will now look as if the branches were extended to end up close to the tip labels:
This is, however, misleading; we should change the appearance of the branch extensions so that it is clear that they are not part of the "real" branches. You can do this by changing the Line colour to grey, and the Line dash to get a dashed line. Finally, you can drag the Branch extensions module up all the way, so that it appears behind the Branches module. The final plot should look similar to this:
You can now save the tree file or the plot as a PDF or SVG file using the items from the File menu. You can also download the TreeComparison.tbi tree file, which contains the trees along with all the modules.
Comparing multiple trees
Another possibility is to compare a single tree (e.g., a species tree) with multiple other trees (e.g., many gene trees), to highlight which parts of the tree can be consistently recovered by multiple markers.
The file species.tre contains an unrooted phylogenomic tree of 60 cyanobacteria, obtained through a partitioned maximum-likelihood analysis over 137 protein-coding genes. This is the same tree that was used in the tutorial showing how to highlight support values on a tree; if you look at the plot produced by that tutorial, you will notice that most nodes have high support values (>99%).
If you open this file in TreeViewer, it should look similar to the following:
The file gene.trees contains individual gene trees for each of the 137 genes that were used to build this phylogeny. Note that not all of the gene trees contain all of the taxa in the species tree (of the 60 total strains, only 8 are present in all 137 genes, and only 5 trees contain all 60 taxa).
We would like to compare the species tree with each one of the gene trees. To do this, first of all add the gene tree file as an attachment to the species tree plot, by clicking on the Attachments tab, and then on the Add attachment button. You can leave all the attachment settings to their default values.
Then, click on the Modules tab and on Further transformations to open the module panel, then on the Add module button under Further transformations and add an instance of the Compare trees module. Expand the options for this module, make sure that the Tree source is set to Attachment and select the gene attachment for the Tree box. The program will run the tree comparisons, but the plot should not change at this stage.
You will notice that the Compare trees module is issuing a warning saying that the attachment contains multiple trees, and their attributes will thus not be stored. To remove this warning, you can just uncheck the Store attributes on equivalent splits check box, and then click Apply.
If you now click on a branch and look at its attributes, you will notice that two new attributes have been added: one is called gene_Present, and the other gene_Compatible.
- The *_Present attribute represents the proportion of gene trees (from 0 to 1) in which the split induced by the node is present (note that for the split to be present in a gene tree, the gene tree must contain exactly the same leaves as the species tree).
- The *_Compatible attribute, instead, represents the proportion of gene trees (again, from 0 to 1) that are compatible with the split induced by the branch. In this case, if the gene tree does not contain any of the taxa included on one side of the split, the gene tree is considered as being compatible with the split.
A few observations for the values of these attributes:
- Terminal branches (leaf nodes) represent a split between a single strain on one side, and all the other strains on the other side. Such a split must necessarily be present in any tree including the leaf node; therefore, the *_Compatible attribute for leaf nodes will always be 1.
- Instead, the *_Present attribute for a leaf node will be equal to the proportion of trees that contain all the taxa in the species tree (regardless of their topology). In this case, 5 trees out of 137 contain all the taxa, and in fact the value of the gene_Present attribute for leaf nodes is $\approx 0.0365 \approx 5/137$. This is also the maximum value that this attribute can have.
- The meaning of the *_Compatible attribute is somewhat similar to the gene concordance factors (gCF) that can be computed, e.g., by IQ-TREE (Minh et al., 2020). The main difference is that TreeViewer counts trees that are not "decisive" for a split as being compatible with it, while the gCF excludes them from the computation.
- In general, the *_Compatible attribute will be more useful than the *_Present attribute. The two attributes will be identical if all gene trees are fully bifurcating and contain exactly the same taxa. They might both be useful if most of the trees contain the same taxa, but some are multifurcating.
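The rules described above can be sketched in Python as follows. This is only one plausible reading of the text, under the assumption that each gene tree is given as a pair (taxa, non-trivial splits) and that compatibility is tested after restricting the species tree split to the taxa found in the gene tree; the function names and the toy data are hypothetical and this is not TreeViewer's implementation.

```python
def present_in_gene_tree(side, species_taxa, gene_taxa, gene_splits):
    """A split is present only if the gene tree has exactly the same leaves."""
    species_taxa, gene_taxa = frozenset(species_taxa), frozenset(gene_taxa)
    if gene_taxa != species_taxa:
        return False
    side = frozenset(side)
    other = species_taxa - side
    if len(side) == 1 or len(other) == 1:
        return True  # terminal branch: present in every tree with these leaves
    return any(frozenset(s) in (side, other) for s in gene_splits)


def compatible_with_gene_tree(side, species_taxa, gene_taxa, gene_splits):
    """Restrict the split to the gene tree's taxa, then test compatibility."""
    gene_taxa = frozenset(gene_taxa) & frozenset(species_taxa)
    a = frozenset(side) & gene_taxa
    b = gene_taxa - a
    if not a or not b:
        return True  # the gene tree is not decisive for this split
    for s in gene_splits:
        c = frozenset(s) & gene_taxa
        d = gene_taxa - c
        if a & c and a & d and b & c and b & d:
            return False
    return True


def proportions(side, species_taxa, gene_trees):
    """Return (Present, Compatible) proportions over all gene trees."""
    n = len(gene_trees)
    present = sum(present_in_gene_tree(side, species_taxa, t, s) for t, s in gene_trees)
    compatible = sum(compatible_with_gene_tree(side, species_taxa, t, s) for t, s in gene_trees)
    return present / n, compatible / n


species_taxa = {"A", "B", "C", "D", "E"}
gene_trees = [
    ({"A", "B", "C", "D", "E"}, [{"A", "B"}, {"D", "E"}]),  # same topology
    ({"A", "B", "C", "D"}, [{"A", "B"}]),                   # E is missing
    ({"A", "B", "C", "D", "E"}, [{"A", "C"}, {"D", "E"}]),  # conflicts with A,B
    ({"C", "D", "E"}, []),                                  # A and B are missing
]

print(proportions({"A", "B"}, species_taxa, gene_trees))  # (0.25, 0.75)
print(proportions({"A"}, species_taxa, gene_trees))       # (0.5, 1.0)
print(round(5 / 137, 4))                                  # 0.0365
```

In this toy example, the leaf split for A is compatible with every gene tree, while its Present value equals the proportion of gene trees containing all five taxa, which mirrors the 5/137 value observed in the tutorial.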
Based on this, we now want to show the value of the gene_Compatible attribute on the tree. This could be done in multiple ways; e.g. we could add coloured node shapes (as in the example highlighting bootstrap values), or colour the branches according to the value of this attribute. To colour the branches, click on the Plot actions button to expand the options for the Branches module, and click on the wrench icon next to Colour. In the new window, enter gene_Compatible as the Attribute name and set the Attribute type to Number. This will cause three new controls to appear: the Minimum (leave it set to 0), the Maximum (leave it set to 1), and a gradient button. You can click on the gradient button and select a different gradient (e.g., the Viridis gradient, which is the fifth one in the first row), then click OK to close the window.
The plot will update, and the branches will be coloured according to the value of the attribute (branches with low compatibility scores are darker, while branches with higher compatibility scores are lighter). The plot should now look similar to the following:
To clarify even further which branches have low compatibility score values, we can use the gene_Compatible score to affect the thickness of the branches. For example, we may want to have a thickness of 1 for branches with a gene_Compatible value of 1, while branches with lower scores should be thicker, up to a thickness of 5 where gene_Compatible is 0. Essentially, we want to apply the following formula:
$$ \mathrm{thickness} = 5 - 4 \cdot \mathrm{gene\_Compatible} $$
We can achieve this by using the Linear transformation module, which is one of the Further transformation modules. To use this module, click on the Further transformations button to open the Further transformation panel on the left, and click on the button to add a new module. Add an instance of the Linear transformation module, and change the Attribute to gene_Compatible, the Scaling factor to -4, and the Translation factor to 5. Finally, change the Replacement attribute to Thickness and click on Apply.
This module will take the value of the selected attribute (in this case gene_Compatible) and apply a linear transformation $f\left (x \right) = ax + b$ to its value, then store the result in the replacement attribute (in this case, Thickness). The Scaling factor represents $a$ in the formula, while the Translation factor represents $b$; thus, by setting them to -4 and 5, respectively, we obtain the transformation we needed.
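As a quick check of the arithmetic, the following Python sketch applies the same linear transformation to a few gene_Compatible values. The function name and parameter names are illustrative; they simply mirror the Scaling factor and Translation factor settings of the module.

```python
def linear_transformation(value, scaling_factor, translation_factor):
    """Compute f(x) = a * x + b, with a the scaling and b the translation factor."""
    return scaling_factor * value + translation_factor


# Thickness for a few gene_Compatible values, using a = -4 and b = 5.
for gene_compatible in (0.0, 0.5, 1.0):
    thickness = linear_transformation(gene_compatible, -4, 5)
    print(gene_compatible, thickness)
# 0.0 5.0
# 0.5 3.0
# 1.0 1.0
```

A fully compatible branch gets the thinnest line (1), and the thickness grows linearly to 5 as the compatibility score drops to 0.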
By default, the Branches module uses the Thickness attribute to determine the thickness of the branches in the plot. Therefore, since we are storing the computed value in this attribute, the plot should automatically update to look similar to the following:
This highlights even better that there are multiple branches with low compatibility values, even though the bootstrap support values on the tree are all quite high, which is expected (this is the same phenomenon highlighted by Minh et al. (2020) for gCF).
Naturally, if you wish to, you could make some more changes to the tree (e.g., rooting it, or adding node shapes to highlight the bootstrap support); for now, this concludes the tutorial. You can save the tree file or the plot as a PDF or SVG file using the items from the File menu. You can also download the MultipleTreeComparisons.tbi tree file, which contains the species tree with the attached gene trees along with all the modules.
References
- Minh, B. Q., Hahn, M. W., & Lanfear, R. (2020). New Methods to Calculate Concordance Factors for Phylogenomic Datasets. Molecular Biology and Evolution, 37(9), 2727–2733. DOI: 10.1093/MOLBEV/MSAA106