Comparing trees - arklumpus/TreeViewer GitHub Wiki

TreeViewer can be used to create tree comparisons. This can mean either comparing two individual trees with each other, or comparing one tree with a set of trees.

Comparing two trees

The file Samples.trees contains 500 trees from a Bayesian analysis of a simulated dataset, which have also been used in the example about tree statistics. Using a tree space plot built according to the weighted Robinson-Foulds metric, that example showed that the trees can be divided into two clusters. The medoid for the first cluster is the 415th tree in the file, while the medoid for the second cluster is the 139th tree.

Having established this, it might be interesting to figure out what the actual differences are between the consensus of the whole sample of trees, tree #415, and tree #139.

Opening and drawing the tree

To do this, first of all open the tree file in TreeViewer. The program will automatically compute the consensus tree and show it as an unrooted tree:

To make the tree look nicer, you can click on the Circular button in the Actions tab; this will reshape the tree in a circular style. This will also issue a warning message (which can be seen as the blue i icon in the status bar at the bottom of the window); this is because the tree is unrooted, therefore the program warns you that it might be misleading to draw a root branch. To remove the root branch, click on the Plot actions button in the Modules tab, then expand the settings for the Branches module (note the warning here), and uncheck the Root branch check box. The warning should now disappear. To make it even clearer that the tree is unrooted, you can click on the Coordinates module button, expand the parameters, set the Inner radius to 0, and finally click Apply. The tree will now have a trifurcation at the root:

Adding support values

Before doing the tree comparison, it is useful to show support values on the tree. Another tutorial contains more details about how to do this; in short, you should:

  • In the Plot elements, add a new instance of the Node shapes module.
  • Change the following settings:
    • Show on to Internal nodes
    • Shape to Circle
    • Uncheck Auto fill colour by node
    • Click on the wrench icon next to Fill colour, and set the Attribute name to Support and the Attribute type to Number. Then, click on the gradient and add/edit gradient stops to get the following:
      • At 0.0, black
      • At 0.949, black (again)
      • At 0.95, grey
      • At 0.99, grey (again)
      • At 0.991, transparent
      • At 1.0, transparent (again)
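
The gradient stops above amount to a threshold rule on the Support attribute. The following Python sketch only illustrates that rule and is not TreeViewer code: `STOPS` and `colour_band` are hypothetical names, support values are assumed to be stored on a 0 to 1 scale, and the narrow interpolation between neighbouring stops (e.g. between 0.949 and 0.95) is ignored.

```python
from bisect import bisect_right

# Gradient stops copied from the list above: (position, colour name).
STOPS = [
    (0.0, "black"),
    (0.949, "black"),
    (0.95, "grey"),
    (0.99, "grey"),
    (0.991, "transparent"),
    (1.0, "transparent"),
]


def colour_band(support: float) -> str:
    """Return the colour of the last stop located at or below `support`.

    Simplification: the real gradient blends colours between stops;
    here each value simply takes the colour of the stop below it.
    """
    positions = [position for position, _ in STOPS]
    index = bisect_right(positions, support) - 1
    return STOPS[max(index, 0)][1]


for value in (0.80, 0.97, 1.0):
    print(value, colour_band(value))
```

With these stops, a node with support 0.80 falls in the black band, 0.97 in the grey band, and 1.0 in the transparent band (i.e. no visible circle).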

After this, the plot should update to show nodes with support lower than 95% in black and nodes with support between 95% and 99% in grey:

Performing the tree comparison

To perform the tree comparison, click on the Further transformations button in the Modules tab, then add a new instance of the module Compare trees. This module will compare the current tree with one or more trees from an attachment or one of the loaded trees (which is what we want to do now).

Expand the parameters for this module, and change the Tree source to Loaded tree. Then, change the Tree index to 415 and click Apply. Nothing should change on the plot, but if you click on a branch and inspect its attributes, you should notice that there are some new attributes in addition to Name, Length and Support:

  • The Tree415_Present attribute tells you whether the split induced by the selected branch is present in the other tree we are comparing (Yes) or not (No).
  • The Tree415_Compatible attribute tells you whether the split induced by the selected branch is compatible with the other tree (Yes) or not (No). Note that if both trees are fully bifurcating and contain exactly the same leaves, these two attributes will always be equal; you could have different values if the trees have different leaves or multifurcating nodes.
  • The Tree415_Length attribute gives you the length of the equivalent branch in tree #415 (naturally, this is only shown where the branch is present in both trees).
  • The Tree415_Name attribute (only present for tip nodes) gives you the name of the node in tree #415 (which is going to be the same as in the current tree).

Essentially, the *_Present and *_Compatible attributes describe the results of the comparison, while the other attributes are the values of the same attributes on the other tree that have been copied over to the current tree.
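
To make the difference between "present" and "compatible" concrete, here is a minimal Python sketch. It is not TreeViewer's implementation: the representation of a tree as a set of splits, the five-leaf toy trees and all function names are assumptions made for illustration. Two splits are treated as compatible when at least one of the four intersections between their sides is empty.

```python
from itertools import product

# Hypothetical leaf set shared by all the toy trees below.
LEAVES = frozenset("abcde")


def normalise(side):
    """Represent a split by the side that does not contain leaf 'a'."""
    side = frozenset(side)
    return LEAVES - side if "a" in side else side


def splits_compatible(s1, s2):
    """Two splits are compatible if some pair of sides does not overlap."""
    sides1 = (s1, LEAVES - s1)
    sides2 = (s2, LEAVES - s2)
    return any(not (x & y) for x, y in product(sides1, sides2))


def compare(split, other_tree):
    """Return (present, compatible) for `split` against `other_tree`."""
    present = split in other_tree
    compatible = all(splits_compatible(split, s) for s in other_tree)
    return present, compatible


# Split induced by the branch we are inspecting: {d, e} | {a, b, c}.
branch = normalise("de")

same = {normalise("ab"), normalise("de")}         # identical topology
unresolved = {normalise("ab")}                    # multifurcating tree
conflicting = {normalise("ab"), normalise("cd")}  # different topology

print(compare(branch, same))         # (True, True)
print(compare(branch, unresolved))   # (False, True)
print(compare(branch, conflicting))  # (False, False)
```

The second case shows why the two attributes can differ: a multifurcating tree does not contain the split, but it does not contradict it either.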

You can now follow the same steps to add another instance of the Compare trees module, this time setting the Tree index to 139. Again, this will add a few new attributes to the tree.

To see which branches from the consensus tree are not present in the other two trees, you can use the search function: press CTRL+F (CMD+F on macOS) or click on the Search button in the Actions tab; this will show the search bar above the plot. Expand the Advanced section and change the Search attribute to Tree415_Present. Then, enter No in the Search box. You should see that no nodes are highlighted in the plot: this means that the topology of tree #415 is identical to the consensus tree we are looking at.

If you now change the Search attribute to Tree139_Present, you will notice that three nodes get highlighted in yellow: these are the nodes that are not present in tree #139.

To highlight these nodes more clearly, we can use another Node shapes module. First of all, go in the further transformations, and add a new instance of the Replace attribute module. Expand the options for this module and, in the Search attribute section, set the Attribute to Tree139_Present and the Value to No. In the Replace attribute section, set the Attribute to something like ShapeSize139, the Attribute type to Number and the Value to 4. The plot should not change, but if you now click on one of the nodes that were highlighted earlier, you should see that it has a new attribute called ShapeSize139 with value 4.

Now, go to the Plot elements tab, and add a new instance of the Node shapes module. Set Show on to Internal nodes, the Shape to Circle, uncheck the Auto fill colour by node check box and set the Fill colour to a light orange. Finally, click on the little wrench icon next to the Size parameter and, in the new window, set the Attribute name to ShapeSize139 and the Default value to 0. This will ensure that the orange circles only appear for nodes that have the ShapeSize139 attribute, which we created with the Replace attribute module.
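
The combined effect of the Replace attribute module and the Default value of the Size parameter can be summarised with a small Python sketch. This is only an illustration under the assumption that each node is a dictionary of attributes; it is not how TreeViewer stores trees, and the `nodes` list is made up.

```python
# Hypothetical nodes, each described only by its attributes.
nodes = [
    {"Tree139_Present": "Yes"},
    {"Tree139_Present": "No"},
    {"Tree139_Present": "Yes"},
]

# Replace attribute: where Tree139_Present is "No", set ShapeSize139 to 4.
for node in nodes:
    if node.get("Tree139_Present") == "No":
        node["ShapeSize139"] = 4.0

# Node shapes: use ShapeSize139 as the size, falling back to the default 0.
sizes = [node.get("ShapeSize139", 0.0) for node in nodes]
print(sizes)  # [0.0, 4.0, 0.0]
```

Nodes without the new attribute get a circle of size 0, which is why only the nodes missing from tree #139 are marked.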

The plot should now look like this:

The three nodes that are not present in tree #139 are clearly shown; you can see that they have low support (as expected, since they are missing from some of the trees) and are quite short. This reflects parts of the (simulated) phylogeny that were hard to reconstruct with the available sequence data.

Tidying up the plot

An issue with the plot as it currently stands is that the tip labels are not all at the same distance from the centre. While this does not really affect the issue at hand very much, it would still be nice to make the plot look better. To do this, you can first of all expand the options of the Labels module in the plot actions, and change the Anchor to Origin. This will cause all the tip labels to end up in the middle of the tree. Change the X value of the Position to 165 to get them back to their rightful position outside the tree; you will notice that, this time, they are well aligned:

However, the space between the end of the branches and the tip labels is a bit annoying. We can fill this space by using the Branch extensions module: add a new instance of this module in the plot actions, then open the options and set the Branch reference to Circular, the End anchor to Origin, and the X value for the End to 160. The plot will now look as if the branches were extended to end up close to the tip labels:

This is however misleading; we should change the appearance of the branch extensions so that it is clear that they are not part of the "real" branches. You can do this by changing the Line colour to grey, and the Line dash to get a dashed line. Finally, you can drag the Branch extensions module up all the way, so that it appears behind the Branches module. The final plot should look similar to this:

You can now save the tree file or the plot as a PDF or SVG file using the items from the File menu. You can also download the TreeComparison.tbi tree file, which contains the trees along with all the modules.

Comparing multiple trees

Another possibility is to compare a single tree (e.g., a species tree) with multiple other trees (e.g., many gene trees), to highlight which parts of the tree can be consistently recovered by multiple markers.

The file species.tre contains an unrooted phylogenomic tree of 60 cyanobacteria, obtained through a partitioned maximum-likelihood analysis over 137 protein-coding genes. This is the same tree that was used in the tutorial showing how to highlight support values on a tree; if you look at the plot produced by that tutorial, you will notice that most nodes have high support values (>99%).

If you open this file in TreeViewer, it should look similar to the following:

The file gene.trees contains individual gene trees for each of the 137 genes that were used to build this phylogeny. Note that not all of the gene trees contain all of the taxa in the species tree (of the 60 total strains, only 8 are present in all 137 genes, and only 5 trees contain all 60 taxa).

We would like to compare the species tree with each one of the gene trees. To do this, first of all add the gene tree file as an attachment to the species tree plot, by clicking on the Attachments tab, and then on the Add attachment button. You can leave all the attachment settings to their default values.

Then, click on the Modules tab and on Further transformations to open the module panel, then on the Add module button under Further transformations and add an instance of the Compare trees module. Expand the options for this module, make sure that the Tree source is set to Attachment and select the gene attachment for the Tree box. The program will run the tree comparisons, but the plot should not change at this stage.

You will notice that the Compare trees module is issuing a warning saying that the attachment contains multiple trees, and their attributes will thus not be stored. To remove this warning, you can just uncheck the Store attributes on equivalent splits check box, and then click Apply.

If you now click on a branch and look at its attributes, you will notice that two new attributes have been added: one is called gene_Present, and the other gene_Compatible.

  • The *_Present attribute represents the proportion of gene trees (from 0 to 1) in which the split induced by the node is present (note that for the split to be present in a gene tree, the gene tree must contain exactly the same leaves as the species tree).

  • The *_Compatible attribute, instead, represents the proportion of gene trees (again, from 0 to 1) that are compatible with the split induced by the branch. In this case, if the gene tree does not contain any of the taxa included on one side of the split, the gene tree is considered as being compatible with the split.

A few observations for the values of these attributes:

  • Terminal branches (leaf nodes) represent a split between a single strain on one side, and all the other strains on the other side. Such a split must necessarily be present in any tree including the leaf node; therefore, the *_Compatible attribute for leaf nodes will always be 1.

  • Instead, the *_Present attribute for a leaf node will be equal to the proportion of trees that contain all the taxa in the species tree (regardless of their topology). In this case, 5 trees out of 137 contain all the taxa, and in fact the value of the gene_Present attribute for leaf nodes is $\approx 0.0365 \approx 5/137$. This is also the maximum value that this attribute can have.

  • The meaning of the *_Compatible attribute is somewhat similar to the gene concordance factors (gCF) that can be computed, e.g., by IQ-TREE (Minh et al., 2020). The main difference is that TreeViewer counts trees that are not "decisive" for a split as being compatible with it, while the gCF excludes them from the computation.

  • In general, the *_Compatible attribute will be more useful than the *_Present attribute. The two attributes will be identical if all gene trees are fully bifurcating and contain exactly the same taxa. They might both be useful if most of the trees contain the same taxa, but some are multifurcating.
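
The following Python sketch mimics these rules on a toy dataset. It rests on several assumptions: the five taxa, the four gene trees and the function names are all hypothetical, each gene tree is reduced to its taxon set plus its internal splits, and compatibility is checked after restricting the species-tree split to the taxa of the gene tree. It follows the description above rather than TreeViewer's actual code.

```python
# Hypothetical species-tree taxa.
LEAVES = frozenset("abcde")

# Each gene tree: (taxa, internal splits, each given by one of its sides).
GENE_TREES = [
    (frozenset("abcde"), {frozenset("ab"), frozenset("de")}),
    (frozenset("abcd"), {frozenset("ab")}),  # taxon e is missing
    (frozenset("abcde"), {frozenset("ab"), frozenset("cd")}),
    (frozenset("abc"), set()),               # taxa d and e are missing
]


def is_present(split, taxa, gene_splits):
    """The split is present only if the gene tree has exactly the same taxa."""
    if taxa != LEAVES:
        return False
    other = LEAVES - split
    if len(split) == 1 or len(other) == 1:
        return True  # terminal-branch splits exist in any tree on these taxa
    return split in gene_splits or other in gene_splits


def is_compatible(split, taxa, gene_splits):
    """Check the split, restricted to the gene tree's taxa, for conflicts."""
    side_a, side_b = split & taxa, (LEAVES - split) & taxa
    if not side_a or not side_b:
        return True  # one side has no taxa in this gene tree
    for g in gene_splits:
        # A conflict requires all four intersections to be non-empty.
        if all(x & y for x in (side_a, side_b) for y in (g, taxa - g)):
            return False
    return True


def proportions(split):
    """Return the (present, compatible) proportions over all gene trees."""
    n = len(GENE_TREES)
    present = sum(is_present(split, t, s) for t, s in GENE_TREES) / n
    compatible = sum(is_compatible(split, t, s) for t, s in GENE_TREES) / n
    return present, compatible


internal = proportions(frozenset("de"))  # internal branch {d, e} | {a, b, c}
leaf = proportions(frozenset("e"))       # terminal branch leading to e
print(internal)  # (0.25, 0.75)
print(leaf)      # (0.5, 1.0)
```

In this toy example, the terminal branch has a compatibility of 1 and a presence of 0.5, which is exactly the proportion of gene trees containing all five taxa (2 out of 4), mirroring the 5/137 value discussed above.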

Based on this, we now want to show the value of the gene_Compatible attribute on the tree. This could be done in multiple ways; e.g., we could add coloured node shapes (as in the example highlighting bootstrap values), or colour the branches according to the value of this attribute. To colour the branches, click on the Plot actions button, expand the options for the Branches module, and click on the wrench icon next to Colour. In the new window, enter gene_Compatible as the Attribute name and set the Attribute type to Number. This will cause three new controls to appear: the Minimum (leave it set to 0), the Maximum (leave it set to 1), and a gradient button. You can click on the gradient button and select a different gradient (e.g., the Viridis gradient, which is the fifth one in the first row), then click OK to close the window.

The plot will update, and the branches will be coloured according to the value of the attribute (branches with low compatibility scores are darker, while branches with higher compatibility scores are lighter). The plot should now look similar to the following:

To clarify even further which branches have low compatibility score values, we can use the gene_Compatible score to affect the thickness of the branches. For example, we may want to have a thickness of 1 for branches with a gene_Compatible value of 1, while branches with lower scores should be thicker, up to a thickness of 5 where gene_Compatible is 0. Essentially, we want to apply the following formula:

$$ \mathrm{thickness} = 5 - 4 \cdot \mathrm{gene\_Compatible} $$

We can achieve this by using the Linear transformation Further transformation module. To use this module, click on the Further transformations button to open the Further transformation panel on the left, and click on the button to add a new module. Add an instance of the Linear transformation module, and change the Attribute to gene_Compatible, the Scaling factor to -4, and the Translation factor to 5. Finally, change the Replacement attribute to Thickness and click on Apply.

This module will take the value of the selected attribute (in this case gene_Compatible) and apply a linear transformation $f\left (x \right) = ax + b$ to its value, then store the result in the replacement attribute (in this case, Thickness). The Scaling factor represents $a$ in the formula, while the Translation factor represents $b$; thus, by setting them to -4 and 5, respectively, we obtain the transformation we needed.
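
As a quick check of the numbers, the transformation can be written as a small Python function. This is only a sketch of the arithmetic, not the module's code; the function name and its parameter names are chosen here to mirror the module settings.

```python
def linear_transformation(value, scaling_factor=-4.0, translation_factor=5.0):
    """Apply f(x) = a * x + b, with a = scaling factor and b = translation."""
    return scaling_factor * value + translation_factor


# gene_Compatible scores and the resulting branch thickness.
for score in (0.0, 0.5, 1.0):
    print(score, linear_transformation(score))
```

A score of 0 gives a thickness of 5, a score of 0.5 gives 3, and a score of 1 gives 1, matching the range described above.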

By default, the Branches module uses the Thickness attribute to determine the thickness of the branches in the plot. Therefore, since we are storing the computed value in this attribute, the plot should automatically update to look similar to the following:

This highlights even better that there are multiple branches with low compatibility values, even though the bootstrap support values on the tree are all quite high. This is expected: it is the same phenomenon highlighted by Minh et al. (2020) for gCF.

Naturally, if you wish to, you could make some more changes to the tree (e.g., rooting it, or adding node shapes to highlight the bootstrap support); for now, this concludes the tutorial. You can save the tree file or the plot as a PDF or SVG file using the items from the File menu. You can also download the MultipleTreeComparisons.tbi tree file, which contains the species tree with the attached gene trees along with all the modules.

References

  • Minh, B. Q., Hahn, M. W., & Lanfear, R. (2020). New Methods to Calculate Concordance Factors for Phylogenomic Datasets. Molecular Biology and Evolution, 37(9), 2727–2733. DOI: 10.1093/molbev/msaa106