Plotting the age distributions in a time-calibrated tree

In time-calibrated phylogenetic trees (generally obtained through molecular clock analyses), the branch lengths are expressed in time units. Trees of this kind can therefore be used to display information about the age of various ancestors of present taxa. When a Bayesian molecular clock analysis is run, it produces an "ensemble" of trees that constitute a (hopefully unbiased) sample from the posterior distribution. This means that the result of the analysis is not just a single age for each ancestor, but rather an "age distribution", i.e. the posterior probability that each ancestor in the tree existed at a particular point in time.

The results of a Bayesian molecular clock analysis are generally summarised by constructing a consensus tree in which each node sits at its mean or median age, and drawing a bar at each node representing some sort of credible interval (e.g. a 95% highest-posterior-density interval). In addition to this, TreeViewer can also plot the actual shape of the distribution rather than a simple bar; this gives a clearer idea of how spread out the distribution is, and is especially useful for asymmetric or multimodal distributions.
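As a rough illustration of how such summaries can be derived from a set of sampled ages, here is a minimal Python sketch. The function name `hpd_interval` and the age values are made up for this example, and this is not necessarily how TreeViewer computes its intervals; it simply picks the narrowest window of sorted samples that covers the requested fraction.

```python
import math
import statistics


def hpd_interval(samples, mass=0.95):
    """Return the narrowest interval containing a fraction `mass` of the samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    n = len(ordered)
    # Number of consecutive sorted samples that the interval has to cover.
    k = max(1, math.ceil(mass * n))
    # Slide a window of k samples over the sorted list and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


# Hypothetical sampled ages (in million years) for a single node.
ages = [2420.0, 2350.0, 2290.0, 2380.0, 2330.0, 2900.0, 2360.0, 2340.0, 2370.0, 2320.0]

print("mean:", statistics.mean(ages))
print("median:", statistics.median(ages))
print("80% HPD:", hpd_interval(ages, 0.8))
```

Note how the single outlier (2900) pulls the mean upwards but leaves the median and the 80% interval largely unaffected; this is the kind of asymmetry that a simple bar hides.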

The clock.tre file contains 1000 trees that were sampled during a Bayesian molecular clock analysis of Cyanobacteria, with each tree containing 42 strains. When the tree file is opened, TreeViewer recognises that the file contains multiple trees and automatically computes a consensus tree. The tree should look similar to the following figure:

If you open the modules panel by clicking on the Transformer module button in the Modules tab, you can change the settings for the computation of the consensus tree by expanding the options for the Transformer module. In particular, you can decide whether any trees should be skipped from the beginning of the file (useful e.g. for discarding the burn-in, if the tree file has not already been processed) and whether the tree branches are built using the mean or median lengths/ages. For now, leave these settings at their default values.
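The effect of these two settings can be sketched in a few lines of Python. This is only a conceptual illustration: `summarise_age`, its parameters, and the sample values are hypothetical and do not correspond to TreeViewer's actual code.

```python
import statistics


def summarise_age(samples, skip=0, use_median=False):
    """Discard the first `skip` samples (burn-in), then summarise the rest."""
    kept = samples[skip:]
    if not kept:
        raise ValueError("all samples were discarded as burn-in")
    return statistics.median(kept) if use_median else statistics.mean(kept)


# Hypothetical ages for one node; the first two samples come from the
# burn-in phase, before the chain has converged.
samples = [5000.0, 3200.0, 2400.0, 2350.0, 2300.0, 2650.0]

print(summarise_age(samples))                           # mean, no burn-in removed
print(summarise_age(samples, skip=2))                   # mean after burn-in
print(summarise_age(samples, skip=2, use_median=True))  # median after burn-in
```

Skipping the burn-in samples matters most for the mean, which is more sensitive to the extreme values typically found at the start of a chain.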

Cleaning up the tree

To simplify the tree plot and make it easier to view, click on the Plot actions button in the Modules tab to display the Plot action modules, and then remove the second Labels module (which draws the branch lengths). Now, click on the Coordinates module button, expand the options for the Coordinates module and set the Width parameter to 500; finally, click on Apply. The plot should now look similar to the following figure:

Drawing the node age distributions

To add the node age distributions to the plot, we need to use a two-step process. First of all, the distributions need to be computed; since this step can be relatively time-consuming (as every tree in the file needs to be processed), this is done by a Further transformation module (Set up age distributions), which associates each age distribution with the corresponding node in the tree. Then, a Plot action module (Age distributions) uses this pre-computed information to draw the age distributions. This means that the age distributions do not have to be recomputed every time the plot is drawn, which improves performance.

Start by clicking on the Further transformations button in the Modules panel and adding a Set up age distributions module to the plot, by clicking on the Add module button. You can leave the default settings for this module unchanged; in addition to computing the age distributions, the module will also associate with each node an attribute containing the mean age and another containing the specified credible interval.

Then, enable the Age distributions Plot action module by opening the plot actions and clicking on the Add module button under the Plot elements (note that this is different from the Age distributions timeline module). This new module should show some colourful age distributions on the tree, which should now look similar to the following image:

The age distribution for each node is drawn as a "violin plot" (which can be changed to a histogram); note that this is not a kernel density estimate (KDE) - the violin plot is completely equivalent to the histogram.
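To see why a violin plot drawn this way carries exactly the same information as the histogram, consider the following Python sketch. It bins the samples and then mirrors the bin heights around the branch; the function names and the sample values are assumptions made for illustration, not TreeViewer's implementation.

```python
def histogram(samples, bins, low, high):
    """Count how many samples fall in each of `bins` equal-width bins."""
    width = (high - low) / bins
    counts = [0] * bins
    for x in samples:
        if low <= x <= high:
            # Samples equal to `high` go into the last bin.
            index = min(int((x - low) / width), bins - 1)
            counts[index] += 1
    return counts


def violin_outline(counts, max_half_height):
    """Mirror the histogram around the branch: same data, different shape."""
    peak = max(counts) or 1  # avoid dividing by zero if every bin is empty
    upper = [max_half_height * c / peak for c in counts]
    lower = [-h for h in upper]
    return upper, lower


# Hypothetical age samples for one node.
samples = [1, 2, 2, 3, 3, 3, 4, 4, 5]
counts = histogram(samples, bins=5, low=1, high=6)
upper, lower = violin_outline(counts, max_half_height=6.0)
print(counts)
print(upper)
print(lower)
```

No smoothing is applied at any point, which is the difference from a KDE-based violin plot: the outline is just the bin heights, scaled and reflected.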

You will notice that the age distributions are drawn in front of the tree, rather than behind it; thus, they partially obscure some branches (this is not very noticeable because the colours are rather transparent). To have the age distributions appear behind the tree, move the Age distributions module up by dragging it (click on it and hold the left mouse button), so that it becomes the first element in the plot list.

In addition to this, each branch in the tree has a limited vertical space available; this causes the distributions for some branches to overlap with each other. This can be addressed by opening the options for the Coordinates module and increasing the Height, e.g. to 800.

The tree plot should now look similar to the following figure:

Adding a scale axis

We can also add a scale axis to the tree plot, so that the ages of the various groups can be actually read from the tree. This can be done by using the Scale axis Plot action module; to enable it, click on the Add module button under the Plot elements and select the module. This will cause a scale axis to be drawn over the tree, which should look similar to the following figure:

To make the axis nicer, we can change some of the settings for this module. Expand the options, and increase the Tick spacing to 150 (so that we have integer numbers on the axis). It is always a good idea to make sure that the End of the axis is a multiple of the Tick spacing; thus, you should increase the End parameter to 3000. Furthermore, since we do not need the decimal digits on the axis anymore, you can set the Digits to 0. Time units in this tree are millions of years before the present, and you can show these by entering Mya in the Units field. Finally, make sure that the scale axis is drawn behind the tree by dragging the Scale axis module all the way up.
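The arithmetic behind these choices is simple enough to sketch in Python. The helper names below are hypothetical, and the root age of 2900 is only a placeholder for the oldest age in your own tree.

```python
import math


def axis_end(max_age, tick_spacing):
    """Round the axis end up to the next multiple of the tick spacing."""
    return math.ceil(max_age / tick_spacing) * tick_spacing


def tick_labels(end, tick_spacing, digits):
    """Format one label per tick, from 0 to `end`, with `digits` decimals."""
    count = round(end / tick_spacing)
    return [format(i * tick_spacing, f".{digits}f") for i in range(count + 1)]


end = axis_end(2900, 150)  # placeholder root age, tick spacing from the text
print(end)
print(tick_labels(end, 150, 0)[:4])
```

Because the end is a multiple of the spacing, the last tick falls exactly on the end of the axis instead of leaving an unlabelled stub.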

The plot should now look similar to the following image:

To make the text in the figure more visible, we can increase its size. To do this, expand the options for the Scale axis module, click on the button to edit the Font and increase the font size to 12. Repeat the same steps to increase the font size for the Labels module as well. The plot should now look similar to the following figure:

Highlighting specific nodes

Now, this colourful plot is very pretty, but it can be a bit overcrowded; usually we are mainly interested in the ages of a few groups, and we can ignore most of the others (maybe leaving the full image for the Supplementary Information of a paper). For example, in the case of this tree we might be interested in:

The last common ancestor (LCA) of crown group cyanobacteria (i.e. the root node of the tree).
The LCA of Macrocyanobacteria and Microcyanobacteria (i.e. the node at ~2350 Mya, whose descendants contain most strains except Gloeobacter and Pseudanabaena).
The LCA of Picocyanobacteria (the node at ~600 Mya, whose descendants include the strains from Synechococcus sp. RCC 307 to Synechococcus sp. WH 7805).
The LCA of the Pseudanabaena strains.
The LCA of Richelia intracellularis and Calothrix sp. PCC 7103.
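If you are unsure which node is the LCA of a given set of strains, the idea can be illustrated with a small Python sketch. The tree here is a toy example stored as a child-to-parent dictionary; the internal node names (`root`, `n1`, `n2`) and the tip names are hypothetical and do not reproduce the topology of clock.tre.

```python
def last_common_ancestor(parent, tips):
    """Return the deepest node that is an ancestor of every tip in `tips`.

    `parent` maps each node to its parent; the root has no entry.
    """
    if not tips:
        raise ValueError("need at least one tip")

    def path_to_root(node):
        path = [node]
        while node in parent:
            node = parent[node]
            path.append(node)
        return path

    paths = [path_to_root(tip) for tip in tips]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking from the first tip towards the root, the first shared node is the LCA.
    for node in paths[0]:
        if node in shared:
            return node
    raise ValueError("the tips do not belong to the same tree")


# Toy tree: root -> (tipA, n1); n1 -> (tipB, n2); n2 -> (tipC, tipD)
parent = {"tipA": "root", "n1": "root", "tipB": "n1", "n2": "n1", "tipC": "n2", "tipD": "n2"}

print(last_common_ancestor(parent, ["tipC", "tipD"]))
print(last_common_ancestor(parent, ["tipB", "tipD"]))
print(last_common_ancestor(parent, ["tipA", "tipC"]))
```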

Of course, this is a fictitious example with a very reduced taxon selection, so the nodes that we are using might not be actually very relevant.

In any case, we would like to change the plot so that only the distributions for the nodes we are actually interested in are shown. To do this, expand the options for the Age distributions module and uncheck the Auto colour by node check box. This will cause a new option called Colour to appear and all the age distributions will turn black. The plot should now look similar to the following image:

Now, start by clicking on one of the nodes mentioned above (e.g. the root node). This will select the node and open the Selection panel to the right. Here, click on the "tag" button to display the attributes of this node and click on the Add attribute button to add a new attribute; in the new window, set the Attribute name to DistributionColour and leave the value empty. This will add a new Add attribute module to the plot. Open the Further transformation modules in the panel on the left and expand the options for this module. Then, click on the New value text box to select it (as if you wanted to enter some text) and press CTRL+SHIFT+C (on Windows and Linux; on macOS, press CMD+SHIFT+C) on your keyboard.

This will open a colour picker window; in this window, you should select a new colour for the age distribution for this node, such as a somber grey. When you click OK, the RGB hex representation of the colour will be entered in the New value text box. Click on the Apply button; nothing should happen.
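For reference, an RGB hex representation simply encodes each colour channel as two hexadecimal digits. The Python sketch below shows the conversion for a mid grey; the exact form that TreeViewer writes (for example letter case, or whether an alpha channel is appended) is not specified here, so treat the output format as an assumption.

```python
def rgb_to_hex(red, green, blue):
    """Encode three 0-255 channel values as a #RRGGBB string."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("channel values must be between 0 and 255")
    return "#{:02X}{:02X}{:02X}".format(red, green, blue)


print(rgb_to_hex(128, 128, 128))  # a mid grey
```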

To actually change the colour of the distribution, we need to tell the Age distributions module to pick the colour from the new DistributionColour attribute that we created. To do this, in the options for the Age distributions module, click on the wrench button. In the new window that opens, replace the default Attribute name with DistributionColour, then click OK. Now, the colour for the age distribution of the root node should actually change, and the plot should look similar to the following figure:

You should now repeat the steps to add a DistributionColour attribute with the same value to all the "interesting" nodes. When you have done so, the distributions for these nodes should be grey, while the distributions for the other nodes are still black:

To hide the distributions for the other nodes, you need to change the default value of the Colour option of the Age distributions module from black to transparent (click on the button next to Colour and set the A value to 0). Now, the tree should only contain the age distributions for the five "interesting" nodes:
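The logic at work here (a per-node attribute that overrides a default value) can be summarised with a short Python sketch. It is a conceptual model only: nodes are represented as plain dictionaries, and the function name and the RGBA tuple layout are assumptions rather than TreeViewer's internals.

```python
TRANSPARENT = (0, 0, 0, 0.0)  # red, green, blue, alpha


def distribution_colour(attributes, default=TRANSPARENT, name="DistributionColour"):
    """Use the node's own colour attribute if present, else the default."""
    value = attributes.get(name, "").lstrip("#")
    if len(value) < 6:
        return default
    # Only the first six hex digits (RRGGBB) are read in this sketch.
    red, green, blue = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return (red, green, blue, 1.0)


interesting_node = {"DistributionColour": "#808080"}
other_node = {}

print(distribution_colour(interesting_node))  # opaque grey: distribution is drawn
print(distribution_colour(other_node))        # alpha 0: distribution is invisible
```

With a transparent default, only the nodes that carry the attribute end up with a visible distribution, which is why no per-node "hide" setting is needed.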

Numbering the nodes

Since we are probably going to talk about these "interesting" nodes in our manuscript, it might be a good idea to number them, so that we can refer to them easily. We can do this by adding a new attribute to each node, and then adding a new Labels module to display this attribute.

Start by clicking on the root node and, in the selection panel, use the Add attribute button to add a new attribute to it. In the new window, enter Index as the Attribute name, set the Attribute type to Number, and the Attribute value to 1, then press OK. Repeat these steps to assign an Index of 2 to the LCA of Microcyanobacteria and Macrocyanobacteria, 3 to the LCA of Pseudanabaena, 4 to the LCA of Picocyanobacteria, and 5 to the LCA of Richelia and Calothrix.

Now, the attributes have been added, but they do not show up on the tree because no module is making use of them. To add the new labels, click on the Add module button under the Plot elements and select the Labels module. Then, open the options for this module and set the Show on parameter to Internal nodes and the Attribute to Index. To change the number of decimal digits shown, click on the Attribute format... button and, in the new window, set the digits to 1 (i.e. one significant digit). The tree should now look similar to the following figure:

These numbers are still a bit ugly, but we can do something to improve them. For example, it would be nice to have each number in a white circle centered over the node of interest. We can draw these circles using the Node shapes module. Enable the Node shapes by clicking on the Add module button under the Plot elements and selecting the module. This will draw a coloured star at each tip of the tree:

Now, expand the options for this module and set the Show on parameter to Internal nodes. Change the Shape to Circle and uncheck the Auto fill colour by node option; click on the colour button next to the Fill colour option that should have just appeared, and select a white colour. Finally, increase the stroke thickness to 1. The tree plot should now look similar to the following figure:

Now we still have a few problems: the circles are shown for all the nodes, they are a bit too small, and the labels are not centered in them. To begin with, we are going to take care of the first two problems.

As we did for the distributions, the idea is to set an attribute on the "interesting" nodes, which can then be used by the Node shapes module to draw the node shapes only on these nodes. We could do this manually by selecting each node and adding the attribute, but to save time we can use a trick: we can use a Replace attribute module to add the new attribute to all nodes that satisfy a certain criterion, e.g. having an Index that is greater than 0. Since the "not interesting" nodes do not have an Index associated with them, this criterion will only match the "interesting" nodes.

To do this, enable a new Replace attribute module by clicking on the Add module button under the Further transformations and selecting the module. Expand the options for this module and set the search Attribute to Index (this should cause the search Attribute type to automatically update itself to Number). Set the Value to 0 and the Comparison type to Greater than. Then, set the replacement Attribute to ShapeSize, the replacement Attribute type to Number and the replacement Value to 15. This attribute controls the size of the node shapes and, when you click on Apply, you should see that the circles at the interesting nodes immediately increase in size:

Note that the circles are drawn in front of the labels, which is annoying; to fix this, move the Node shapes module up so that it comes before the Labels module in the list. To make the smaller circles disappear altogether, go back to the Node shapes module and set the default value of the Size parameter to 0. In this way, the circles will only be drawn for those nodes whose ShapeSize attribute overrides the default shape size:
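Conceptually, the Replace attribute step followed by the default size of 0 behaves like the Python sketch below. Nodes are modelled as plain dictionaries, and `replace_attribute` and `shape_size` are hypothetical helpers written for this illustration.

```python
def replace_attribute(nodes, search_attribute, threshold, new_attribute, new_value):
    """Set `new_attribute` on every node whose numeric `search_attribute`
    is greater than `threshold`; return how many nodes matched."""
    matched = 0
    for node in nodes:
        value = node.get(search_attribute)
        # Nodes without the attribute never match the "greater than" criterion.
        if isinstance(value, (int, float)) and value > threshold:
            node[new_attribute] = new_value
            matched += 1
    return matched


def shape_size(node, default=0):
    """The ShapeSize attribute, when present, overrides the default size."""
    return node.get("ShapeSize", default)


nodes = [{"Index": 1}, {}, {"Index": 2}, {"Name": "some tip"}]
print(replace_attribute(nodes, "Index", 0, "ShapeSize", 15))
print([shape_size(node) for node in nodes])
```

Nodes that end up with a size of 0 are effectively not drawn, so a single criterion-based module replaces five manual attribute edits.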

Now, to center the labels, expand the options for the last Labels module (the one that draws the indices at the "interesting" nodes), and set the X component of the position to 0 and the Alignment to Center. You can also make the labels more prominent by clicking on the button next to Font, selecting the Bold font style, and increasing the font size to 12. The plot should now look similar to the following figure:

Adding a legend

The final step is to add a legend to explain what each "interesting" node represents. To do this, click on the Add module button under the Plot elements and select the Legend module. This will add a legend below the tree plot:

To update the text of the legend, expand the options for the new module, and click on the button to edit the Markdown source. In the new window, enter the following code:

### **Legend**

**1**: LCA of crown group Cyanobacteria

**2**: LCA of Macrocyanobacteria and Microcyanobacteria

**3**: LCA of _Pseudanabaena_ strains

**4**: LCA of Picocyanobacteria

**5**: LCA of _Richelia_ and _Calothrix_

The preview pane of the Markdown editor will show how this code will be rendered. You can learn more about Markdown at the CommonMark website. Close the Markdown editor window by clicking on the OK button. The legend will update to reflect the new code that you entered:

You can move the legend so that it is aligned with the top-left of the tree by setting the Anchor to Top-left and the Alignment to Top-right, and then setting the X and Y components of the Position to -70 and 0, respectively. The final tree should look similar to the following figure:

You can now save the tree file or the plot as a PDF or SVG file using the items from the File menu. You can also download the clock.tbi tree file, which contains the tree along with all the modules. The finished tree file is also available in the Examples section of TreeViewer's welcome page in the File page.

Tips

You do not necessarily have to use TreeViewer's consensus tree for the plot. You could also use another program to compute the consensus tree, and open that file with TreeViewer. Then, instead of using the Set up age distributions module, you can use the Set up age distributions (attachment) module, which computes the age distributions based on the trees contained in an attachment.
Some molecular clock dating programs (e.g., MrBayes, RevBayes, Phylobayes) output the age estimates as multiple tree samples, as in the file used in this example. Other programs (e.g., MCMCtree), instead, produce a text file containing a table with the age estimates for each node. If you are using the second kind of program, you can plot the age estimates by adding the text file (e.g., mcmc.txt from MCMCtree) as an attachment, using the Set up age distributions (attachment) module, and setting the Format to Table.
You can also use a different colour for each "interesting" node.
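If you want to inspect a table of age estimates before attaching it, a few lines of Python are enough to collect the samples for each node. The sketch below assumes a tab-separated file with a header row in which the age columns share a common prefix; the column names (`t_n43`, `t_n44`), the prefix, and the values are invented for illustration, so check the layout of your own file before relying on it.

```python
import csv
import io


def read_age_table(text, column_prefix="t_"):
    """Collect the sampled values of every column whose name starts with
    `column_prefix`, assuming a tab-separated table with a header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    ages = {}
    for row in reader:
        for name, value in row.items():
            if name and name.startswith(column_prefix):
                ages.setdefault(name, []).append(float(value))
    return ages


# Made-up table contents: one row per sample, one column per node age.
example = (
    "Gen\tt_n43\tt_n44\tlnL\n"
    "0\t23.5\t6.1\t-100.0\n"
    "10\t24.0\t5.9\t-98.5\n"
)

for node, samples in read_age_table(example).items():
    print(node, samples)
```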