Plotting multiple age distributions - arklumpus/TreeViewer GitHub Wiki
As has been shown in another tutorial, TreeViewer can be used to plot posterior age distribution estimates along with the tree. Furthermore, TreeViewer can be used to plot multiple age distributions on the same tree; this is useful, for example, to compare the prior and posterior estimates, or to compare ages obtained with different molecular clock models.
It is recommended that you familiarise yourself with the tutorial about Plotting the age distributions in a time‐calibrated tree before continuing with this tutorial.
Plotting the posterior estimates
The file primates.nex
contains a molecular clock tree estimated using MCMCtree. This was produced by running "Tutorial 1" from the MCMCTree tutorials (dos Reis et al., 2017) (except that I changed the RootAge
calibration in the mcmctree.ctl
file to <0.3
instead of <1
). This file contains a single tree with branch lengths derived from the mean age estimates, and confidence intervals annotated as attributes on the tree.
If you open it in TreeViewer, it should look like the following:
First of all, we can remove the branch length labels, as we do not really need them. To do this, click on the Modules
tab at the top of the interface, then click on Plot actions
to open the plot actions panel on the left, and click on the x
next to the second Labels module in the list in order to remove it.
To plot the posterior age distribution estimates, you need first of all to add the posterior.txt
file as an attachment to the plot. To do this, click on the Attachments
tab and on the Add attachment
button, and select the file. Leave the attachment settings at their default values. This file is the mcmc.txt
file produced by the MCMCtree analysis (just renamed).
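If you want to inspect such a table outside TreeViewer, the following is a minimal Python sketch. It assumes the file is tab-separated with one header row and one MCMC sample per row, and that node age columns start with `t_`; the column names and values in the embedded excerpt are made up for illustration and are not taken from posterior.txt.

```python
import csv
import io
import statistics

# Hypothetical excerpt in the layout assumed here: tab-separated, one header
# row, one MCMC sample per row. Column names and values are made up.
SAMPLE = (
    "Gen\tt_n8\tt_n9\tmu\tlnL\n"
    "0\t0.171\t0.142\t0.31\t-100.2\n"
    "10\t0.183\t0.150\t0.29\t-99.8\n"
    "20\t0.165\t0.139\t0.33\t-101.1\n"
    "30\t0.177\t0.147\t0.30\t-100.5\n"
)

def summarise_ages(handle, prefix="t_"):
    """Return {column: (mean, min, max)} for columns whose name starts with prefix."""
    reader = csv.DictReader(handle, delimiter="\t")
    columns = {}
    for row in reader:
        for name, value in row.items():
            if name.startswith(prefix):
                columns.setdefault(name, []).append(float(value))
    return {name: (statistics.mean(v), min(v), max(v)) for name, v in columns.items()}

summary = summarise_ages(io.StringIO(SAMPLE))
for name, (mean, low, high) in summary.items():
    print(f"{name}: mean={mean:.3f} range=[{low:.3f}, {high:.3f}]")
```

To use it on a real file, pass an open file handle instead of the `io.StringIO` wrapper.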
Now, you can set up the age distributions by clicking on the Modules
tab, then on the Further transformations
button, and adding an instance of the Set up age distributions (attachment) module. Expand the options for this module, and set the Attachment
to posterior
(the new attachment we have just added) and the Format
to Table
, then click on Apply
. The plot should stay the same.
To actually plot the age distributions, you can click on the Plot actions
button to bring up the plot actions panel, click on the Add module
button, and add an instance of the Age distributions plot action module. The plot will be updated and should now look similar to the following (the colours may be different):
In the left panel, you can now drag the Age distributions module to the top, so that it appears before the Branches in the order. This will cause the age distributions to be drawn behind the branches. Then, expand the options for this module and uncheck the Auto colour by node
option (this will cause the age distributions to become black). Finally, click on the Colour
button and choose a dark blue; the plot should now be similar to the following:
Adding the prior estimates
We have just plotted the estimates for the posterior distributions of the node ages; when dealing with a Bayesian molecular clock analysis, there are at least three different distributions on node ages that should be considered:
- The posterior distribution (which we have just plotted).
- The "user" prior distribution, which consists of the node age priors (e.g., fossil calibrations) specified by the user.
- The "effective" prior distribution, which is the user prior conditioned on the fact that the ages must be compatible with a phylogenetic tree (which means, for example, that you cannot have a node be older than its ancestor).
The posterior distribution can be obtained by running a "normal" analysis, the user prior is known because it consists just of user input, and the effective prior can be obtained by running an analysis without any data (i.e., without computing likelihoods). Most software for molecular clock dating will have an option for running the analysis without computing likelihoods; for example, in Phylobayes this is achieved with the --prior
command-line option, in MCMCtree by setting usedata = 0
in the control file, in MrBayes by using the data=no
option in the mcmc
command, and so on.
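For MCMCtree, switching an existing control file to a prior-only run amounts to changing one setting. The sketch below does this on a hypothetical two-line excerpt; the helper name `run_under_prior` and the excerpt are illustrative, and a real control file contains many more settings that are left untouched.

```python
import re

# Hypothetical control-file excerpt; only the usedata line matters here.
CONTROL = "usedata = 1\nRootAge = <0.3\n"

def run_under_prior(control_text):
    """Return control_text with the usedata setting switched to 0."""
    new_text, count = re.subn(
        r"(?m)^(\s*usedata\s*=\s*)\S+", r"\g<1>0", control_text
    )
    if count != 1:
        raise ValueError("expected exactly one usedata line")
    return new_text

prior_control = run_under_prior(CONTROL)
print(prior_control)
```

The function refuses to guess if the setting is missing or duplicated, which is safer than silently producing a control file that still uses the data.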
The file prior.txt
contains the effective prior age estimates for this tree (again, this is just the mcmc.txt
file produced by an MCMCtree run under the prior, just renamed). You can add this as an attachment to the plot, then go to the Further transformations
tab and add another instance of the Set up age distributions (attachment) module. Expand the options for this module, and make sure you set the Attachment
to prior
; then, change the Format
to Table
and click on Apply
.
The plot should now update and look similar to the following:
This is because the Age distributions module is now plotting the last age distributions that we have set up, i.e., the prior. First of all, we need to get back the posterior estimates on the plot; to do this, click on the Plot actions
button in the Modules
tab to go back to the Plot elements panel, and expand the options for the Age distributions module. Here, set the Age distribution
parameter to Custom
and, in the Distribution name
text box that appears, enter posterior
. The plot should now go back to showing the posterior age estimates.
When you add an instance of the Set up age distributions (attachment) module, the age distributions are stored with a name that is by default equal to the name of the attachment used to compute the age distributions (in our case, posterior
and prior
). When you set the Age distribution
parameter of the Age distributions plot action module to Custom
(as we have just done), you can enter the name of a distribution to select it for plotting. You will have noticed, while typing the name of the new distribution, that a warning message is shown if an invalid name is entered, listing the names for the available distributions.
To plot the prior age distribution alongside the posterior distributions, you need to add another instance of the Age distributions plot action module. When you do this, the plot should update to look similar to the following (again, the colours may differ):
The prior distributions are now being drawn over the posterior distribution; this does not look very nice, though. To improve the appearance of these distributions, first of all, drag the new Age distributions module above all the other modules in the list, so that it appears before the module that is plotting the posterior age distributions. Then, expand the options for this module and set the Age distribution
to Custom
(the Distribution name
box will appear and will already have the correct value of prior
; this is so that we do not have to change this again after we add the user prior distributions). Now, uncheck the Auto colour by node
option (the prior age distributions will turn black) and click on the Colour
button to increase the transparency of the colour (leave the colour black, but set the alpha A
value to 64
). The plot should now look similar to the following:
To reduce the overlap between the age distributions for different nodes, you can click on the Coordinates module
button in the Modules
tab, then expand the parameters for the Coordinates module, and set the height to 300
. The plot should now be similar to the following (unfortunately, there is still a bit of overlap for the two nodes at the bottom, but that is unavoidable):
Adding the user-specified prior
We have now plotted the posterior and effective prior; we just need to add the user prior. In the MCMCtree tutorial, we used three calibration points:
- A >.06<.08 calibration (soft bounds uniform distribution between 0.06 and 0.08) on the last common ancestor (LCA) of human, chimpanzee and bonobo.
- A >.12<.16 calibration (soft bounds uniform distribution between 0.12 and 0.16) on the LCA of great apes (all the taxa except for gibbon).
- A <0.3 calibration (soft upper bound of 0.3, modified from the original tutorial, where this was <1.0) for the root node of the tree.
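To get a feeling for what a "soft bounds" uniform distribution looks like, here is a rough Python sketch of one possible construction: a flat density between the bounds, a power-law tail below the lower bound and an exponential tail above the upper bound, joined so that the density is continuous. The 2.5% of probability mass in each tail and the exact tail shapes are assumptions made for this illustration; they are not necessarily what MCMCtree or TreeViewer use internally.

```python
import math

def soft_uniform_pdf(t, lower, upper, p_low=0.025, p_high=0.025):
    """Density of a uniform(lower, upper) with soft tails (assumed construction)."""
    if t <= 0:
        return 0.0
    body = (1.0 - p_low - p_high) / (upper - lower)  # flat density between the bounds
    if t < lower:
        theta = body * lower / p_low                 # power-law exponent for the lower tail
        return p_low * theta / lower * (t / lower) ** (theta - 1.0)
    if t <= upper:
        return body
    rate = body / p_high                             # exponential decay above the upper bound
    return p_high * rate * math.exp(-rate * (t - upper))

def mass(a, b, lower, upper, steps=20000):
    """Midpoint-rule integral of the density over [a, b]."""
    width = (b - a) / steps
    total = sum(soft_uniform_pdf(a + (i + 0.5) * width, lower, upper) for i in range(steps))
    return total * width

inside = mass(0.06, 0.08, 0.06, 0.08)
below = mass(0.0, 0.06, 0.06, 0.08)
above = mass(0.08, 0.2, 0.06, 0.08)
print(f"inside={inside:.3f} below={below:.3f} above={above:.3f}")
```

An upper-bound-only calibration such as <0.3 can be approximated with the same function by passing `lower=0` and `p_low=0`.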
First of all, we need to annotate these calibrations on the tree. The easiest way to do this is to select the relevant nodes (e.g., the LCA of human
, chimpanzee
and bonobo
) and click on the Attributes
button in the Selection actions
tab to show the attributes for this node in the panel on the right. Then, click on the Add attribute
button and, in the new dialog window, enter Calibration
as the Attribute name
and the value of the calibration (e.g., >.06<.08
) as the Attribute value
. Then, click OK
and repeat these steps for the other two calibrations.
Every time you do this, TreeViewer will automatically add an instance of the Add attribute Further transformation module, adding the value of the attribute to the specified nodes. Now that the calibrations are annotated on the tree, we need to tell TreeViewer to read them and set up the corresponding prior age distributions. To do this, click on the Further transformations
button in the Modules
tab, then add a new instance of the Parse age distributions module. Expand the options for this module and set the Attribute
to Calibration
(the new attribute we have just created), and the Name
to something like user prior
(we will need this to plot the distributions), then click on Apply
. The plot should not change.
To add the new distributions to the plot, go back to the Plot action modules, and add yet another instance of the Age distributions module. The module will automatically select the latest age distributions that have been added on the tree, and the plot should update to look similar to the following:
The overlap is a bit confusing, so first of all move the new Age distributions module up in the list so that it appears behind the branches, but in front of all the other age distributions (i.e., it should be the third module in the list). Then expand the options for this module and set the Age distribution
to Custom
(the Distribution name
should automatically be shown as user prior
). Now, set the Opacity
of the Fill colour
to 0%
; this will cause the new age distributions to temporarily disappear - this is normal. At this point, you can disable the Auto stroke colour by node
option and increase the line weight to 1
; this will cause the distributions to be outlined on the tree, like the following:
Finally, change the Line colour
to a mid-grey and set the Line dash
to a dashed line. The plot should now look like the following:
Interpreting the plot
We can now compare the three distributions: the user prior (dashed line), the effective prior (grey distribution) and the posterior (blue distribution).
- For the LCA of human, chimpanzee, and bonobo, the user prior and effective prior are practically the same; the posterior is well contained within the prior and shows higher density towards the middle of the interval.
- For the LCA of great apes, the effective prior has slightly less density towards the higher end of the interval, compared with the user prior; this is because this calibration is interacting with the root node prior: since this node cannot be older than the root node, some age combinations for this node are excluded because of the root node calibration potentially allowing any age lower than 0.3 for the root node. The posterior estimate puts most of the density at the higher end of the prior interval (i.e., it looks "squished" to the left). This is interesting, because it seems to indicate that the data want this node to be potentially older than what we are constraining it to be with the prior calibration on the node.
- For the root node, the user prior (calibration) allows any age from 0.3 to 0, plus a smaller tail for values higher than 0.3 due to the soft bound. The effective prior, however, does not show any density for values lower than 0.12. Again, this is because of the interaction between the root calibration and the calibration on the LCA of great apes: since the latter cannot be younger than 0.12, and since the root node must be older than that, ages younger than 0.12 for the root node are not possible, and ages between 0.16 and 0.12 have increasingly lower probabilities. The posterior distribution for the root node is located at the younger end of the (effective) prior interval, which indicates that the data seem to say that there should be a very short interval between the divergence of gibbon from great apes and the divergence of orangutans (orangutan + sumatran) from other great apes (human, chimpanzee, bonobo, and gorilla).
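The interaction between the two calibrations can be reproduced with a crude rejection-sampling sketch in Python. This is only an illustration under strong simplifying assumptions: hard bounds instead of soft ones, only two nodes, and no other component of the tree prior. It is not how MCMCtree computes the effective prior.

```python
import random

def sample_effective_prior(n, seed=1):
    """Rejection-sample (great_apes, root) ages with hard bounds and root older than great apes."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n:
        apes = rng.uniform(0.12, 0.16)   # user prior on the LCA of great apes
        root = rng.uniform(0.0, 0.3)     # user prior on the root (upper bound only)
        if root > apes:                  # a node cannot be older than its ancestor
            kept.append((apes, root))
    return kept

samples = sample_effective_prior(50_000)
mean_apes = sum(a for a, _ in samples) / len(samples)
min_root = min(r for _, r in samples)
print(f"mean great-ape age: {mean_apes:.4f} (user prior mean 0.1400)")
print(f"youngest root age kept: {min_root:.4f}")
```

Under these assumptions, no retained root age is younger than 0.12, and the retained great-ape ages are shifted slightly towards the younger end of their interval, which is the same qualitative pattern described above.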
This information that we gathered just by looking at the age distributions leads to some conclusions about the calibrations and the data:
- If we removed the lower bound on the LCA of great apes (i.e., set the calibration as <.16), the posterior estimates would not change significantly.
- On the other hand, if we were to allow older ages for this node (e.g., >.12<.25), we would see that the posterior age estimate moves towards the left.
- If we were to use a calibration allowing older ages for the root node (e.g., <1.0, as in the original MCMCtree tutorial), the results would not change.
- If we completely removed the calibration on the LCA of human, chimpanzee, and bonobo, the posterior estimates would again not change significantly.
- We can go even further: if we removed every calibration, except for the upper bound on the LCA of great apes, leaving the calibration for this node as <.16 (and keeping the root node calibration at <1.0, because the program needs it to run), we would still get essentially the same results.
You could confirm these hypotheses "experimentally", by downloading the files for the MCMCtree tutorial and modifying them. This highlights that the only fossil calibration that we are actually using in the analysis is the upper bound on the LCA of great apes, and all other information comes just from the alignment data. Thus, if an amazing new study were to come out and challenge any of the other calibrations, we could still breathe a sigh of relief knowing that it should not affect our results too much.
Generalising, there are a few inferences that can be made by looking at this kind of plot (though, of course, it would be good to verify each case individually by running additional analyses, before relying on any of these deductions):
- It is normal for effective priors to be different from the user-specified priors, especially if you have many calibrations in the tree. However, be careful if the prior looks excessively weird: you may have specified conflicting calibrations (e.g., ancestors younger than their descendants), or you may have encountered a bug in the software.
- If the posterior samples for a node all fall in a relatively small range well within the effective prior, it is likely that the data is very informative for that node. If the node has a calibration, removing that calibration would likely not change much.
- If the posterior is "squished" to one side of the prior, and/or has most density in parameter ranges where the prior has low density, there is likely to be something that is "conflicting" with the data (either on the same node, or on other nearby nodes). It may be worth investigating this: has the analysis converged? Is the prior too informative? Is the fossil calibration on the right node? Did I accidentally include a mis-labelled sequence? And so on.
Finishing touches
Going back to the plot, it would be useful to include a scale axis, so that it is possible to estimate the actual ages of the nodes, as opposed to just looking at the nice distributions. To do this, you can click on the Plot actions
button in the Modules
tab to bring up the Plot elements panel on the left, and add a new instance of the Scale axis module. The plot should now look similar to the following:
The axis does not look great because of all the overlapping numbers; you can make it prettier by setting the End
option to 0.4
and the Tick spacing
parameter to 0.025
. The plot should now look like this:
The units in the scale axis are the same as in the tree produced by MCMCtree. All the calibrations were given in units of 100 Mya; thus, for example, 0.30
means 30 million years ago. It would be nice to have these numbers on the tree, rather than the ones we have now.
To do this, we need to scale the tree by a factor of 100; most importantly, we need to scale all three age distributions by the same factor, otherwise they will be placed at the wrong age on the tree. The Set up age distributions (attachment) and Parse age distributions modules have a Scaling factor
parameter that is meant just for this: if you set this to 100
for all three modules we have used, each distribution will be scaled by this factor. However, you need to make sure that the Apply scaling to transformed tree
option is enabled only for one of the three modules (it does not matter which one), otherwise the final tree will be scaled by a factor of 100
by each module, and it will end up being scaled by a factor of 1 000 000!
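The arithmetic behind this warning is easy to check. The short Python sketch below uses an illustrative helper, `tree_scale`, to compare the total factor applied to the tree when one module rescales it with the factor applied when all three do.

```python
SCALING_FACTOR = 100          # converts the tree units into million years
root_age = 0.30               # example age in the original units

def tree_scale(modules_scaling_tree, factor=SCALING_FACTOR):
    """Total factor applied to the tree if this many modules each rescale it."""
    return factor ** modules_scaling_tree

once = root_age * tree_scale(1)     # option enabled on a single module
thrice = root_age * tree_scale(3)   # option left enabled on all three modules
print(f"one module: {once:.0f}; three modules: {thrice:.0f}")
```

With the option enabled on all three modules, an age of 0.30 would end up as 300 000 instead of 30.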
Once you do this and click the Apply
button on each module, the tree and the age distributions will be scaled, but the scale axis will not, and it will end up being squished to the right of the plot:
To fix this, you need to go back to the plot elements, expand the options for the Scale axis module, and set the End
to 40
(i.e., $0.4 \cdot 100$) and the Tick spacing
to 2.5
(i.e., $0.025 \cdot 100$). You can also set the Digits
to 0
, since all the values on the axis should be integers and we do not need decimal digits anymore. Finally, you can set the Units
to Mya
and drag the Scale axis module all the way up, so that it appears behind all the other plot elements:
Another nice touch would be to have the actual species names in place of human
, chimpanzee
, etc. Rather than renaming each taxon manually, we can use an attachment to store the species names and then annotate them as an attribute on the tree. To do this, click on the Attachments
tab, and then click on the lower half of the Add attachment
button. This will open a context menu, from which you can click Open spreadsheet editor
to open a spreadsheet editor window. In the spreadsheet, you can create a table like the following (note that you should be able to copy and paste it from this page):
Name | Species |
---|---|
human | Homo sapiens |
chimpanzee | Pan troglodytes |
bonobo | Pan paniscus |
gorilla | Gorilla gorilla |
orangutan | Pongo pygmaeus |
sumatran | Pongo abelii |
gibbon | Hylobates lar |
You should start from the top-left corner of the spreadsheet (i.e., cell A1
should contain Name
). An important thing to note is the Column separator
, which is shown in the Edit
tab of the spreadsheet window. The default value is \t
; we will need this value when we tell TreeViewer how to interpret the data in the attachment.
Once you are done, click OK
and enter species
as the name of the attachment (leave all the other options at their default values). To annotate the species names on the tree, click on the Further transformations
button in the Modules
tab, and add a new instance of the Parse node states module. Expand the options for this module and set the Data file
to the new attachment and the Separator
to \t
(as we just noted), then enable the Use first row as header
option.
The value of the Separator
should correspond to the value used in the attachment (depending on where you got the attachment from, other symbols could be used, like a comma ,
or a semicolon ;
). If you are unsure, in most cases you can just open the attachment using the Spreadsheet editor
option; the spreadsheet window will automatically identify the column separator and display it in the Edit
tab.
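If you would rather check the separator of a table yourself before attaching it, a simple heuristic is to look for a candidate character that occurs the same non-zero number of times on every line. The Python sketch below applies this to a shortened copy of the species table; the function `detect_separator` is an illustration only and is not the method used by TreeViewer's spreadsheet editor.

```python
import csv
import io

# Shortened copy of the species table, with tab-separated columns.
TABLE = (
    "Name\tSpecies\n"
    "human\tHomo sapiens\n"
    "chimpanzee\tPan troglodytes\n"
    "gorilla\tGorilla gorilla\n"
)

def detect_separator(text, candidates=("\t", ",", ";")):
    """Return the first candidate occurring the same non-zero number of times on every line."""
    lines = [line for line in text.splitlines() if line]
    for sep in candidates:
        counts = {line.count(sep) for line in lines}
        if len(counts) == 1 and counts.pop() > 0:
            return sep
    raise ValueError("no consistent separator found")

separator = detect_separator(TABLE)
rows = list(csv.DictReader(io.StringIO(TABLE), delimiter=separator))
species = {row["Name"]: row["Species"] for row in rows}
print(repr(separator), species["gorilla"])
```

The heuristic ignores quoting, so it can be fooled by fields that themselves contain a candidate character.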
You can click on Preview
to see if everything will be interpreted correctly, then click on Apply
to add the attribute on the tree. Now, if you click on one of the terminal nodes of the tree (e.g., gorilla
) and look at its attributes in the right panel, you will see that a new attribute called Species
containing the species name (e.g. Gorilla gorilla
) has been added.
To display this attribute instead of the common name, you should go to the plot elements panel and expand the settings for the Labels module; here, you can set the Attribute
to Species
. Since these are scientific names, it would be appropriate for them to be in italics; you can change this by clicking on the Font
button and selecting the Italic
font style. The plot should now look similar to this:
We can now add a legend describing the three kinds of age distributions that we have plotted on the tree. To do this, click on the Add module
button to add a new Plot action module, and select the Legend
module. A legend will appear at the bottom of the tree:
To change the contents of the legend, open the options for the new module and click on the Edit...
button for the Markdown source
. Here, you can enter the following Markdown legend:
![](circle://8,#0072B2) Posterior age distribution estimates
![](circle://8,#00000040) Effective prior age distribution estimates
![](rect://3,2,#808080) ![](rect://3,2,#808080) ![](rect://3,2,#808080) User-specified prior
You can then change a few settings to make the legend smaller and move it to a better place: you can set the Font size
to 10
, the Width
to 130
, the Anchor
to Top-left
, the Alignment
to Top-right
, the position X
to -50
and the position Y
to 0
. Finally, click on the Background colour
button and make it completely transparent by setting the A
component to 0
. The plot should now look similar to the following:
Finally, we can add some images to illustrate the taxa in the tree. The archive images.zip
contains a (slightly creepy) image for each of the taxa in the tree, which were generated using the Stable Diffusion Online AI model (hence, they may not be completely accurate).
To add these on the plot, download the ZIP file and extract it somewhere on your computer. Then, click on the Attachments
tab and add each image file as an attachment to the tree. Then, click on the Plot actions
button in the Modules
tab to open the Plot elements panel, and click on the Add module
button to add an instance of the Draw image module. Expand the options for this module, and set the Image
parameter to the first image file (e.g., human
), then change the Image format
to PNG
. The image should now appear over the root node in the tree:
To position the image correctly, click on Homo sapiens
in the tree to select the correct branch, and then click on the Use selection
button in the Position
section of the parameters for the Draw image module. The image will then move to the Homo sapiens node. You can move it further to the right by setting the X
parameter to 110
, and make it smaller by setting both Width
and Height
to 40
. The plot should now look similar to the following:
You can now duplicate the Draw image module by clicking on the button with the two squares between the ?
and the x
, expand the options for the duplicated module, click on Pan troglodytes
in the tree and click on the Use selection
button to move the second image to the right place. Then, change the Image
to chimpanzee
. You can repeat these steps to add all the images next to their respective taxon names in the tree. The finished plot should look similar to the following:
You can now save the tree file or the plot as a PDF or SVG file using the items from the File
menu. You can also download the primates.tbi
tree file, which contains the tree along with all the modules and the images.
References
- dos Reis, M., Álvarez-Carretero, S., & Yang, Z. (2017). MCMCTree tutorials. http://abacus.gene.ucl.ac.uk/software/MCMCtree.Tutorials.pdf