Plotting the results of a stochastic mapping analysis - arklumpus/TreeViewer GitHub Wiki
This guide will provide instructions on how to plot the results of a stochastic mapping analysis using TreeViewer. Stochastic mapping (Huelsenbeck et al., 2003) is a method to reconstruct the history of a morphological character on a phylogenetic tree; unlike other ancestral state reconstruction methods, stochastic mapping reconstructs the history of the character at each point along the branches of the tree, and not only at internal nodes.
In order to plot the results of a stochastic mapping analysis, it is first necessary to actually run the analysis. There are a variety of programs that can be used to perform this kind of analysis; this tutorial will show how to run a simple analysis and export the results in a format that can be used by TreeViewer, using either sMap (Bianchini & Sánchez-Baracaldo, 2021) or the phytools
R package (Revell, 2012).
First of all, download the tree file Pontederiaceae.tre
. This is a rooted clock-like tree containing 24 plant species of the family Pontederiaceae; you can read more about this tree and the whole dataset in the sMap manual and in the original publications (Kohn et al., 1996 and Graham et al., 1998).
Expand the sections below for instructions on using sMap or phytools to run the stochastic mapping analysis. If you prefer, you can directly download the final stochastic mapping files, instead: flower_morphology.tre
and self_incompatibility.tre
.
Using sMap
Install sMap following the instructions in the sMap repository. What follows is essentially an abridged version of tutorial 5.9 in the sMap manual.
You now need to download the data file Pontederiaceae.txt
, containing data about two characters (flower morphology and self-incompatibility) for these species (note that the files for the sMap tutorial and phytools tutorial are different). Flower morphology is coded as a three-state character (with states T
, M
, and E
), while self-incompatibility is coded as a binary character (with states I
and C
). Note that the state for the flower morphology of Monochoria cyanea is ambiguous.
To set up and run the stochastic mapping analysis, open sMap-GUI (you can also use the command-line version, but using the GUI makes things easier to follow). Then, click on the rainbow sMap Wizard
button. Click on the Choose file...
button and select the Pontederiaceae.txt
file. Then, click on the new Choose file...
button that appears (next to Tree(s)
), and select the Pontederiaceae.tre
file. Click on Confirm
.
We now need to define the "dependency model" of the characters - i.e., whether we want the evolution of the two characters to be somehow connected or not. The sMap manual contains many tutorials dedicated to selecting the best evolutionary model; for this example, we will leave the two characters as independent. Thus, click Confirm
again. Then, click Confirm
one more time to use the default evolutionary model. Finally, click on the orange Run analysis
button to actually run the analysis.
Once the analysis finishes, close the analysis window, and click on the red button saying Save sMap run files...
. This will create a ZIP
archive containing two files: output.set0.smap.bin
(with data for the first character - i.e. flower morphology) and output.set1.smap.bin
(with data for the second character - i.e. self-incompatibility). We now need to convert these files in a format that can be understood by TreeViewer.
To do this, first of all extract the two files from the ZIP archive. Then, go back to the sMap-GUI program and close the wizard window. Then, click on the orange Merge-sMap
button, which will open a new window. Here, click on the Add sMap file...
button and select the first filel (output.set0.smap.bin
). Leave all settings to the default, then click on the Save merged sMap...
button. In the new window, select the output file type as Phytools-compatible file (*.tre)
. Now, remove the file by clicking on the red -
button next to it in the list, and repeat these steps for the second file (output.set1.smap.bin
).
You should now have two files in phytools-compatible format, each containing the history for a different character.
Using phytools
Make sure you have installed the phytools R package and all of its dependencies. What follows is essentially an abridged version of tutorial 5.9 in the sMap manual, done using phytools.
You now need to download the data file Pontederiaceae.txt
, containing data about two characters (flower morphology and self-incompatibility) for these species (note that the files for the sMap tutorial and phytools tutorial are different). Flower morphology is coded as a three-state character (with states T
, M
, and E
), while self-incompatibility is coded as a binary character (with states I
and C
). Note that the state for the flower morphology of Monochoria cyanea is ambiguous.
Now, open an R command line, and use the following code to load up the data and run the analysis:
library(phytools)
# Read the tree file
tree <- read.tree("/path/to/Pontederiaceae.tre")
# Read the data file
data <- read.table("path/to/Pontederiaceae.txt", header=TRUE)
# Data for the self-incompatibility - no ambiguous states, so this case is easier
incompatibility_data <- setNames(data$incompatibility, data$species)
# Run the analysis for the self-incompatibility
incompatibility_results <- make.simmap(tree, incompatibility_data, model="ARD", nsim=1000)
# Save the results for the self-incompatibility
write.simmap(incompatibility_results, file="self_incompatibility.tre", append=TRUE)
# Data for the flower morphology - since there is a species with an ambiguous state, we
# need to create a matrix to hold the data.
flower_data <- matrix(nrow=length(data$species), ncol=3)
# Set the row names of the matrix to the species, and the columns to the states
rownames(flower_data) <- data$species
colnames(flower_data) <- c("T", "M", "E")
# Fill the matrix
for (i in seq(1, length(data$species))) {
# Set all entries to 0
flower_data[data$species[i],] <- 0
# Iterate over all the states defined for a certain species
for (j in seq(1, nchar(data$flower[i]))) {
# Set the likelihood to 1
flower_data[data$species[i],substring(data$flower[i], j, j)] <- 1
}
}
# Run the analysis for the flower morphology
flower_results <- make.simmap(tree, flower_data, model="ARD", nsim=1000)
# Save the results for the flower morphology
write.simmap(flower_results, file="flower_morphology.tre", append=TRUE)
After running this code, you should now have two files, self_incompatibility.tre
and flower_morphology.tre
, containing the results of the stochastic mapping analysis for the two characters.
Regardless of how you obtained them, you should now have two files containing the reconstructed stochastic mapping histories for two characters on the Pontederiaceae tree.
If you open each of these files with TreeViewer, the program will automatically plot the results of the stochastic mapping analysis, using the default settings:
Flower morphology
Self-incompatibility
To begin with, let us work on the flower morphology plot. By default, the branches in the plot are plotted with multiple colours, with the width of each colour proportional to the posterior probability of the corresponding state at each point in time. For example, the branch leading to Eichornia paniculata starts with a ~50% probability of the M
(green) and T
(orange) states, and ends with the species having a 100% probability of being in the T
state.
While this representation summarises the results accurately, the plot may seem cluttered and it can be hard to interpret it "by eye" at first sight. Other options to show the branch states would be to just show the most probable state a posteriori at each point in time, or to show a mixture of the most probable states. To do this, click on the Plot actions
button in the Modules
tab to open the plot actions panel, then expand the options for the Stochastic mapping branches
module and change the value of the Style
parameter. The default value is All states
; if you set this parameter to Maximum a posteriori
, the branch colour will reflect, at each point in time, the most probable state:
This can be a bit misleading - for example, the root node has a probability of being in the M
state of around 40%, which causes the deep branches of the tree to be green, even though the T
state and E
state have about 30% probability each. To address this, you can set the Style
to Most probable states
; in this way, the colour of the branches will alternate between all the most probable states at each point in time:
You can determine exactly what the "most probable states" are by changing the value of the Dominance threshold
and Exclusion threshold
parameters. Essentially, if a state has a posterior probability higher than the Dominance threshold
, the branch colour corresponds exclusively to that state. If no state has a posterior probability higher than the Dominance threshold
, then the branch colour alternates between all states that have a posterior probability higher than the Exclusion threshold
. The Dash unit
parameter determines the length of the coloured dashes where colours are alternating; for example, you can increase this to 20
.
It would also be nice to add a legend to the plot, showing the meaning of the three colours; you could do this manually, by using a Legend
Plot action module, but you can also click on the Add legend
button in the settings for the Stochastic mapping branches
module, which will automatically add a legend with the state colours:
You can reposition the legend by opening the options for the Legend
Plot action module and setting both Anchor
and Alignment
to Bottom-left
, then setting the position's X
to -40
.
We can also replace the underscores in the species names with spaces, and make the species names italics. To do this, first open the Search panel by clicking on the Search
button in the Actions
tab (or pressing CTRL+F on Windows and Linux or CMD+F on macOS). Then, enter _
(i.e., underscore) in the Search
box (all the species in the tree will become highlighted) and enter
(i.e., a space) in the Replace with
box. Then, click on the Replace all
button. This will add an instance of the Replace attribute
Further transformation module, replacing underscores with spaces. Finally, go to the Plot actions panel, expand the options for the Labels
module, and click on the Font
button to set the font to italics. The plot shoud look like this:
You can repeat the same steps for the plot of the self-incompatibility character:
If you are using sMap to run the stochastic mapping analyses, you can use Merge-sMap to produce a stochastic mapping file containing data for multiple characters at once. To do so, open sMap-GUI and click on the orange Merge-sMap
button. Then, click on the Add sMap file...
button and select the first sMap output file that you produced earlier in the tutorial. Click on the Add sMap file...
button again, and select the other sMap output file; you should now have both files selected in the Merge-sMap window. Leave the other settings to their default value, and click on the Save merged sMap
button. Select Phytools-compatible file (*.tre)
as the output format and save the merged file. You can also directly download the file: merged.tre
.
If you now open this file in TreeViewer, the plot will reflect the states for both characters at once:
To separate the two characters, open the options for the Stochastic mapping branches
module, and click on the Wizard edit enabled characters
button. A new window will open, where you can select which characters are enabled in the plot. Make sure that only the first character is selected, and click OK; the plot will update:
Note that this has had no effect on the pie charts at the nodes - we will fix that later. For now, in the options for the Stochastic mapping branches
module, set the Style
to Most probable states
and the Dash unit
to 20
. Then, set the Branch thickness
to 4
(making the branches slightly thinner) and the X
and Y
coordinates of the Position shift
to 2
(which will move the branches downwards and to the right). The plot should now look like this:
We can now add the plot for the other character. The easiest way to do this is to duplicate the Stochastic mapping branches
module, and then open the options for the new module and click on the Wizard edit enabled characters
button. Here, make sure that only the second character is enabled. Then, change the X
and Y
coordinates of the Position shift
to -2
(so that the new branches do not overlap the old branches) and drag the new module so that it appears in the list before the Labels
module - in this way, the branches will not overlap with the pie charts. The plot should look like this:
This plot is a bit confusing, because the two characters use the same state colours. To change this, click on the Wizard edit state colours
button, and select a pink colour for the C
state and a dark blue colour for the I
state. The plot should now look like this:
We now need to fix the node state pie charts. First of all, open the options for the Coordinates module, increase the Height
to 550
, and click on Apply
(otherwise, the plot will be too busy). Then, open the options for the Node states
Plot action module and click on the Wizard edit enabled characters
button to enable just the first character. Set the Y
coordinate of the Position
to 4
(which will move the pie charts downwards) and the Width
and Height
to 8
(which will make them smaller). The plot should now look like this:
Then, duplicate the Node states
module, set the Y
coordinate of the Position
to -4
, and then click on the Wizard edit enabled characters
to enable just the second character, then on the Wizard edit state colours
to select the same colours for the states as before. The plot should now look like this:
We can now add the legend. Click on the Add legend
button of the first Stochastic mapping branches
module, and then open the options for the new module. Set the Font size
to 12
, the anchor and alignment to Bottom-left
, the X
coordinate of the position to -40
, the Y
coordinate of the position to 20
, and the Width
to 300
. Then, click on the Edit...
button to edit the Markdown source
of the legend, and paste the following code:
+-----------------------------+-----------------------------+
| Flower morphology | Self incompatibility |
+=============================+=============================+
| ![](circle://8,#E69F00) `E` | ![](circle://8,#CC79A7) `I` |
| | |
| ![](circle://8,#009E73) `M` | ![](circle://8,#0072B2) `C` |
| | |
| ![](circle://8,#56B4E9) `T` | |
+-----------------------------+-----------------------------+
The plot should now look like this:
Finally, we need to replace the underscores with spaces in the species names and make the tip labels italics. To do this, first open the Search panel by clicking on the Search
button in the Actions
tab (or pressing CTRL+F on Windows and Linux or CMD+F on macOS). Then, enter _
(i.e., underscore) in the Search
box (all the species in the tree will become highlighted) and enter
(i.e., a space) in the Replace with
box. Then, click on the Replace all
button. This will add an instance of the Replace attribute
Further transformation module, replacing underscores with spaces. Finally, go to the Plot actions panel, expand the options for the Labels
module, and click on the Font
button to set the font to italics. The plot shoud look like this:
You can now save the tree file or the plot as a PDF or SVG file using the items from the File
menu. You can also download the Pontederiaceae.tbi
tree file, which contains the tree along with all the modules.
-
You can change the state colours by using the
Wizard edit state colours
buttons in the options for theStochastic mapping branches
andNode states
modules. You can also remove the pie charts at the nodes in the tree by disabling theNode states
module. -
Naturally, you can add other elements to the plot, such as a scale axis or node age distributions.
-
You can also display the branch states in unrooted or circular style.
-
If you only want to display the pie charts at some nodes, you can do so by setting the default
Width
parameter of theNode states
module to0
, and then overriding this by setting a value for theStateWidth
attribute on the relevant nodes. -
Drawing multiple histories at the same time works much better when the data being plotted is binary presence/absence data. In this case, you can use a grey colour to indicate absence and a different colour to indicate presence of each character. For example:
Huelsenbeck, J. P., Nielsen, R., & Bollback, J. P. (2003). Stochastic Mapping of Morphological Characters. Systematic Biology, 52(2), 131–158. https://doi.org/10.1080/10635150390192780
Bianchini, G., & Sánchez‐Baracaldo, P. (2021). sMap: Evolution of independent, dependent and conditioned discrete characters in a Bayesian framework. Methods in Ecology and Evolution, 2041-210X.13540. https://doi.org/10.1111/2041-210X.13540
Revell, L. J. (2012). phytools: an R package for phylogenetic comparative biology (and other things). Methods in Ecology and Evolution, 3(2), 217–223. https://doi.org/10.1111/j.2041-210X.2011.00169.x
Kohn, J. R., Graham, S. W., Morton, B., Doyle, J. J., & Barrett, S. C. H. (1996). Reconstruction of the evolution of reproductive characters in Pontederiaceae using phylogenetic evidence from chloroplast DNA restriction-site variation. Evolution, 50(4), 1454–1469. https://doi.org/10.1111/j.1558-5646.1996.tb03919.x
Graham, S. W., Kohn, J. R., Morton, B. R., Eckenwalder, J. E., & Barrett, S. C. H. (1998). Phylogenetic Congruence and Discordance Among One Morphological and Three Molecular Data Sets from Pontederiaceae. Systematic Biology, 47(4), 545–567. https://doi.org/10.1080/106351598260572