Drawing an alignment with the tree - arklumpus/TreeViewer GitHub Wiki

This guide will provide instructions on how to draw a phylogenetic tree that includes a sequence alignment of the same sequences that were used to build the tree. Including an alignment and a phylogenetic tree in a single plot can be useful to highlight sequence features that the tree alone cannot convey effectively, such as amino acids in the active site of an enzyme, or particularly relevant indels.

Consider for example the case of the sepJ gene, which encodes a protein that is important for multicellularity in cyanobacteria. This protein contains three main domains: a coiled-coil motif at the N-terminus (CC), a permease domain at the C-terminus (P) and a central linker (L) domain between the two. However, not all cyanobacteria are multicellular and, accordingly, some of them possess this gene, others possess a "shortened" version without the L domain, and others only have a distant homolog, which is an uncharacterised drug/metabolite exporter (DME) and only encodes the P domain.

In this tutorial, we will show a sequence alignment together with a phylogenetic tree of sepJ, which will allow us to immediately highlight the difference between the various kinds of sepJ homologs.

The file sepJ.tre contains a rooted phylogenetic tree of 44 sepJ homologs (adapted from Urrejola et al., 2020). When the tree is opened in TreeViewer, it should look similar to the following figure:

Cleaning up the tree

First, in this case we do not want to show the branch length labels, both because they do not provide much information and because some branches are so short that their labels overlap each other. Therefore, you can start cleaning up the tree for display by removing the second Labels module that was added to the Plot elements when TreeViewer opened the tree file (click on the Plot actions button in the Modules tab to show the Plot action modules). This will delete the branch length labels from the plot.

We can also make the tree more compact by opening the options for the Coordinates module and setting the Width to 400. The tree should now look similar to the following figure:

Adding the alignment to the tree

The sepJ.fas file contains an alignment in FASTA format of the protein sequences that were used to build the tree. This alignment can be embedded with the tree by loading it as an attachment; to do this, click on the Add attachment button in the Attachments tab, then select the file and confirm. This will cause sepJ to be shown with a paperclip icon in the Attachments tab.
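Before attaching an alignment, it can be worth confirming that the file really is a well-formed alignment, i.e. that all sequences have the same number of columns. The following is a minimal Python sketch of such a check; it is independent of TreeViewer, the function names are hypothetical, and the toy sequences are made up (they are not the real sepJ data).

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA text into a dict mapping sequence name -> sequence."""
    sequences = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: start a new sequence.
            name = line[1:].strip()
            sequences[name] = []
        elif name is not None:
            # Sequence data may be wrapped over several lines.
            sequences[name].append(line)
    return {n: "".join(parts) for n, parts in sequences.items()}


def alignment_length(sequences):
    """Return the common length of all sequences, or raise if they differ."""
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError(f"Sequences have different lengths: {sorted(lengths)}")
    return lengths.pop()


# Toy alignment (made-up data, not the real sepJ sequences).
toy = StringIO(">seq_A\nMK--LV\nAA\n>seq_B\nMKTTLV\n-A\n")
alignment = read_fasta(toy)
print(len(alignment), "sequences,", alignment_length(alignment), "columns")
```

To run the same check on a real file, replace the `StringIO` object with an open file handle, e.g. `with open("sepJ.fas") as handle: alignment = read_fasta(handle)`.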

This step added the alignment file to the tree, but did not actually plot the alignment. To plot the alignment, we need to use the Plot alignment Plot action module. To enable this module, click on the Add module button under Plot elements and select it from the list. Once the module has been added (do not worry if a warning sign appears, this is because no alignment file has been selected yet), expand the options for it and set the FASTA alignment parameter to the sepJ attachment.

This will cause an alignment plot to be drawn just below the phylogenetic tree; the tree should look similar to the following figure:

We now need to position the alignment plot so that each sequence is displayed next to the tip of the tree to which it refers.

To do this, change the value of the Mode parameter to Sequences at nodes. Each sequence will now be drawn at the node to which it corresponds, but the sequences will not be aligned anymore (and they will overlap the node labels):

To position them correctly, you should change the Anchor type to Origin (which will cause the sequences to be aligned at the left side of the plot) and then set the X component of the position to a suitable value (e.g., 650). Finally, to reduce the amount of white space between the sequences, increase the value of the Sequence height parameter to 12.

The tree plot should now look similar to the following figure:
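If some sequences do not appear next to their tips, one thing to check is whether the names in the alignment match the tip names in the tree. The sketch below compares the two sets of names; it assumes that sequences are matched to tips by identical names and that the tree is stored in Newick format with unquoted tip names (both are assumptions, not something this guide confirms). The function names and the toy data are hypothetical.

```python
import re


def newick_tip_names(newick):
    """Extract tip labels from a Newick string with unquoted names."""
    # A tip label directly follows "(" or ","; internal node labels
    # (e.g. support values) follow ")" and are therefore skipped.
    return [m.strip() for m in re.findall(r"[(,]\s*([^(),:;]+)", newick)]


def compare_names(tip_names, sequence_names):
    """Return (tips without a sequence, sequences without a tip)."""
    tips, seqs = set(tip_names), set(sequence_names)
    return sorted(tips - seqs), sorted(seqs - tips)


# Toy data (made up): seq_C has no sequence, seq_D has no tip.
toy_tree = "((seq_A:0.1,seq_B:0.2)0.95:0.05,seq_C:0.3);"
toy_sequence_names = ["seq_A", "seq_B", "seq_D"]

missing_sequences, missing_tips = compare_names(
    newick_tip_names(toy_tree), toy_sequence_names
)
print("Tips without a sequence:", missing_sequences)
print("Sequences without a tip:", missing_tips)
```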

Highlighting the sequence groups

This tree makes it easy to see which sequences only have the P domain (the DME sequences), which ones have the P domain and the CC domain (the Pseudanabaena and the first group immediately after them), and which ones have all three domains (all the others). However, it would be nice to highlight these differences on the tree so that they are even clearer.

The first step to highlight these sequences is to change their colour. To do this, start by selecting the branch corresponding to the common ancestor of all DME sequences by clicking on it. This will open the Selection panel on the right. Here, you can open the attributes section and click on the Add attribute button to add a new attribute. Set the Attribute name to Color and leave the Attribute value empty. This will add a new Add attribute module to the Further transformations.

Click on the Further transformations button in the Modules tab and expand the options for the new module; first of all, check the Apply recursively to all children check box, then click on the New value text box (as if you wanted to enter some text in it). Now, press CTRL+SHIFT+C (on Windows and Linux) or CMD+SHIFT+C (on macOS) on your keyboard; this will open a colour picker window in which you can select a new colour for these sequences. Select an orange hue, then click on OK to close the colour picker and click on Apply to apply the new colour.

This should change the colour of the selected sequences, and the tree plot should now look like the following figure:

Now, select the branch corresponding to the last common ancestor of the Pseudanabaena sepJ sequences and repeat the same steps to assign a green colour to it. The tree should now look similar to the following figure:

Then, click on the next group that diversifies (i.e. the last common ancestor of the sequences from sepJ Geitlerinema sp. FC II to sepJ Spirulina subsalsa PCC 9445 2) and repeat the steps above to assign a light blue colour to it. The tree should look similar to the following figure:

Finally, click on the last common ancestor of the group of "full" sepJ sequences (i.e. the last common ancestor of the sequences from sepJ Nodularia sp. NIES-3585 to sepJ cyanobacterium PCC 7702 genomic). Again, follow the steps above to assign a dark blue colour to it:

To make these colours show up in the sequence alignment as well, expand the options for the Plot alignment module. Go to the Colours section and click on the wrench button next to the Colour parameter. In the Colour formatter window that opens, set the Attribute name to Color and click on OK. The colours should now be applied to the alignment and the plot should look similar to the following figure:
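The mechanism used in this section (an attribute set on an ancestor, optionally copied to all of its descendants, and then looked up by name when drawing) can be pictured with a small conceptual model. This is only an illustrative Python sketch, not TreeViewer's actual implementation; the `Node` class, the function names, the toy tip names and the black default colour are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node carrying a free-form dictionary of attributes."""
    name: str = ""
    children: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)


def set_attribute(node, key, value, recursive=False):
    """Set an attribute on a node and, if requested, on all its descendants."""
    node.attributes[key] = value
    if recursive:
        for child in node.children:
            set_attribute(child, key, value, recursive=True)


def tips(node):
    """Yield the tips (leaves) below a node, in order."""
    if not node.children:
        yield node
    for child in node.children:
        yield from tips(child)


# Toy tree (made up): one DME clade and one sepJ clade.
dme = Node(children=[Node("DME 1"), Node("DME 2")])
sepj = Node(children=[Node("sepJ 1"), Node("sepJ 2")])
root = Node(children=[dme, sepj])

# Like "Apply recursively to all children": every descendant gets the colour.
set_attribute(dme, "Color", "#FFA500", recursive=True)

# Look up the colour of each tip by attribute name, with a fallback value
# (the black default is an assumption made for this sketch).
default_colour = "#000000"
for tip in tips(root):
    print(tip.name, tip.attributes.get("Color", default_colour))
```

In this model, leaving out `recursive=True` would colour only the ancestor itself, and the tips below it would fall back to the default colour.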

Displaying group names

The final step is to display the names of the groups that we have just highlighted. First of all, we need to create new attributes to store the group names. To do this, select the branch corresponding to the common ancestor of the orange sequences and add a new attribute to it, called Domains, with value P only. You should then add a new Domains attribute to the green and light blue groups, with value CC+P for both. Finally, add a Domains attribute to the ancestor of the dark blue group, with value CC+L+P.

To display these labels, we need to use the Group labels Plot action module. To enable this module, click on the Add module button under the Plot elements and select the module. Then, open its options and set the Attribute to Domains. This should cause the labels to appear on the tree (although they will be in the wrong position):

To position the labels appropriately, set the Distance to 820; then, click on the button to change the Font and increase the font size to 18:

Finally, we can add another set of group labels to highlight which sequences are actual sepJ homologs and which sequences belong to the unidentified DME. To do this, select again the ancestor of all the DME sequences and add to it an attribute called Group with value DME. Then, select the last common ancestor of all the sepJ sequences (i.e. the sister group to the one you just selected) and add another attribute called Group with value sepJ.

Now, add another Group labels Plot action module by clicking on the Add module button under the Plot elements. In the options for this module, set the Attribute to Group, and the Distance to 850. The tree should now look similar to the following plot:

To make these labels look prettier, first click on the Font button, select the Bold Italic style and increase the font size to 24. Then, increase the Height to 40 and set the Fill colour to a light grey (e.g. #B4B4B4). Finally, click on the wrench button next to the Colour in the Text options and, in the new window, set the Default colour to white and the Attribute name to N/A. This should cause the text to become white. The final tree plot should look similar to the following figure:

You can now save the tree file or the plot as a PDF or SVG file using the items from the File menu. You can also download the sepJ.tbi tree file, which contains the tree along with all the modules. The finished tree file is also available in the Examples section of TreeViewer's welcome page in the File page.

Tips

  • The different Modes of the Plot alignment module can be useful in different situations. For example:

    • The default settings are useful to show an overview of the full alignment next to the tree.
    • Showing unaligned sequences at the tips of the tree is useful to highlight the variability of interesting features (e.g., active sites or conserved regions).
    • Aligned sequences at the tips of the tree (like in this example) are useful to compare the overall structure of the alignment between the different taxa.
  • If you wish to make the font sizes bigger, you can increase the vertical spacing of the tips of the tree by increasing the Height in the options for the Coordinates module. You may want to adjust the Sequence height in the options for the Plot alignment module as well.

  • Instead of drawing each sequence with a single colour, you can choose to colour each position based on the nucleotide/amino acid that is in it. To do this, change the Colour mode to By residue and then click on the button corresponding to the kind of sequence data that is included in the alignment (DNA/RNA or Protein). Note that this will be slower! You can also define custom colours by clicking on the wrench icon next to the Residue colours parameter.

  • You can also decide to draw the letters of the alignment, by checking the Draw residue letters check box. Again, this will be much slower; you also need to make sure that the Residue width is high enough, otherwise there will not be enough space to read the letters.

  • Colouring each residue in the sequence or displaying the alignment letters is particularly useful for short sequence stretches (e.g. an active site or other conserved positions in which you want to highlight a synapomorphy or an autapomorphy). You can select which part of the sequence is shown (i.e. the start and end nucleotide/amino acid) by changing the Start and End parameters. If you wish to show two disjoint ranges of residues, you will need to use two instances of the Plot alignment module.
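The idea of showing two disjoint ranges with two module instances can be illustrated outside TreeViewer as two separate column slices of the same alignment. The Python sketch below is only an illustration: the `column_range` function is hypothetical, the toy sequences are made up, and treating the start and end positions as 1-based and inclusive is an assumption made for this example.

```python
def column_range(sequences, start, end):
    """Return the columns start..end (1-based, inclusive) of every sequence."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return {name: seq[start - 1:end] for name, seq in sequences.items()}


# Toy alignment (made-up data).
toy_alignment = {"seq_A": "MK--LVAAGH", "seq_B": "MKTTLV-AGH"}

# Two disjoint ranges correspond to two independent slices,
# just like two instances of the module with different settings.
first_region = column_range(toy_alignment, 1, 3)
second_region = column_range(toy_alignment, 7, 10)

for name in toy_alignment:
    print(name, first_region[name], second_region[name])
```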

References

Catalina Urrejola, Peter von Dassow, Ger van den Engh, Loreto Salas, Conrad W. Mullineaux, Rafael Vicuña, Patricia Sánchez-Baracaldo, Loss of Filamentous Multicellularity in Cyanobacteria: the Extremophile Gloeocapsopsis sp. Strain UTEX B3054 Retained Multicellular Features at the Genomic and Behavioral Levels, Journal of Bacteriology, 202(12), 2020. https://doi.org/10.1128/JB.00514-19.