BEAST on OpenStack - cdoorenweerd/PhylOStack GitHub Wiki

This HOWTO explains how to use BEAST v2.3.2 on OpenStack with Ubuntu 14.04 LTS.

Note: This HOWTO assumes you have installed the PhylOStack and know how to connect via SSH, transfer files and use screen sessions.

Modifying the BEAST and TreeAnnotator memory allocation

BEAST and TreeAnnotator are installed with the PhylOStack and by default may use up to 4 GB of RAM. Depending on how much RAM your instance has, you can edit their launcher scripts so they can use more of it. Open the scripts with a text editor:

sudo nano /usr/bin/BEASTv2.3.2/beast/bin/beast
sudo nano /usr/bin/BEASTv2.3.2/beast/bin/treeannotator

In each script, scroll down to the line that launches Java. In the beast script it reads:

$JAVA -Xms64m -Xmx4g -Djava.library.path="$BEAST_LIB:/usr/local/lib" -cp "$BEAST_LIB/launcher.jar" beast.app.beastapp.BeastLauncher $*

-Xmx sets the maximum amount of RAM that Java may use; the default is -Xmx4g (4 GB). Adjust it to your instance: for example, if 16 GB is available, change it to -Xmx16g.
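If you prefer not to edit the scripts by hand, the change can be scripted. The following is a minimal sketch, assuming GNU sed (as on Ubuntu) and the launcher paths above; the function name set_xmx is hypothetical and not part of BEAST or the PhylOStack. It keeps a .bak copy of each script and replaces only the -Xmx option.

```shell
#!/usr/bin/env bash
# Sketch only: set_xmx is a hypothetical helper, not part of BEAST or PhylOStack.
# Usage: set_xmx LAUNCHER_SCRIPT SIZE   (SIZE such as 512m or 16g)

set_xmx() {
  local script="$1" size="$2"

  # Accept a number followed by a single Java size suffix (k, m or g).
  if [[ ! "$size" =~ ^[0-9]+[kKmMgG]$ ]]; then
    echo "set_xmx: invalid heap size '$size'" >&2
    return 1
  fi
  if [[ ! -f "$script" ]]; then
    echo "set_xmx: no such file '$script'" >&2
    return 1
  fi

  # Keep a backup of the unmodified launcher next to it.
  cp "$script" "$script.bak" || return 1

  # Replace only the -Xmx value; other options such as -Xms64m are untouched.
  sed -i "s/-Xmx[0-9][0-9]*[kKmMgG]/-Xmx${size}/" "$script"
}
```

Saved as, for example, set_xmx.sh, it could be run with root rights for both launchers:

sudo bash -c '. ./set_xmx.sh && set_xmx /usr/bin/BEASTv2.3.2/beast/bin/beast 16g && set_xmx /usr/bin/BEASTv2.3.2/beast/bin/treeannotator 16g'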

Preparing input data

Preparing an input .xml file for BEAST is best done outside the OpenStack environment, on a local machine, using BEAUti. Make sure your version of BEAUti matches the version of BEAST you will use to run your data.

Enabling multithreaded runs

The customisation script installs the BEASTLabs add-on, which enables multithreaded runs that divide the computations over multiple CPU cores. By default, BEAST uses BEAGLE, including BEAGLE_SSE, which is faster than running without BEAGLE but does not make full use of all cores. Whether a multithreaded run is worthwhile depends on your dataset: the nature of the data (DNA or otherwise), the number and size of partitions, and the linkage between partitions (see: http://beast2.org/performance-suggestions).

To use the threading option, open the .xml file generated by BEAUti in a text editor and replace all occurrences of:

spec="TreeLikelihood"

with

spec="ThreadedTreeLikelihood" useJava='false'

Save and exit. The file is now ready for a multithreaded run; the number of threads is set in the run command.
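The same find-and-replace can also be done on the instance with sed. This is a minimal sketch; make_threaded is a hypothetical helper name. It writes to a new file instead of editing in place, so the original BEAUti output is kept, and it reports how many lines now use ThreadedTreeLikelihood so you can check that every partition was converted.

```shell
#!/usr/bin/env bash
# Sketch only: make_threaded is a hypothetical helper name.
# Usage: make_threaded INPUT.xml OUTPUT.xml

make_threaded() {
  local in="$1" out="$2" n

  # Writing to the input file itself would truncate it before sed reads it.
  if [[ "$in" == "$out" ]]; then
    echo "make_threaded: output must differ from input" >&2
    return 1
  fi

  # Same find-and-replace as above, applied to every occurrence.
  sed "s/spec=\"TreeLikelihood\"/spec=\"ThreadedTreeLikelihood\" useJava='false'/g" \
    "$in" > "$out" || return 1

  # Report how many lines now use the threaded likelihood (0 means nothing matched).
  n=$(grep -c 'spec="ThreadedTreeLikelihood"' "$out" || true)
  echo "$out: $n line(s) with ThreadedTreeLikelihood"
}
```

For example: make_threaded myinputfile.xml myinputfile_threaded.xml. Running it on an already converted file changes nothing, because spec="ThreadedTreeLikelihood" does not match the search pattern.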

Starting a run

Starting a regular run

Copy the input file to the instance, place it in a folder (e.g. beast_run1) and start a Screen session. You can now start a run by typing:

cd /path/to/beast_run1/
beast -beagle_SSE myinputfile.xml > screenoutput.log
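A run can also be wrapped in a small function that records start and end times and BEAST's exit status in the log. This is a sketch, not part of the PhylOStack: run_beast is a hypothetical name, and unlike the command above it also sends error messages (stderr) to screenoutput.log, so Java errors end up in the same file. The BEAST variable is an assumption of this sketch that lets you point to another beast executable; it defaults to beast on the PATH.

```shell
#!/usr/bin/env bash
# Sketch only: run_beast is a hypothetical helper, not part of BEAST or PhylOStack.
# Usage: run_beast RUN_DIR INPUT.xml [BEAST options...]
# Set BEAST=/path/to/beast to use a specific executable (default: beast on PATH).

run_beast() (
  # The body runs in a subshell, so the cd does not affect the calling shell.
  local dir="$1" xml="$2" status
  shift 2
  cd "$dir" || exit 1

  echo "started: $(date)" > screenoutput.log
  # Unlike the plain command above, stderr is captured in the log as well.
  "${BEAST:-beast}" "$@" "$xml" >> screenoutput.log 2>&1
  status=$?
  echo "finished: $(date), exit status $status" >> screenoutput.log
  exit "$status"
)
```

Inside a Screen session you would call, for example, run_beast /path/to/beast_run1 myinputfile.xml -beagle_SSE and then detach with Ctrl-a d.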

Starting a multithreaded run

Prepare the run folder and Screen session as for a regular run, then start a multithreaded run by typing:

cd /path/to/beast_run1/
beast -beagle_SSE -beagle_instances 4 -threads 4 myinputfile.xml > screenoutput.log

The -threads value must always be equal to or larger than the -beagle_instances value. Try different values to see which gives the largest speed increase with your data. The processes appear to switch between cores frequently, so leave some cores free for that. For example, on a test dataset on an 8-core instance, 4 BEAGLE instances and 4 threads gave a 30% speed increase, whereas 8 instances and 8 threads gave a 5% speed decrease.
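Trying different values can be scripted. This is a minimal sketch, assuming you first make a copy of your XML with a much shorter chainLength so that each trial finishes quickly; benchmark_threads is a hypothetical helper, and the BEAST variable is an assumption of this sketch for choosing the executable. Each trial runs in its own folder (bench_2, bench_4, ...) so that the output files of one trial do not clash with the next, and the same value is used for -beagle_instances and -threads so the threads value is never smaller.

```shell
#!/usr/bin/env bash
# Sketch only: benchmark_threads is a hypothetical helper.
# Usage: benchmark_threads SHORT_RUN.xml N [N ...]
# Each N is used for both -beagle_instances and -threads.
# Set BEAST=/path/to/beast to use a specific executable (default: beast on PATH).

benchmark_threads() {
  local xml="$1" n dir start end status
  shift
  for n in "$@"; do
    # A separate folder per trial keeps the output files of different trials apart.
    dir="bench_${n}"
    mkdir -p "$dir" && cp "$xml" "$dir/" || return 1

    start=$(date +%s)
    ( cd "$dir" && "${BEAST:-beast}" -beagle_SSE -beagle_instances "$n" -threads "$n" \
        "$(basename "$xml")" > screenoutput.log 2>&1 )
    status=$?
    end=$(date +%s)

    # Wall-clock time in whole seconds, including Java start-up.
    echo "instances=$n threads=$n seconds=$((end - start)) exit=$status"
  done
}
```

For example, benchmark_threads short_run.xml 1 2 4 8 prints one line per trial; compare the seconds values and check that every trial reports exit=0.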

Run Results

You can follow intermediate results by opening another SSH session and running tail -f screenoutput.log, or by copying the output files to a local machine and opening them with "Tracer":http://tree.bio.ed.ac.uk/software/tracer/ . When a run has finished, you can make a consensus tree from the resulting .trees file with TreeAnnotator. Note that the -burnin value is a percentage: -burnin 20 discards the first 20% of the sampled trees.

cd /path/to/beast_run1/
treeannotator -heights mean -burnin 20 yourtreesfile.trees finaltree.tre
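To see how many trees a given burnin percentage removes, you can count the sampled trees in the trees file. This is a minimal sketch, assuming each sample is written on its own line starting with "tree STATE_", as BEAST 2 trees files do; burnin_summary is a hypothetical helper, and because it uses integer arithmetic TreeAnnotator's own rounding may differ by one tree.

```shell
#!/usr/bin/env bash
# Sketch only: burnin_summary is a hypothetical helper.
# Usage: burnin_summary FILE.trees BURNIN_PERCENT

burnin_summary() {
  local trees="$1" pct="$2" total discard

  if [[ ! -r "$trees" ]]; then
    echo "burnin_summary: cannot read '$trees'" >&2
    return 1
  fi
  if [[ ! "$pct" =~ ^[0-9]+$ ]] || (( 10#$pct > 100 )); then
    echo "burnin_summary: burnin must be a percentage from 0 to 100" >&2
    return 1
  fi
  pct=$((10#$pct))  # strip leading zeros so 08 is not read as octal

  # Each sampled tree is assumed to be on its own line starting with "tree STATE_".
  total=$(grep -c '^[[:space:]]*tree STATE_' "$trees" || true)
  # Integer arithmetic: the burnin is rounded down.
  discard=$(( total * pct / 100 ))
  echo "trees=$total burnin=$discard kept=$(( total - discard ))"
}
```

For example, burnin_summary yourtreesfile.trees 20 shows how many trees TreeAnnotator would roughly discard and keep with -burnin 20.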

The finaltree.tre can be opened with "FigTree":http://tree.bio.ed.ac.uk/software/figtree/ .