Parallelization and Checkpointing - Pas-Kapli/bpp-tutorial GitHub Wiki
Parallelization
Currently, BPP implements multithreading via pthreads, which can substantially speed up the likelihood calculations.
Multithreading is enabled by specifying the threads variable in the control file. The format is
Threads = N A B
where N is the number of threads, A is the starting core/thread number, and B is the increment. For example, with
Threads = 8 1 1
BPP will use 8 threads, starting from core/thread 1, with increment 1; in other words, it will occupy cores/threads 1-8.
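To make the N A B arithmetic concrete, here is a minimal Python sketch that lists the core/thread numbers implied by a Threads setting. It is not part of BPP; the function name occupied_cores is hypothetical, and the second call (increment 2) simply extrapolates the "start plus increment" rule described above.

```python
def occupied_cores(n_threads, start, increment):
    """Return the core/thread numbers implied by 'Threads = N A B'.

    Illustrative helper only: it applies the rule "start at A, step by B,
    N times" and does not query BPP or the operating system.
    """
    return [start + i * increment for i in range(n_threads)]


# Threads = 8 1 1 -> cores/threads 1 through 8
print(occupied_cores(8, 1, 1))

# Threads = 4 1 2 -> every second core/thread, starting from 1
print(occupied_cores(4, 1, 2))
```

An increment larger than 1 could be useful if, on your machine, consecutive core numbers correspond to hardware threads on the same physical core. Whether that is the case depends on the system, so check your CPU layout first.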
Checkpoint
Analyses with BPP can often last several days. In these cases, there is a high chance that the run will be interrupted (or exceed the wall-time limit of a cluster). To avoid losing valuable output, we can use the checkpoint parameter, which allows resuming the analysis from where it left off.
The format is
checkpoint = X Y
This means that the first checkpoint is created after X steps, and additional checkpoints are then created every Y steps.
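As a rough illustration of the X Y schedule, the following Python sketch lists the steps at which checkpoints would be written. The function name checkpoint_steps and the example values are hypothetical, and the sketch assumes a simple running step count; how BPP itself counts steps (for instance, whether burn-in is included) is not stated here, so treat this as arithmetic only.

```python
def checkpoint_steps(first, interval, total_steps):
    """List the steps at which checkpoints fall for 'checkpoint X Y'.

    first       -- X, the step of the first checkpoint
    interval    -- Y, the spacing between later checkpoints
    total_steps -- assumed total number of steps in the run (hypothetical)
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    steps = []
    step = first
    while step <= total_steps:
        steps.append(step)
        step += interval
    return steps


# Hypothetical values: first checkpoint at step 50000, then every 25000 steps
print(checkpoint_steps(50000, 25000, 200000))
```

Choosing X and Y is a trade-off: frequent checkpoints lose less work after an interruption but write more files to disk.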
Rerunning the yeast dataset with parallel computations
In your "A00_Yeast" dataset, create a folder "parallel", run the control file below with four threads, and compare the running time with that of the analysis run with a single thread.
seed = -1
seqfile = ../data/bpp_seqfile.txt
Imapfile = ../data/Imap.txt
outfile = out.txt
mcmcfile = mcmc.txt
speciesdelimitation = 0
speciestree = 0
species&tree = 5 Scer Spar Smik Skud Sbay
1 1 1 1 1
((((Scer,Spar)A,Smik)B,(H[&phi=0.600000,tau-parent=no],Skud)D)C, (Sbay)H[&phi=0.400000,tau-parent=yes])R;
usedata = 1
nloci = 106
cleandata = 0
model = JC69
thetaprior = 3 0.04 e
tauprior = 3 0.2
phiprior = 1 1
print = 1 0 0 0
burnin = 8000
sampfreq = 2
nsample = 100000
Threads = 4 1 1
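Once both runs have finished, the comparison can be summarized as a speedup and a parallel efficiency. The Python sketch below shows the calculation; the function name speedup and the timings in the example are made up for illustration and are not measurements from the yeast dataset.

```python
def speedup(single_thread_seconds, multi_thread_seconds, n_threads):
    """Return (speedup, efficiency) for a multithreaded run.

    speedup    = single-thread time / multithreaded time
    efficiency = speedup / number of threads (1.0 means perfect scaling)
    """
    s = single_thread_seconds / multi_thread_seconds
    return s, s / n_threads


# Made-up wall-clock times, in seconds: substitute your own measurements
s, eff = speedup(1200.0, 400.0, 4)
print(f"speedup: {s:.2f}x, efficiency: {eff:.2f}")
```

An efficiency well below 1 is normal, since only part of the computation runs in parallel.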