Parallelization and Checkpointing - Pas-Kapli/bpp-tutorial GitHub Wiki

Parallelization

Currently, BPP implements multithreading via pthreads, which can substantially speed up the likelihood calculations.

Multithreading is enabled by specifying the threads variable in the control file. The format is

Threads = N A B, where N is the number of threads, A is the starting core/thread number, and B is the increment. For example, with Threads = 8 1 1, BPP uses 8 threads, starting from core/thread 1 with an increment of 1; in other words, it occupies threads 1-8.
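The N A B rule above is plain arithmetic, and a small sketch can make it concrete. The Python helper below (the name occupied_cores is hypothetical and not part of BPP) lists the core/thread numbers implied by a given Threads setting, assuming thread i is placed on core A + i*B as described above. It does not call BPP.

```python
def occupied_cores(n_threads, start, increment):
    """Return the core/thread numbers implied by 'Threads = N A B'.

    Assumption: thread i (counting from 0) is placed on core
    start + i * increment, following the rule described in the text.
    """
    if n_threads < 1:
        raise ValueError("the number of threads must be at least 1")
    return [start + i * increment for i in range(n_threads)]


# 'Threads = 8 1 1' occupies cores 1 through 8.
print(occupied_cores(8, 1, 1))
# 'Threads = 4 1 2' would occupy every second core, starting at core 1.
print(occupied_cores(4, 1, 2))
```

This kind of check is useful when several runs share one machine, so that their core ranges do not overlap.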

Checkpoint

Analyses with BPP can often last several days. In these cases, there is a high chance that the run will be interrupted (or will exceed the wall-time limit of a cluster). To avoid losing valuable output, we can use the checkpoint parameter, which allows resuming the analysis from where it left off.

The format is checkpoint = X Y

This means that the first checkpoint is created after X steps, and additional checkpoints are then created every Y steps.
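To see which steps would trigger a checkpoint for given X and Y, the schedule can be sketched in Python. The function name checkpoint_steps and the total_steps argument are hypothetical. How BPP counts "steps" relative to burnin and sampling is not spelled out here, so treat the total as an assumption you supply yourself.

```python
def checkpoint_steps(first, every, total_steps):
    """List the steps at which checkpoints would be written.

    first       -- X: step of the first checkpoint
    every       -- Y: interval between later checkpoints
    total_steps -- assumed total length of the run, in the same units
    """
    if first <= 0 or every <= 0:
        raise ValueError("X and Y must be positive")
    # Checkpoints fall at X, X + Y, X + 2Y, ... up to the end of the run.
    return list(range(first, total_steps + 1, every))


# Illustrative numbers only: X = 100, Y = 50, run length 260 steps.
print(checkpoint_steps(100, 50, 260))
```

Choosing Y so that at least one checkpoint falls inside each cluster time slot limits how much work is lost on interruption.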

Rerunning the yeast dataset with parallel computations

In your "A00_Yeast" dataset, create a folder named "parallel", run the control file below with four threads, and compare the running time with that of the analysis run with a single thread.

      seed =  -1

      seqfile = ../data/bpp_seqfile.txt
      Imapfile = ../data/Imap.txt
      outfile = out.txt
      mcmcfile = mcmc.txt

      speciesdelimitation = 0
      speciestree = 0

      species&tree = 5 Scer Spar Smik Skud Sbay
                        1    1    1    1    1
                   ((((Scer,Spar)A,Smik)B,(H[&phi=0.600000,tau-parent=no],Skud)D)C, (Sbay)H[&phi=0.400000,tau-parent=yes])R;


      usedata = 1
      nloci = 106  
      cleandata = 0
      model = JC69

      thetaprior = 3 0.04 e
      tauprior = 3 0.2
      phiprior = 1 1

      print = 1 0 0 0
      burnin = 8000
      sampfreq = 2
      nsample = 100000
      Threads = 4 1 1
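One way to compare the running times is to time each run from a small script. The Python sketch below is only an illustration: the helper names time_command and speedup are hypothetical, and the commented-out bpp invocation assumes that bpp is on your PATH, that it is started with --cfile, and that the control files are named as shown. Adjust these to your setup.

```python
import subprocess
import time


def time_command(cmd, cwd=None):
    """Run a command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True)
    return time.perf_counter() - start


def speedup(single_seconds, parallel_seconds):
    """Ratio of single-thread time to multi-thread time."""
    return single_seconds / parallel_seconds


# Example usage (assumed file and folder names, left commented out):
# t1 = time_command(["bpp", "--cfile", "A00.bpp.ctl"], cwd="single")
# t4 = time_command(["bpp", "--cfile", "A00.bpp.ctl"], cwd="parallel")
# print(f"speedup with 4 threads: {speedup(t1, t4):.2f}x")
```

A speedup well below the number of threads is normal, since only part of the computation runs in parallel.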