Parallelization of multi start optimization

In its current state, Data2Dynamics only supports parallelization over experimental conditions, but not over multi-start optimization runs. An easy way to implement such a parallelization is to start multiple MATLAB instances on one machine and have each of them run a small batch of the total number of multi-start optimization runs. In this article, we introduce a straightforward way of setting this up.

For this, we need two additional files: a bash script startup.sh that opens a new MATLAB instance using screen, and a MATLAB script doWork.m that does the actual fitting. Examples of both scripts are provided below.

Let's first initialize D2D, load the model and data, compile everything, and save the workspace:


arInit;
arLoadModel('model');
arLoadData('data_for_model');
arCompileAll;

arSave('my_workspace');

Now, let's say we have four processor cores available and want to fit with 15 multi-start runs:

multistart_runs = 15;
parallel_instances = 4;

We can round the number of multi-start optimization runs up to the next multiple of parallel_instances = 4 without increasing the computation time, since the wall-clock time is determined by the instance with the most fits (here, 15 runs are extended to 16, i.e. 4 fits per instance). We store this and some other variables in a configuration struct conf that is later loaded by doWork.m:

conf.pwd = pwd; % working directory
conf.d2dpath = fileparts(which('arInit.m')); % path to the D2D installation
conf.workspace = ar.config.savepath; % workspace saved by arSave above
conf.parIn = parallel_instances; % number of matlab instances
conf.totNum = ceil(multistart_runs/parallel_instances)*...
    parallel_instances; % extend total number of multistart runs to the next multiple of parallel_instances without loss of computation time

save('parallel_conf.mat', 'conf');
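
As a quick, purely illustrative sanity check, you can print how the fits will be distributed across the instances; with the numbers above this gives 16 runs in total, i.e. 4 fits per instance:

% purely illustrative: report how the multi-start runs are distributed
fprintf('%i instances x %i fits each = %i multi-start runs in total\n', ...
    conf.parIn, conf.totNum/conf.parIn, conf.totNum);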

Now call startup.sh once for each parallel instance, passing the instance index as an argument:

for icall = 1:parallel_instances
    system(sprintf('cd %s; sh startup.sh %i', conf.pwd, icall));
end

An example startup.sh is provided below. Since it contains the paths to MATLAB and to D2D, make sure you adjust them to the correct locations on your machine and that you have screen installed:

screen -d -m /Applications/MATLAB_R2019a.app/bin/matlab -nodisplay -r "addpath('~/Projekte/d2d/arFramework3'); doWork('$1'); exit;"
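
If you want to check whether the detached MATLAB instances are still running, you can list the screen sessions from within MATLAB. This is just an optional convenience and only assumes that screen is installed:

% optional: list the detached screen sessions started by startup.sh
system('screen -ls');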

The function doWork.m does the actual optimization. Keep in mind that the optimization in arFitLHS is itself parallelized over the experimental conditions. Since all cores are already occupied by the parallel MATLAB instances, you may want to turn this off to reduce computational overhead. To do so, uncomment the lines ar.config.useParallel = 0; and maxNumCompThreads(1); in doWork.m:

function doWork(icall)
    icall = str2double(icall); % the instance index is passed in as a string
    
    load('parallel_conf.mat', 'conf'); % load config 

    cd(conf.pwd);
    addpath(conf.d2dpath);
    
    arInit;
    arLoad(conf.workspace);

    % ar.config.useParallel = 0;
    % maxNumCompThreads(1);

    arFitLHS(conf.totNum/conf.parIn, icall);
    % conf.totNum/conf.parIn is the number of fits each MATLAB instance has to do;
    % icall serves as the random seed so that every instance starts from
    % different initial parameter vectors

    arSave(['par_result_' num2str(icall) '.mat'], 'ar');
end

Once all calculations have finished, there should be parallel_instances = 4 folders named *_par_result_* in Results/, each containing the results of ceil(multistart_runs/parallel_instances) = 4 multi-start runs. These can conveniently be collected using arMergeFitsCluster('par_result'). The collected results can then be found in the folder named Results/ClusterMerge_par_results_*.
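
As a minimal post-processing sketch: the merge call is taken from above, while loading the merged workspace with arLoad and plotting the combined waterfall with arPlotChi2s are assumptions that may need adjusting to your D2D version (and the ... has to be replaced by the actual timestamped folder name created in Results/):

arMergeFitsCluster('par_result');   % collect all par_result_* workspaces
% replace ... with the actual ClusterMerge folder name found in Results/
arLoad('ClusterMerge_par_results_...');
arPlotChi2s;                        % assumption: plots chi2 values of all multi-start fits (waterfall)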