MCCL Parallel Processing - VirtualPhotonics/Vts.MonteCarlo GitHub Wiki

Running Simulations in Parallel

A single Monte Carlo simulation can be run across multiple CPUs using the "cpucount" command line option. For example, to use 8 CPUs:

mc cpucount=8 infile=myinfile.txt

Usage Notes

  1. If cpucount=8 is specified but the computer has only 4 CPUs, four simulations are started first and the remaining four are started as the first four finish.
  2. If N, the total number of photons to be launched as specified in the infile, is not divisible by 8, the number of photons actually run is floor(N/8)*8. The results are normalized by this number instead of N.
  3. You can set "cpucount=all", which uses "Environment.ProcessorCount" to determine how many CPUs are available. This should only be done on a private computer. On a public cluster, this may request more CPUs than are readily free, and the job may wait a long time for that many to become available (see timing results below).
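
The arithmetic in notes 2 and 3 can be sketched in a few lines of Python. This is an illustration only, not MCCL code: the function names are hypothetical, the even split of photons per CPU is an assumption inferred from the floor(N/8)*8 formula, and `os.cpu_count()` stands in as a Python analogue of "Environment.ProcessorCount".

```python
import os


def photons_actually_run(n_photons, cpucount):
    """Return (photons per CPU, total photons run).

    Assumes each CPU receives floor(N / cpucount) photons, so the total
    is floor(N / cpucount) * cpucount and any remainder is dropped.
    """
    per_cpu = n_photons // cpucount
    return per_cpu, per_cpu * cpucount


def resolve_cpucount(option):
    """Interpret a cpucount value; "all" maps to the logical CPU count."""
    if option == "all":
        # os.cpu_count() can return None, so fall back to a single CPU.
        return os.cpu_count() or 1
    return int(option)


per_cpu, total = photons_actually_run(1_000_003, 8)
print(per_cpu, total)  # 125000 1000000
```

With N = 1,000,003 and cpucount=8, three photons are dropped and the results would be normalized by 1,000,000 rather than by N.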

Technical Information

  1. To ensure that the random number generators used on the different CPUs produce uncorrelated streams, we employ the Dynamic Creator Mersenne Twister (dcmt, https://github.com/MersenneTwister-Lab/dcmt). dcmt finds sub-streams within the original Mersenne Twister (MT) that do not overlap and are long enough to encompass all random numbers needed in a simulation.
  2. The dcmt code was written in C and we ported it to C#. The details of the port can be seen by cloning a copy of our source code. We added comments describing the code wherever its behavior was known.
  3. We ran timing studies on the University of California, Irvine High Performance Cluster (https://rcic.uci.edu/hpc3/hpc3.html). The sample infile infile_one_layer_ROfRho_FluenceOfRhoAndZ.txt was used in all of the simulations. The number of photons launched, N, was set to 1e5, 1e6 and 1e7, and cpucount was set to 1, 2, 4 and 8 (16 was also tried, but it took a long time before 16 CPUs became available). We obtained the following timing results in seconds:
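
| N   | cpucount=1 | cpucount=2 | cpucount=4 | cpucount=8 |
|-----|------------|------------|------------|------------|
| 1e7 | 38901      | 17779      | 17151      | 16980      |
| 1e6 | 3051       | 1815       | 1698       | 1350       |
| 1e5 | 448        | 178        | 184        | 208        |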
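
To make the timings easier to compare, the sketch below computes the speedup T(1)/T(p) and the parallel efficiency, speedup/p, for each entry. The numbers are copied from the timing results above. The function and variable names are hypothetical and the script is not part of MCCL.

```python
# Timing results in seconds, keyed by N and then by cpucount.
timings = {
    "1e7": {1: 38901, 2: 17779, 4: 17151, 8: 16980},
    "1e6": {1: 3051, 2: 1815, 4: 1698, 8: 1350},
    "1e5": {1: 448, 2: 178, 4: 184, 8: 208},
}


def speedup_and_efficiency(row):
    """Map each cpucount p to (speedup, efficiency) relative to cpucount=1."""
    t1 = row[1]
    return {p: (t1 / t, t1 / t / p) for p, t in row.items()}


for n, row in timings.items():
    for p, (speedup, efficiency) in speedup_and_efficiency(row).items():
        print(f"N={n} cpucount={p}: speedup={speedup:.2f}, efficiency={efficiency:.2f}")
```

An efficiency near 1 means the run time scaled almost ideally with the number of CPUs. Lower values indicate that adding CPUs gave diminishing returns for that run.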