Multi threading simulation - quasylab/sibilla GitHub Wiki
Simulation classes in Sibilla follow a multi-threading approach and are based on the Java Concurrency API. Different scheduling policies are available to execute simulation tasks, so that execution can be tailored to the parallelism of the hosting architecture.
A mechanism based on Factory Methods is used to select the appropriate SimulationManager at runtime.
Currently, two kinds of implementation are considered:
- SequentialSimulationManager: simulation tasks are executed in the order they are submitted; a task starts only when the previous one has finished.
- ThreadSimulationManager: simulation tasks are executed following a multi-threading approach. Different scheduling algorithms can be used, depending on the chosen ExecutorService.
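The mechanism above can be sketched with plain java.util.concurrent types. This is a minimal, illustrative sketch: the interface and class names below are hypothetical stand-ins, not Sibilla's actual API. A factory method selects at runtime between a sequential manager and a pooled one.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

// Illustrative sketch; these names are hypothetical, not Sibilla's actual API.
interface SimulationManager {
    <T> Future<T> submit(Callable<T> task);
    void shutdown();
}

// Tasks run in submission order; a task starts only when the previous one has finished.
class SequentialManager implements SimulationManager {
    public <T> Future<T> submit(Callable<T> task) {
        FutureTask<T> f = new FutureTask<>(task);
        f.run(); // executed immediately on the caller's thread
        return f;
    }
    public void shutdown() { }
}

// Tasks are scheduled by an ExecutorService; the policy depends on the pool used.
class ThreadManager implements SimulationManager {
    private final ExecutorService service;
    ThreadManager(ExecutorService service) { this.service = service; }
    public <T> Future<T> submit(Callable<T> task) { return service.submit(task); }
    public void shutdown() { service.shutdown(); }
}

public class Managers {
    // Factory method: the concrete manager is chosen at runtime.
    static SimulationManager create(boolean parallel, int threads) {
        return parallel
                ? new ThreadManager(Executors.newFixedThreadPool(threads))
                : new SequentialManager();
    }

    public static void main(String[] args) throws Exception {
        for (boolean parallel : new boolean[]{false, true}) {
            SimulationManager m = create(parallel, 4);
            Future<Integer> f = m.submit(() -> 6 * 7); // stand-in for one simulation run
            System.out.println(f.get()); // 42
            m.shutdown();
        }
    }
}
```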
In the following, the performance of the SEIR scenario with different parameters and simulation managers is considered.
We let the population size vary from 10 to 1000 individuals and the number of simulation runs range over 1, 10, 100, and 1000.
The experiments were conducted on a MacBook Pro with 32 GB of RAM and a 2.3 GHz 8-core Intel Core i9. The tests can be replicated using the class SimulationManagerTest.
Sequential
The time to perform the simulation runs when a SequentialSimulationManager is used is reported below (rows: number of runs; columns: population size):
Runs \ Population | 10 | 100 | 1000
---|---|---|---
1 | 0.085 | 0.030 | 0.044
10 | 0.032 | 0.034 | 0.122
100 | 0.045 | 0.113 | 0.463
1000 | 0.081 | 0.516 | 4.214
We can observe that, with this simulation manager, the execution time grows (almost) linearly with the number of simulation runs and (almost) linearly with the size of the population.
ThreadSimulationManager and Fixed Thread Pool
When a ThreadSimulationManager is used, the simulation is performed following a multi-threading approach. In this case an ExecutorService is needed to handle the scheduling of the different threads. Below, the results of the simulation with a FixedThreadPool of different sizes are reported. This is a thread pool that reuses a fixed number of threads operating off a shared unbounded queue.
We can observe that, as the size of the pool increases while remaining below the number of cores of the hosting machine, performance improves.
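The pool size that matches the hosting machine can be queried from the JVM itself; a minimal sketch (class name is illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {
    public static void main(String[] args) {
        // Number of logical cores available to the JVM; sizing the fixed pool
        // to this value avoids both idle cores and oversubscription.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        System.out.println(cores >= 1); // true on any machine
        pool.shutdown();
    }
}
```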
Thread pool size 2
Runs \ Population | 10 | 100 | 1000
---|---|---|---
1 | 0.017 | 0.014 | 0.018
10 | 0.014 | 0.019 | 0.048
100 | 0.026 | 0.049 | 0.313
1000 | 0.060 | 0.352 | 2.936
Thread pool size 4
Runs \ Population | 10 | 100 | 1000
---|---|---|---
1 | 0.014 | 0.014 | 0.019
10 | 0.015 | 0.017 | 0.035
100 | 0.022 | 0.040 | 0.224
1000 | 0.045 | 0.225 | 2.024
Thread pool size 6
Runs \ Population | 10 | 100 | 1000
---|---|---|---
1 | 0.014 | 0.014 | 0.018
10 | 0.015 | 0.017 | 0.033
100 | 0.021 | 0.038 | 0.199
1000 | 0.041 | 0.201 | 1.807
Thread pool size 8
Runs \ Population | 10 | 100 | 1000
---|---|---|---
1 | 0.013 | 0.015 | 0.017
10 | 0.016 | 0.017 | 0.034
100 | 0.020 | 0.036 | 0.189
1000 | 0.040 | 0.190 | 1.727
Thread pool size 10
Runs \ Population | 10 | 100 | 1000
---|---|---|---
1 | 0.014 | 0.014 | 0.018
10 | 0.015 | 0.017 | 0.035
100 | 0.022 | 0.036 | 0.194
1000 | 0.041 | 0.198 | 1.862
ThreadSimulationManager and Cached Thread Pool
Different Executors can also be considered, for instance the CachedThreadPool. This is a thread pool that creates new threads as needed, but reuses previously constructed threads when they are available. Such pools typically improve the performance of programs that execute many short-lived asynchronous tasks. We can observe that, in this case, this configuration outperforms both the sequential manager and the ThreadSimulationManager based on a FixedThreadPool.
Runs \ Population | 10 | 100 | 1000
---|---|---|---
1 | 0.017 | 0.016 | 0.022
10 | 0.018 | 0.020 | 0.037
100 | 0.033 | 0.038 | 0.178
1000 | 0.105 | 0.219 | 1.707
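A cached pool is obtained from the standard Executors factory; a minimal sketch (class and method names are illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CachedPoolDemo {
    // Run a short task on a cached pool and wait for its result.
    static int runShortTask() throws Exception {
        // A cached pool creates threads on demand and reuses idle ones,
        // which suits workloads made of many short-lived tasks.
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            Future<Integer> f = pool.submit(() -> 6 * 7); // stand-in for one simulation run
            return f.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runShortTask()); // 42
    }
}
```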
ThreadSimulationManager and Work Stealing Pool
Another option is the use of a WorkStealingPool. This is a thread pool that maintains enough threads to support the parallelism level of the hosting machine, and may use multiple queues to reduce contention. We can observe that, with this kind of executor service, the performance is not better than that obtained in the previous case with the cached thread pool, even if the two results are similar.
Runs \ Population | 10 | 100 | 1000
---|---|---|---
1 | 0.019 | 0.019 | 0.027
10 | 0.020 | 0.023 | 0.040
100 | 0.029 | 0.043 | 0.203
1000 | 0.072 | 0.222 | 1.922
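A work-stealing pool is also available from the standard Executors factory; a minimal sketch (class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WorkStealingDemo {
    // Submit several independent tasks to a work-stealing pool and sum the results.
    static int sumOfSquares(int n) throws Exception {
        // newWorkStealingPool sizes itself to the machine's parallelism level
        // and uses per-worker queues so idle threads can steal pending tasks.
        ExecutorService pool = Executors.newWorkStealingPool();
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            final int k = i;
            futures.add(pool.submit(() -> k * k)); // stand-in for one simulation run
        }
        int total = 0;
        for (Future<Integer> f : futures) {
            total += f.get(); // wait for completion; work-stealing threads are daemons
        }
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfSquares(5)); // 55
    }
}
```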