Single‐Thread and Multi‐Thread: Benchmark and Performance Test
By Adrià Bravo Vidal.
FreeDTSv2.x can execute simulations both in single-threaded mode and in parallel using OpenMP. Users can select which Monte Carlo (MC) move types are executed in parallel via input parameters. Understanding how performance scales with thread count is essential for making efficient use of available computational resources. Here, we assess the strong and weak scaling performance of FreeDTSv2.1.
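To illustrate the kind of update pattern that benefits from OpenMP, the following is a minimal sketch of a thread-parallel Metropolis sweep over independent degrees of freedom. It is illustrative only and not taken from FreeDTS: the function names are hypothetical, the on-site energy is a placeholder for the real bending and area terms, and a real mesh sweep must additionally prevent concurrent moves of neighboring vertices, which is omitted here.

```cpp
// A minimal sketch of an OpenMP-parallel Metropolis sweep. This is NOT
// FreeDTS source code: local_energy() is a hypothetical stand-in for the
// bending/area energy, and a real mesh sweep must also prevent two threads
// from moving neighboring vertices at the same time (omitted here).
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>
#include <omp.h>

// Hypothetical on-site energy; each site i is independent in this toy model.
double local_energy(const std::vector<double>& x, std::size_t i) {
    return x[i] * x[i];
}

void metropolis_sweep(std::vector<double>& x, double beta, double step) {
    #pragma omp parallel
    {
        // One RNG per thread so random draws do not race between threads
        // (base seed 52089 chosen to mirror the Seed line of the input file).
        std::mt19937 rng(52089u + static_cast<unsigned>(omp_get_thread_num()));
        std::uniform_real_distribution<double> u(0.0, 1.0);

        #pragma omp for
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double x_old = x[i];
            const double e_old = local_energy(x, i);
            x[i] = x_old + step * (2.0 * u(rng) - 1.0);  // trial move
            const double e_new = local_energy(x, i);
            // Metropolis criterion: accept with probability min(1, e^{-beta*dE}).
            if (u(rng) >= std::exp(-beta * (e_new - e_old)))
                x[i] = x_old;  // reject: restore the old value
        }
    }
}

int main() {
    std::vector<double> x(18000, 0.5);         // N_v = 18,000 as in the benchmark
    for (int sweep = 0; sweep < 100; ++sweep)  // 100 MC steps, as in Set_Steps
        metropolis_sweep(x, /*beta=*/1.0, /*step=*/0.05);
    return 0;
}
```

Per-thread random number generators are the key design choice in such a sweep: sharing one generator across threads would either race or serialize the loop.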
Test set-up
All tests were conducted on a flat membrane with periodic boundary conditions and zero frame tension ($\tau=0$), with 10% of the vertices occupied by inclusions. This setup involves four distinct MC moves—Vertex Position Update, Inclusion Position Move, Edge Move, and Box Move—making it one of the most computationally demanding simulations in FreeDTS.
The input file used for the simulations is given below; the MetropolisAlgorithmOpenMP and IsotropicFrameTensionOpenMP entries select the parallel (OpenMP) variants of the corresponding MC moves:
```
Integrator_Type = MC_Simulation
Min_Max_Lenghts = 1 3
MinfaceAngle = -0.5
Temperature = 1.0 0
Set_Steps = 1 100
EnergyMethod = FreeDTS1.0_FF
Seed = 52089
Kappa = 20 0 0
VertexArea = 0.0 0.7 0 0
TimeSeriesData_Period = 100
VertexPositionIntegrator = MetropolisAlgorithmOpenMP 1 1 0.05
AlexanderMove = MetropolisAlgorithmOpenMP 1
InclusionPoseIntegrator = MetropolisAlgorithmOpenMP 1 1
Dynamic_Box = IsotropicFrameTensionOpenMP 2 0 XY
VisualizationFormat = VTUFileFormat VTU_F 50000
NonbinaryTrajectory = TSI TrajTSI 2000
Restart_Period = 50000
INCLUSION
Define 1 Inclusions
SRotation Type K KG KP KL C0 C0P C0L
1 Pro1 20.0 0 0 0 0 0 0
GenerateInclusions
Selection_Type Random
TypeID 1 2 3
Density 0.1 0 0
```
All simulations were performed on a single node of the Tycho supercomputer (SCIENCE HPC Center, University of Copenhagen) equipped with an AMD Epyc Genoa 9554 processor. Each data point represents the average of 10 independent runs, with the standard error of the mean (SEM) used to quantify uncertainty. FreeDTSv2.1 was compiled with OpenMP support (./compile_OpenMP.sh).
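For clarity, the reported uncertainty on each data point is the standard error of the mean of the $n = 10$ run times $t_i$:

$$\mathrm{SEM} = \frac{s}{\sqrt{n}}, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(t_i - \bar{t}\right)^2$$

where $\bar{t}$ is the mean run time.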
Strong Scaling
Strong scaling was evaluated by fixing the system size at $N_{\nu}=18,000$ vertices and measuring the runtime of 100 MC steps as the number of threads was increased.

[Figure: strong-scaling speedup $T_1/T_N$ versus thread count $N$]

The plot shows the speedup relative to the single-threaded case ($T_1/T_N$) as a function of thread count $N$. Blue points represent measured data, the black line represents ideal scaling, and the orange dashed line is a fit to Amdahl's law [1]. From the fit, the parallel fraction of the code under strong scaling conditions is estimated at 94%.
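For reference, Amdahl's law expresses the strong-scaling speedup in terms of the parallel fraction $p$ of the code:

$$S(N) = \frac{1}{(1 - p) + p/N}$$

With $p = 0.94$, the speedup is bounded above by $1/(1-p) \approx 16.7$ no matter how many threads are used; at $N = 32$, for example, the law predicts $S \approx 11.2$.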
Weak Scaling
Weak scaling was evaluated by keeping the workload per thread fixed ($N_{\nu}=1968$ vertices per thread) while proportionally increasing the total system size with thread count.

[Figure: weak-scaling speedup $T_1 N/T_N$ versus thread count $N$]

Here, speedup is reported as $T_1 N/T_N$ versus $N$. As before, blue points show measured data, the black line represents ideal scaling, and the orange dashed line is a fit to Gustafson-Barsis's law [1]. The parallel fraction under weak scaling conditions is approximately 64%.
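Gustafson-Barsis's law assumes instead that the parallel part of the workload grows with the thread count:

$$S(N) = (1 - p) + pN$$

With $p = 0.64$ at $N = 32$, this gives $S \approx 0.36 + 0.64 \times 32 \approx 20.8$, about 65% of the ideal weak-scaling speedup.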
Conclusion
Multi-threading in FreeDTSv2.1 significantly improves simulation efficiency, enabling the study of large membranes with minimal performance loss. However, there remains room for further optimization, particularly in weak-scaling scenarios.
[1] Robert Robey and Yuliana Zamora, Parallel and High Performance Computing, Manning Publications, 2021.