Performance tests - ProkopHapala/FireCore GitHub Wiki
MMFFsp3_loc.h
(commit)
CPU single-core. CPU: AMD Ryzen 7 5800X (16 logical cores), 2200/4850 MHz
Test 1: nHexadecan_dicarboxylic 50 atoms using MMFFsp3
command: ./MolGUIapp -x common_resources/nHexadecan_dicarboxylic -iParalel 0 -T 100 0.01 -verb 2 -perframe 2000
NOTES: run_no_omp() bPBC=0
bNonBondNeighs=0 22.612 us/iter = 44224 iter/s
bNonBondNeighs=1 17.418 us/iter = 57411 iter/s
no-NonBond 4.520 us/iter = 221238 iter/s
Test 1b: nHexadecan_dicarboxylic 50 atoms using UFF
command: ./$name -x common_resources/nHexadecan_dicarboxylic -uff -iParalel 0 -T 100 0.01 -verb 2 -perframe 2000
bNonBondNeighs=0 26.90 us/iter = 37174 iter/s
bNonBondNeighs=1 21.88 us/iter = 45703 iter/s
no-NonBond 7.60 us/iter = 131578 iter/s
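The iter/s column in Tests 1 and 1b appears to be the reciprocal of the us/iter timing, truncated to an integer. A minimal sketch of that conversion, using the MMFFsp3 timings copied from Test 1; the helper name `iters_per_second` is illustrative and not part of FireCore:

```python
def iters_per_second(us_per_iter: float) -> float:
    """Convert a per-iteration time in microseconds to iterations per second."""
    return 1.0e6 / us_per_iter

# Timings (us/iter) copied from Test 1 (MMFFsp3) above.
mmff_us = {
    "bNonBondNeighs=0": 22.612,
    "bNonBondNeighs=1": 17.418,
    "no-NonBond": 4.520,
}

for label, us in mmff_us.items():
    # int() truncates, which matches how the table values are written.
    print(f"{label:18s} {us:8.3f} us/iter = {int(iters_per_second(us))} iter/s")

# Relative gain from bNonBondNeighs=1 for this molecule (about 1.3x).
speedup = mmff_us["bNonBondNeighs=0"] / mmff_us["bNonBondNeighs=1"]
print(f"bNonBondNeighs=1 speedup: {speedup:.2f}x")
```

The same helper applies to the UFF timings of Test 1b.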
Test 2: polymer-2_new PBC 45 atoms
command: ./MolGUIapp -x common_resources/polymer-2_new -g common_resources/NaCl_1x1_L2 -iParalel 0 -T 100 0.01 -verb 2 -perframe 500
NOTES: run_no_omp() bPBC=1 nPBC{1,1,0}; i.e. 3x3 = 9 images
bNonBondNeighs=0 132.52 us/iter = 7575 iter/s
bNonBondNeighs=1 101.50 us/iter = 9852 iter/s
no-NonBond(no GridFF) 5.25 us/iter = 190476 iter/s
no-NonBond(+GridFF/trilinear) 6.73 us/iter = 148588 iter/s
no-NonBond(+GridFF/trilinear)(no thermostat) 5.26 us/iter = 190114 iter/s
no-NonBond(+GridFF/tricubic) 9.58 us/iter = 104384 iter/s
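The note "nPBC{1,1,0}; i.e. 3x3 = 9 images" suggests that each nPBC component counts replicated cells on either side of the central cell. A small sketch under that assumption (the function name `n_pbc_images` is hypothetical); it also reproduces the 121 images and 1210 substrate atoms quoted for the GPU test further below:

```python
def n_pbc_images(nPBC):
    """Count periodic images, assuming nPBC gives the number of
    replicated cells on each side of the central cell along each axis."""
    nx, ny, nz = nPBC
    return (2 * nx + 1) * (2 * ny + 1) * (2 * nz + 1)

cpu_images = n_pbc_images((1, 1, 0))   # CPU test: nPBC{1,1,0}
gpu_images = n_pbc_images((5, 5, 0))   # GPU test: nPBC=(5,5,0)
substrate_atoms = 10 * gpu_images      # 10 NaCl atoms per cell
print(cpu_images, gpu_images, substrate_atoms)
```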
Test 2: polymer-2_new CPU GridFF::addForce() vs. GridFF::evalMorsePBC_sym()
MolGUIapp -x common_resources/polymer-2_new -g common_resources/NaCl_1x1_L2 -Ftol 1e-12 -iParalel 0 -dt 0.05 -nogridff -perframe 100
MolWorld_sp3::run_no_omp(bGridFF=true ) 236.08 [us/iter] 4.235k [iter/s] 190k [atoms/s]
MolWorld_sp3::run_no_omp(bGridFF=false) 1139.96 [us/iter] 877 [iter/s] 39k [atoms/s]
MolWorld_sp3::MDloop() (bUFF=0,iParalel=0,bSurfAtoms=1,bGridFF=1,bPBC=1,bNonBonded=1bNonBondNeighs=0,dt=0.05,niter=100) time=23.6358[ms/100](236.083[us/iter] tick2second=2.62962e-10)
MolWorld_sp3::MDloop() (bUFF=0,iParalel=0,bSurfAtoms=1,bGridFF=0,bPBC=1,bNonBonded=1bNonBondNeighs=0,dt=0.05,niter=100) time=113.861[ms/100](1139.96[us/iter] tick2second=2.62962e-10)
Test 2: polymer-2_new GPU GridFF() vs. getSurfMorse()
- NVIDIA GeForce RTX 3090 24GB driver 535.161.08
- NaCl substrate containing 10 atoms with 121 PBC images ( nPBC=(5,5,0) 1210 atoms total)
- polymer-2_new 45 atoms
- from 40 to 200 replicas in parallel
- 500 or 100 iterations per frame
iParalel=3, i.e. MolWorld_sp3_multi::run_ocl_opt()
MolGUIapp_multi -m 40 -x common_resources/polymer-2_new -g common_resources/NaCl_1x1_L2 -Ftol 1e-12 -perframe 500
MolGUIapp_multi -m 200 -x common_resources/polymer-2_new -g common_resources/NaCl_1x1_L2 -Ftol 1e-12 -perframe 100
Results
run_ocl_opt(bGridFF=true,nSys=40 ,perFrame=500) 86.595 [us/step] 462k [step/s] 20.78 mil. [atom/s]
run_ocl_opt(bGridFF=false,nSys=40 ,perFrame=500) 194.658 [us/step] 205k [step/s] 9.24 mil. [atom/s]
run_ocl_opt(bGridFF=true,nSys=200,perFrame=100) 114.278 [us/step] 1750k [step/s] 78.75 mil. [atom/s]
run_ocl_opt(bGridFF=false,nSys=200,perFrame=100) 249.81 [us/step] 800k [step/s] 36.02 mil. [atom/s]
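The atom/s figures above are consistent with counting every replica on every step: replicas per step divided by the step time, times 45 atoms per replica. A hedged sketch of that arithmetic; `throughput` is an illustrative helper, and counting only the 45 polymer atoms (not the substrate) is an assumption inferred from the numbers:

```python
def throughput(us_per_step: float, n_sys: int, n_atoms: int):
    """Return (replica-steps per second, atom updates per second).

    Assumes all n_sys replicas advance one step in us_per_step microseconds.
    """
    steps_per_s = n_sys * 1.0e6 / us_per_step
    atoms_per_s = steps_per_s * n_atoms
    return steps_per_s, atoms_per_s

N_ATOMS = 45  # polymer-2_new

# (bGridFF, nSys, us/step) copied from the results above.
runs = [(True, 40, 86.595), (False, 40, 194.658),
        (True, 200, 114.278), (False, 200, 249.81)]

for grid, n_sys, us in runs:
    steps, atoms = throughput(us, n_sys, N_ATOMS)
    print(f"bGridFF={grid!s:5} nSys={n_sys:3d} "
          f"{steps / 1e3:8.1f}k step/s {atoms / 1e6:6.2f} mil. atom/s")
```

Going from 40 to 200 replicas raises the step time only from about 87 to 114 us with GridFF, which is why the aggregate throughput grows almost fourfold.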
run_ocl_opt(nSys=40|iPara=3,bSurfAtoms=1,bGridFF=1) NOT CONVERGED in 50 steps, |F|(7.20113e-05)>1e-12 time 4.1973 [ms]( 83.946 [us/step]) bGridFF=1 iSysFMax=0 dovdW=1
run_ocl_opt(nSys=40|iPara=3,bSurfAtoms=1,bGridFF=0) NOT CONVERGED in 50 steps, |F|(7.23475e-05)>1e-12 time 9.76089 [ms]( 195.218 [us/step]) bGridFF=0 iSysFMax=0 dovdW=1
run_ocl_opt(nSys=40|iPara=3,bSurfAtoms=1,bGridFF=1) NOT CONVERGED in 500 steps, |F|(9.12548e-05)>1e-12 time 43.2977 [ms]( 86.5954 [us/step]) bGridFF=1 iSysFMax=0 dovdW=1
run_ocl_opt(nSys=40|iPara=3,bSurfAtoms=1,bGridFF=0) NOT CONVERGED in 500 steps, |F|(7.3448e-05)>1e-12 time 97.3289 [ms]( 194.658 [us/step]) bGridFF=0 iSysFMax=1 dovdW=1
run_ocl_opt(nSys=200|iPara=3,bSurfAtoms=1,bGridFF=1) NOT CONVERGED in 100 steps, |F|(7.39509e-05)>1e-12 time 11.4278 [ms]( 114.278 [us/step]) bGridFF=1 iSysFMax=0 dovdW=1
run_ocl_opt(nSys=200|iPara=3,bSurfAtoms=1,bGridFF=0) NOT CONVERGED in 100 steps, |F|(0.000113862)>1e-12 time 24.981 [ms]( 249.81 [us/step]) bGridFF=0 iSysFMax=0 dovdW=1
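To collect timings from log lines like the ones above, a regular expression is usually enough. This is a sketch only: the pattern is fitted to the lines shown here and would need adjusting if the log format changes; `parse_line` is a hypothetical helper, not part of FireCore.

```python
import re

# Pattern fitted to the run_ocl_opt() lines above.
LOG_RE = re.compile(
    r"run_ocl_opt\(nSys=(?P<nSys>\d+)\|.*?bGridFF=(?P<bGridFF>[01])\)"
    r".*?in (?P<steps>\d+) steps"
    r".*?\(\s*(?P<us>[\d.]+) \[us/step\]\)"
)

def parse_line(line: str):
    """Extract nSys, bGridFF, step count and us/step; None if no match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    return {
        "nSys": int(m.group("nSys")),
        "bGridFF": m.group("bGridFF") == "1",
        "steps": int(m.group("steps")),
        "us_per_step": float(m.group("us")),
    }

# Two lines copied verbatim from the log above.
sample_on = ("run_ocl_opt(nSys=40|iPara=3,bSurfAtoms=1,bGridFF=1) NOT CONVERGED in 50 steps, "
             "|F|(7.20113e-05)>1e-12 time 4.1973 [ms]( 83.946 [us/step]) bGridFF=1 iSysFMax=0 dovdW=1")
sample_off = ("run_ocl_opt(nSys=40|iPara=3,bSurfAtoms=1,bGridFF=0) NOT CONVERGED in 50 steps, "
              "|F|(7.23475e-05)>1e-12 time 9.76089 [ms]( 195.218 [us/step]) bGridFF=0 iSysFMax=0 dovdW=1")

on, off = parse_line(sample_on), parse_line(sample_off)
# Ratio of step times without and with GridFF (about 2.3x for nSys=40).
gridff_speedup = off["us_per_step"] / on["us_per_step"]
print(on, off, f"GridFF speedup: {gridff_speedup:.2f}x")
```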