Performance tests - ProkopHapala/FireCore GitHub Wiki

CPU single-core MMFFsp3_loc.h (commit)

CPU: AMD Ryzen 7 5800X, 8 cores / 16 threads, 2200/4850 MHz

Test 1: nHexadecan_dicarboxylic 50 atoms using MMFFsp3

command: ./MolGUIapp -x common_resources/nHexadecan_dicarboxylic -iParalel 0 -T 100 0.01 -verb 2 -perframe 2000

NOTES: run_no_omp() bPBC=0

bNonBondNeighs=0   22.612 us/iter    =  44224 iter/s
bNonBondNeighs=1   17.418 us/iter    =  57411 iter/s
no-NonBond          4.520 us/iter    = 221238 iter/s 
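
The two columns above are related by iter/s = 1e6 / (us/iter). The following is a minimal Python sketch of that conversion; it is not part of FireCore, and the helper name `iters_per_second` is made up for illustration. It also estimates the share of each iteration spent on non-bonded interactions, under the assumption that the no-NonBond row is a fair baseline for the bonded part of the force field.

```python
def iters_per_second(us_per_iter: float) -> float:
    """Convert time per iteration in microseconds to iterations per second."""
    return 1.0e6 / us_per_iter

# Timings from Test 1 (us/iter), copied from the table above
timings_us = {
    "bNonBondNeighs=0": 22.612,
    "bNonBondNeighs=1": 17.418,
    "no-NonBond": 4.520,
}

# Assumption: the no-NonBond run measures the bonded-only cost
baseline = timings_us["no-NonBond"]

for label, t in timings_us.items():
    nonbond_share = (t - baseline) / t  # fraction of the iteration above the baseline
    print(f"{label:18s} {t:8.3f} us/iter = {iters_per_second(t):9.0f} iter/s"
          f"  non-bonded share {nonbond_share:5.1%}")
```

Under that assumption, roughly four fifths of the bNonBondNeighs=0 iteration time goes to non-bonded interactions.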

Test 1b: nHexadecan_dicarboxylic 50 atoms using UFF

command: ./$name -x common_resources/nHexadecan_dicarboxylic -uff -iParalel 0 -T 100 0.01 -verb 2 -perframe 2000

bNonBondNeighs=0   26.90 us/iter    =  37174 iter/s
bNonBondNeighs=1   21.88 us/iter    =  45703 iter/s
no-NonBond          7.60 us/iter    = 131578 iter/s 

Test 2: polymer-2_new PBC 45 atoms

command: ./MolGUIapp -x common_resources/polymer-2_new -g common_resources/NaCl_1x1_L2 -iParalel 0 -T 100 0.01 -verb 2 -perframe 500

NOTES: run_no_omp() bPBC=1 nPBC{1,1,0}; i.e. 3x3 = 9 images

bNonBondNeighs=0                          132.52 us/iter  =   7546 iter/s
bNonBondNeighs=1                          101.50 us/iter  =   9852 iter/s
no-NonBond(no GridFF)                       5.25 us/iter  = 190476 iter/s 
no-NonBond(+GridFF/trilinear)               6.73 us/iter  = 148588 iter/s 
no-NonBond(+GridFF/trilinear)(no thermostat) 5.26 us/iter  = 190114 iter/s 
no-NonBond(+GridFF/tricubic)                9.58 us/iter  = 104384 iter/s 
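
The note "nPBC{1,1,0}; i.e. 3x3 = 9 images" suggests that each nPBC component n stands for cells from -n to +n along that axis, i.e. 2n+1 cells per axis. This reading is an inference from the numbers on this page, not something confirmed from the FireCore source. A small Python sketch (the function name `n_images` is hypothetical) checks it against both cases mentioned here: 9 images for nPBC=(1,1,0), and 121 images for nPBC=(5,5,0) used in the GPU test.

```python
def n_images(nPBC):
    """Number of periodic images, assuming each axis spans -n..+n (2n+1 cells)."""
    nx, ny, nz = nPBC
    return (2 * nx + 1) * (2 * ny + 1) * (2 * nz + 1)

# CPU PBC test: nPBC{1,1,0}
print(n_images((1, 1, 0)))

# GPU test: nPBC=(5,5,0) with a 10-atom NaCl substrate cell
images = n_images((5, 5, 0))
print(images, images * 10)  # images, total substrate atoms
```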

Test 2b: polymer-2_new CPU GridFF::addForce() vs. GridFF::evalMorsePBC_sym()

command: MolGUIapp -x common_resources/polymer-2_new -g common_resources/NaCl_1x1_L2 -Ftol 1e-12 -iParalel 0 -dt 0.05 -nogridff -perframe 100

MolWorld_sp3::run_no_omp(bGridFF=true )    236.08 [us/iter]  4.235k [iter/s]    190k [atoms/s]
MolWorld_sp3::run_no_omp(bGridFF=false)   1139.96 [us/iter]  877    [iter/s]     39k [atoms/s]
MolWorld_sp3::MDloop()  (bUFF=0,iParalel=0,bSurfAtoms=1,bGridFF=1,bPBC=1,bNonBonded=1bNonBondNeighs=0,dt=0.05,niter=100) time=23.6358[ms/100](236.083[us/iter] tick2second=2.62962e-10)
MolWorld_sp3::MDloop()  (bUFF=0,iParalel=0,bSurfAtoms=1,bGridFF=0,bPBC=1,bNonBonded=1bNonBondNeighs=0,dt=0.05,niter=100) time=113.861[ms/100](1139.96[us/iter] tick2second=2.62962e-10)
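
The MDloop() log lines give 236.083 us/iter with bGridFF=1 and 1139.96 us/iter with bGridFF=0. A short Python sketch (helper names are illustrative only) derives the atoms/s figures and the resulting speedup, taking 45 atoms for polymer-2_new as stated elsewhere on this page.

```python
N_ATOMS = 45  # polymer-2_new

# us/iter taken from the MDloop() log lines
us_per_iter = {"bGridFF=1": 236.083, "bGridFF=0": 1139.96}

def atoms_per_second(us: float, n_atoms: int = N_ATOMS) -> float:
    """Atom updates per second for one system at the given time per iteration."""
    return n_atoms * 1.0e6 / us

for label, us in us_per_iter.items():
    print(f"{label}: {1.0e6 / us:8.0f} iter/s  {atoms_per_second(us) / 1e3:6.1f}k atoms/s")

# How much faster the grid-interpolated substrate is than the direct PBC sum
speedup = us_per_iter["bGridFF=0"] / us_per_iter["bGridFF=1"]
print(f"GridFF speedup: {speedup:.2f}x")
```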

Test 2c: polymer-2_new GPU GridFF() vs. getSurfMorse()

  • NVIDIA GeForce RTX 3090 24GB, driver 535.161.08
  • NaCl substrate containing 10 atoms with 121 PBC images (nPBC=(5,5,0), 1210 atoms total)
  • polymer-2_new 45 atoms
  • from 40 to 200 replicas in parallel
  • 500 or 100 iterations per frame (-perframe)
  • iParalel=3, i.e. MolWorld_sp3_multi::run_ocl_opt()

command: MolGUIapp_multi -m 40 -x common_resources/polymer-2_new -g common_resources/NaCl_1x1_L2 -Ftol 1e-12 -perframe 500

command: MolGUIapp_multi -m 200 -x common_resources/polymer-2_new -g common_resources/NaCl_1x1_L2 -Ftol 1e-12 -perframe 100

Results

run_ocl_opt(bGridFF=true,nSys=40 ,perFrame=500)    86.595 [us/step]   462k [step/s]  20.78 mil. [atom/s]
run_ocl_opt(bGridFF=false,nSys=40 ,perFrame=500)  194.658 [us/step]   205k [step/s]   9.24 mil. [atom/s]
run_ocl_opt(bGridFF=true,nSys=200,perFrame=100)   114.278 [us/step]  1750k [step/s]  78.75 mil. [atom/s]
run_ocl_opt(bGridFF=false,nSys=200,perFrame=100)  249.81  [us/step]   800k [step/s]  36.02 mil. [atom/s]
run_ocl_opt(nSys=40|iPara=3,bSurfAtoms=1,bGridFF=1) NOT CONVERGED in 50 steps, |F|(7.20113e-05)>1e-12 time 4.1973 [ms]( 83.946 [us/step]) bGridFF=1 iSysFMax=0 dovdW=1 
run_ocl_opt(nSys=40|iPara=3,bSurfAtoms=1,bGridFF=0) NOT CONVERGED in 50 steps, |F|(7.23475e-05)>1e-12 time 9.76089 [ms]( 195.218 [us/step]) bGridFF=0 iSysFMax=0 dovdW=1 
run_ocl_opt(nSys=40|iPara=3,bSurfAtoms=1,bGridFF=1) NOT CONVERGED in 500 steps, |F|(9.12548e-05)>1e-12 time 43.2977 [ms]( 86.5954 [us/step]) bGridFF=1 iSysFMax=0 dovdW=1 
run_ocl_opt(nSys=40|iPara=3,bSurfAtoms=1,bGridFF=0) NOT CONVERGED in 500 steps, |F|(7.3448e-05)>1e-12 time 97.3289 [ms]( 194.658 [us/step]) bGridFF=0 iSysFMax=1 dovdW=1 
run_ocl_opt(nSys=200|iPara=3,bSurfAtoms=1,bGridFF=1) NOT CONVERGED in 100 steps, |F|(7.39509e-05)>1e-12 time 11.4278 [ms]( 114.278 [us/step]) bGridFF=1 iSysFMax=0 dovdW=1
run_ocl_opt(nSys=200|iPara=3,bSurfAtoms=1,bGridFF=0) NOT CONVERGED in 100 steps, |F|(0.000113862)>1e-12 time 24.981 [ms]( 249.81 [us/step]) bGridFF=0 iSysFMax=0 dovdW=1
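
The throughput columns in the results table can be recomputed from the us/step values. The sketch below assumes that one "step" time covers all nSys replicas advancing together, so step/s counts replica steps; this is inferred from the atom/s column matching nSys * 45 / (time per step), not from the FireCore source. The function name `throughput` is illustrative.

```python
N_ATOMS = 45  # atoms per polymer-2_new replica

# (bGridFF, nSys) -> us/step, copied from the results table
runs = {
    (True, 40): 86.595,
    (False, 40): 194.658,
    (True, 200): 114.278,
    (False, 200): 249.81,
}

def throughput(us_per_step: float, n_sys: int, n_atoms: int = N_ATOMS):
    """Return (replica steps per second, atom updates per second)."""
    steps = n_sys * 1.0e6 / us_per_step
    return steps, steps * n_atoms

for (grid, n_sys), us in runs.items():
    steps, atoms = throughput(us, n_sys)
    print(f"bGridFF={grid!s:5s} nSys={n_sys:3d}  {steps / 1e3:7.1f}k step/s  {atoms / 1e6:6.2f} mil. atom/s")

# GridFF vs. direct surface evaluation at the same replica count
for n_sys in (40, 200):
    print(f"nSys={n_sys}: GridFF speedup {runs[(False, n_sys)] / runs[(True, n_sys)]:.2f}x")
```

Under this reading, going from 40 to 200 replicas raises the time per step only modestly, so the atom throughput grows almost fourfold.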