Performance tests - ProkopHapala/FireCore GitHub Wiki
Cost of functions on GPU
__kernel scanNonBond2PBC
scanNonBond2PBC() invR2 | 0.0314 [ns/op] 31.8045 [GOPS] | ntot: 68921000000 np: 1000 na: 1000 nPBC( 68921,[20, 20, 20]) time: 2.1670 [s]
scanNonBond2PBC() R2gauss | 0.0203 [ns/op] 49.1580 [GOPS] | ntot: 68921000000 np: 1000 na: 1000 nPBC( 68921,[20, 20, 20]) time: 1.4020 [s]
scanNonBond2PBC() Morse_lin5 | 0.0371 [ns/op] 26.9293 [GOPS] | ntot: 68921000000 np: 1000 na: 1000 nPBC( 68921,[20, 20, 20]) time: 2.5593 [s]
scanNonBond2PBC() Morse_lin9 | 0.0396 [ns/op] 25.2603 [GOPS] | ntot: 68921000000 np: 1000 na: 1000 nPBC( 68921,[20, 20, 20]) time: 2.7284 [s]
scanNonBond2PBC() Morse_lin17 | 0.0485 [ns/op] 20.6346 [GOPS] | ntot: 68921000000 np: 1000 na: 1000 nPBC( 68921,[20, 20, 20]) time: 3.3401 [s]
scanNonBond2PBC() Morse_cub5 | 0.0407 [ns/op] 24.5827 [GOPS] | ntot: 68921000000 np: 1000 na: 1000 nPBC( 68921,[20, 20, 20]) time: 2.8036 [s]
scanNonBond2PBC() Morse | 0.1559 [ns/op] 6.4149 [GOPS] | ntot: 68921000000 np: 1000 na: 1000 nPBC( 68921,[20, 20, 20]) time: 10.7439 [s]
__kernel scanNonBond2PBC_2
scanNonBond2PBC() invR2 | 0.0451 [ns/op] 22.1882 [GOPS] | ntot: 68921000000 np: 1000 na: 1000 nPBC( 68921,[20, 20, 20]) time: 3.1062 [s]
scanNonBond2PBC() R2gauss | 0.0349 [ns/op] 28.6160 [GOPS] | ntot: 68921000000 np: 1000 na: 1000 nPBC( 68921,[20, 20, 20]) time: 2.4085 [s]
scanNonBond2PBC() Morse_lin5 | 0.0495 [ns/op] 20.1843 [GOPS] | ntot: 68921000000 np: 1000 na: 1000 nPBC( 68921,[20, 20, 20]) time: 3.4146 [s]
scanNonBond2PBC() Morse_lin9 | 0.0514 [ns/op] 19.4399 [GOPS] | ntot: 68921000000 np: 1000 na: 1000 nPBC( 68921,[20, 20, 20]) time: 3.5453 [s]
scanNonBond2PBC() Morse_lin17 | 0.0598 [ns/op] 16.7264 [GOPS] | ntot: 68921000000 np: 1000 na: 1000 nPBC( 68921,[20, 20, 20]) time: 4.1205 [s]
scanNonBond2PBC() Morse_cub5 | 0.0511 [ns/op] 19.5799 [GOPS] | ntot: 68921000000 np: 1000 na: 1000 nPBC( 68921,[20, 20, 20]) time: 3.5200 [s]
scanNonBond2PBC() Morse | 0.1590 [ns/op] 6.2884 [GOPS] | ntot: 68921000000 np: 1000 na: 1000 nPBC( 68921,[20, 20, 20]) time: 10.9600 [s]
__kernel scanNonBond2
scanNonBond2() invR2 | 0.0028 [ns/op] 358.2453 [GOPS] | ntot: 100000000000 np: 100000 na: 1000000 time: 0.2791 [s]
scanNonBond2() R2gauss | 0.0021 [ns/op] 474.9835 [GOPS] | ntot: 100000000000 np: 100000 na: 1000000 time: 0.2105 [s]
scanNonBond2() Morse_lin5 | 0.0032 [ns/op] 310.8248 [GOPS] | ntot: 100000000000 np: 100000 na: 1000000 time: 0.3217 [s]
scanNonBond2() Morse_lin9 | 0.0036 [ns/op] 277.6633 [GOPS] | ntot: 100000000000 np: 100000 na: 1000000 time: 0.3601 [s]
scanNonBond2() Morse_lin17 | 0.0037 [ns/op] 272.0255 [GOPS] | ntot: 100000000000 np: 100000 na: 1000000 time: 0.3676 [s]
scanNonBond2() Morse_cub5 | 0.0039 [ns/op] 253.7923 [GOPS] | ntot: 100000000000 np: 100000 na: 1000000 time: 0.3940 [s]
scanNonBond2() Morse | 0.0065 [ns/op] 152.7715 [GOPS] | ntot: 100000000000 np: 100000 na: 1000000 time: 0.6546 [s]
scanNonBond2() invR2 | 0.0020 [ns/op] 494.0477 [GOPS] | ntot: 1000000000000 np: 1000000 na: 1000000 time: 2.0241 [s]
scanNonBond2() R2gauss | 0.0017 [ns/op] 602.1384 [GOPS] | ntot: 1000000000000 np: 1000000 na: 1000000 time: 1.6607 [s]
scanNonBond2() Morse_lin5 | 0.0029 [ns/op] 350.3550 [GOPS] | ntot: 1000000000000 np: 1000000 na: 1000000 time: 2.8542 [s]
scanNonBond2() Morse_lin9 | 0.0032 [ns/op] 310.5692 [GOPS] | ntot: 1000000000000 np: 1000000 na: 1000000 time: 3.2199 [s]
scanNonBond2() Morse_lin17 | 0.0033 [ns/op] 304.6605 [GOPS] | ntot: 1000000000000 np: 1000000 na: 1000000 time: 3.2823 [s]
scanNonBond2() Morse_cub5 | 0.0034 [ns/op] 292.7462 [GOPS] | ntot: 1000000000000 np: 1000000 na: 1000000 time: 3.4159 [s]
scanNonBond2() Morse | 0.0059 [ns/op] 170.7652 [GOPS] | ntot: 1000000000000 np: 1000000 na: 1000000 time: 5.8560 [s]
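The [ns/op] and [GOPS] columns in the kernel logs above can be rederived from `ntot` and `time`. The sketch below assumes that `ntot` is the plain product of `np`, `na` and, for the PBC kernels, the number of periodic images; the function names are illustrative and not taken from FireCore. The results should land close to the logged values (for example about 0.0314 ns/op and 31.80 GOPS for `invR2` in `scanNonBond2PBC`), with small differences because the logged times are rounded.

```python
def total_ops(n_points, n_atoms, n_images=1):
    """Number of pair evaluations, assuming ntot = np * na * nPBC."""
    return n_points * n_atoms * n_images


def throughput(ntot, time_s):
    """Return (nanoseconds per operation, giga-operations per second)."""
    ns_per_op = time_s * 1e9 / ntot
    gops = ntot / time_s / 1e9
    return ns_per_op, gops


# scanNonBond2PBC, invR2: np=1000, na=1000, nPBC=68921, time=2.1670 s
ntot_pbc = total_ops(1000, 1000, 68921)
ns_pbc, gops_pbc = throughput(ntot_pbc, 2.1670)
print(ntot_pbc, f"{ns_pbc:.4f} ns/op", f"{gops_pbc:.2f} GOPS")

# scanNonBond2, invR2: np=100000, na=1000000, time=0.2791 s
ntot_free = total_ops(100000, 1000000)
ns_free, gops_free = throughput(ntot_free, 0.2791)
print(ntot_free, f"{ns_free:.4f} ns/op", f"{gops_free:.2f} GOPS")
```

Note that ns/op and GOPS are reciprocals of each other, so either column alone is enough to compare kernels.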
MMFFsp3_loc.h
(commit)
CPU single-core
CPU: AMD Ryzen 7 5800X, 8 cores / 16 threads, 2200/4850 MHz
Test 1: nHexadecan_dicarboxylic 50 atoms using MMFFsp3
command: ./MolGUIapp -x common_resources/nHexadecan_dicarboxylic -iParalel 0 -T 100 0.01 -verb 2 -perframe 2000
NOTES: run_no_omp() bPBC=0
bNonBondNeighs=0 22.612 us/iter = 44224 iter/s
bNonBondNeighs=1 17.418 us/iter = 57411 iter/s
no-NonBond 4.520 us/iter = 221238 iter/s
Test 1b: nHexadecan_dicarboxylic 50 atoms using UFF
command: ./$name -x common_resources/nHexadecan_dicarboxylic -uff -iParalel 0 -T 100 0.01 -verb 2 -perframe 2000
bNonBondNeighs=0 26.90 us/iter = 37174 iter/s
bNonBondNeighs=1 21.88 us/iter = 45703 iter/s
no-NonBond 7.60 us/iter = 131578 iter/s
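The two columns in the Test 1 and Test 1b results are related by iter/s = 10^6 / (us/iter). The sketch below reproduces that conversion and also estimates what fraction of an iteration is spent on non-bonded interactions. That estimate rests on an assumption not stated in the source: that the "no-NonBond" timing covers everything except the non-bonded part and that the two costs simply add. The function names are illustrative.

```python
def iters_per_second(us_per_iter):
    """Convert microseconds per iteration into iterations per second."""
    return 1e6 / us_per_iter


def nonbond_share(total_us, no_nonbond_us):
    """Fraction of one iteration attributed to non-bonded terms,
    assuming total = (everything else) + (non-bonded)."""
    return (total_us - no_nonbond_us) / total_us


# MMFFsp3, Test 1 (timings in us/iter taken from the results above)
for label, t in [("bNonBondNeighs=0", 22.612),
                 ("bNonBondNeighs=1", 17.418),
                 ("no-NonBond", 4.520)]:
    print(f"{label:18s} {t:8.3f} us/iter = {iters_per_second(t):9.0f} iter/s")

print(f"non-bonded share, bNonBondNeighs=0: {nonbond_share(22.612, 4.520):.2f}")
print(f"non-bonded share, bNonBondNeighs=1: {nonbond_share(17.418, 4.520):.2f}")
```

Under that additive assumption, non-bonded evaluation accounts for roughly 80% of the MMFFsp3 iteration without neighbor lists and roughly 74% with them.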
Test 2: polymer-2_new PBC 45 atoms
command: ./MolGUIapp -x common_resources/polymer-2_new -g common_resources/NaCl_1x1_L2 -iParalel 0 -T 100 0.01 -verb 2 -perframe 500
NOTES: run_no_omp() bPBC=1 nPBC{1,1,0}; i.e. 3x3 = 9 images
bNonBondNeighs=0 132.52 us/iter = 7575 iter/s
bNonBondNeighs=1 101.50 us/iter = 9852 iter/s
no-NonBond(no GridFF) 5.25 us/iter = 190476 iter/s
no-NonBond(+GridFF/trilinear) 6.73 us/iter = 148588 iter/s
no-NonBond(+GridFF/trilinear)(no thermostat) 5.26 us/iter = 190114 iter/s
no-NonBond(+GridFF/tricubic) 9.58 us/iter = 104384 iter/s
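The note "nPBC{1,1,0}; i.e. 3x3 = 9 images" follows from counting images from -n to +n along each lattice direction, which gives 2n+1 images per axis. A minimal sketch of that count, assuming this symmetric convention (it is consistent with the `nPBC( 68921,[20, 20, 20])` entries in the GPU kernel logs, since 41^3 = 68921); the function name is illustrative:

```python
def n_pbc_images(n_pbc):
    """Number of periodic images for per-axis ranges -n..+n."""
    count = 1
    for n in n_pbc:
        count *= 2 * n + 1
    return count


print(n_pbc_images((1, 1, 0)))     # 9      (3 x 3 x 1)
print(n_pbc_images((20, 20, 20)))  # 68921  (41 x 41 x 41)

# Pairwise non-bonded work grows linearly with the image count:
n_atoms = 45                       # polymer-2_new
pairs_per_image = n_atoms * n_atoms
print(pairs_per_image * n_pbc_images((1, 1, 0)))  # 18225 pair evaluations
```

The last line counts all ordered atom pairs including self-pairs, so it is an upper bound for illustration rather than the exact number of interactions evaluated.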
Test 2: polymer-2_new CPU GridFF::addForce() vs. GridFF::evalMorsePBC_sym()
MolGUIapp -x common_resources/polymer-2_new -g common_resources/NaCl_1x1_L2 -Ftol 1e-12 -iParalel 0 -dt 0.05 -nogridff -perframe 100
MolWorld_sp3::run_no_omp(bGridFF=true ) 236.08 [us/iter] 4.235k [iter/s] 190k [atoms/s]
MolWorld_sp3::run_no_omp(bGridFF=false) 1139.96 [us/iter] 877 [iter/s] 39k [atoms/s]
MolWorld_sp3::MDloop() (bUFF=0,iParalel=0,bSurfAtoms=1,bGridFF=1,bPBC=1,bNonBonded=1bNonBondNeighs=0,dt=0.05,niter=100) time=23.6358[ms/100](236.083[us/iter] tick2second=2.62962e-10)
MolWorld_sp3::MDloop() (bUFF=0,iParalel=0,bSurfAtoms=1,bGridFF=0,bPBC=1,bNonBonded=1bNonBondNeighs=0,dt=0.05,niter=100) time=113.861[ms/100](1139.96[us/iter] tick2second=2.62962e-10)
Test 2: polymer-2_new GPU GridFF() vs. getSurfMorse()
- NVIDIA GeForce RTX 3090 24GB driver 535.161.08
- NaCl substrate containing 10 atoms with 121 PBC images ( nPBC=(5,5,0) 1210 atoms total)
- polymer-2_new 45 atoms
- from 40 to 200 replicas in parallel
- 500 or 100 iterations per frame
iParalel=3
i.e. MolWorld_sp3_multi::run_ocl_opt()
MolGUIapp_multi -m 40 -x common_resources/polymer-2_new -g common_resources/NaCl_1x1_L2 -Ftol 1e-12 -perframe 500
MolGUIapp_multi -m 200 -x common_resources/polymer-2_new -g common_resources/NaCl_1x1_L2 -Ftol 1e-12 -perframe 100
Results
run_ocl_opt(bGridFF=true,nSys=40 ,perFrame=500) 86.595 [us/step] 462k [step/s] 20.78 mil. [atom/s]
run_ocl_opt(bGridFF=false,nSys=40 ,perFrame=500) 194.658 [us/step] 205k [step/s] 9.24 mil. [atom/s]
run_ocl_opt(bGridFF=true,nSys=200,perFrame=100) 114.278 [us/step] 1750k [step/s] 78.75 mil. [atom/s]
run_ocl_opt(bGridFF=false,nSys=200,perFrame=100) 249.81 [us/step] 800k [step/s] 36.02 mil. [atom/s]
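The [atom/s] column in these results can be reproduced from [us/step], the number of replicas, and the 45 atoms of polymer-2_new. The sketch below assumes that one "step" in [us/step] advances all `nSys` replicas at once, and reads [step/s] as replica-steps per second; this is an interpretation that matches the [atom/s] numbers, not something stated in the source. The function name is illustrative.

```python
def replica_throughput(us_per_step, n_sys, n_atoms):
    """Return (replica-steps per second, atom updates per second),
    assuming each timed step updates all n_sys replicas."""
    batch_steps_per_s = 1e6 / us_per_step
    replica_steps_per_s = batch_steps_per_s * n_sys
    atoms_per_s = replica_steps_per_s * n_atoms
    return replica_steps_per_s, atoms_per_s


N_ATOMS = 45  # polymer-2_new

runs = [
    ("bGridFF=true,  nSys=40 ", 86.595, 40),
    ("bGridFF=false, nSys=40 ", 194.658, 40),
    ("bGridFF=true,  nSys=200", 114.278, 200),
    ("bGridFF=false, nSys=200", 249.81, 200),
]
for label, us, n_sys in runs:
    steps, atoms = replica_throughput(us, n_sys, N_ATOMS)
    print(f"{label}: {steps / 1e3:7.0f}k step/s, {atoms / 1e6:6.2f} mil. atom/s")

# Ratio of step times with and without GridFF at the same replica count
print(f"GridFF speedup, nSys=40:  {194.658 / 86.595:.2f}x")
print(f"GridFF speedup, nSys=200: {249.81 / 114.278:.2f}x")
```

Going from 40 to 200 replicas raises the step time only moderately (86.6 to 114.3 us with GridFF) while the work per step grows fivefold, which is why the atom throughput increases almost fourfold.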
run_ocl_opt(nSys=40|iPara=3,bSurfAtoms=1,bGridFF=1) NOT CONVERGED in 50 steps, |F|(7.20113e-05)>1e-12 time 4.1973 [ms]( 83.946 [us/step]) bGridFF=1 iSysFMax=0 dovdW=1
run_ocl_opt(nSys=40|iPara=3,bSurfAtoms=1,bGridFF=0) NOT CONVERGED in 50 steps, |F|(7.23475e-05)>1e-12 time 9.76089 [ms]( 195.218 [us/step]) bGridFF=0 iSysFMax=0 dovdW=1
run_ocl_opt(nSys=40|iPara=3,bSurfAtoms=1,bGridFF=1) NOT CONVERGED in 500 steps, |F|(9.12548e-05)>1e-12 time 43.2977 [ms]( 86.5954 [us/step]) bGridFF=1 iSysFMax=0 dovdW=1
run_ocl_opt(nSys=40|iPara=3,bSurfAtoms=1,bGridFF=0) NOT CONVERGED in 500 steps, |F|(7.3448e-05)>1e-12 time 97.3289 [ms]( 194.658 [us/step]) bGridFF=0 iSysFMax=1 dovdW=1
run_ocl_opt(nSys=200|iPara=3,bSurfAtoms=1,bGridFF=1) NOT CONVERGED in 100 steps, |F|(7.39509e-05)>1e-12 time 11.4278 [ms]( 114.278 [us/step]) bGridFF=1 iSysFMax=0 dovdW=1
run_ocl_opt(nSys=200|iPara=3,bSurfAtoms=1,bGridFF=0) NOT CONVERGED in 100 steps, |F|(0.000113862)>1e-12 time 24.981 [ms]( 249.81 [us/step]) bGridFF=0 iSysFMax=0 dovdW=1