Lattice Optimization OCL OMP - ProkopHapala/FireCore GitHub Wiki
Computer in the office
- Intel(R) Core(TM) i5-10400F CPU @ 2.90GHz (11/12 cores used)
- NVIDIA GeForce RTX 3060 OpenCL1.2 COMPUTE_UNITS:28 @1777MHz GLOBAL_MEM:12019 MB LOCAL_MEM:48kB
Interactive GUI:
pulling atoms, natoms 45 nnode 24 ncap 21 npi 24 nPBC{1,1,0}
Build-dbg
run_ocl_opt (nSys=40|iPara=2) NOT CONVERGED in 50/50 steps |F|(0.000400031)>1e-06 time= 4.8674[ms] 97.34 [us/step] bGridFF=1 iSysFMax=0 dovdW=1
run_omp_ocl (nSys=40|iPara=1) NOT CONVERGED in 50/50 steps |F|=6.10705e-05 time= 10.9407[ms] 218.81 [us/step]
run_omp_ocl (nSys=40|iPara=0) NOT CONVERGED in 50/50 steps |F|=0.0121325 time= 82.8864[ms] 1657.73 [us/step]
run_multi_serial(nSys=40|iPara=-1) NOT CONVERGED in 50/50 steps |F|=0.165467 time= 436.057[ms] 8721.15 [us/step]
Build-opt
solver                              natom  perFrame  nSys     [ms]  [us/step]  nstep/sec  nstep*nSys/s  nstep*nSys*natom/s
--------------------------------------------------------------------------------------------------------------------------
run_ocl_opt (nSys=40|iPara=2)          45        50    40    4.848      96.97   10,312.5       412,499          18,562,442
run_omp_ocl (nSys=40|iPara=1)          45        50    40    8.885     177.71    5,627.1       225,086          10,128,862
run_omp_ocl (nSys=40|iPara=0)          45        50    40   83.129   1,662.59      601.5        24,057           1,082,648
run_multi_serial(nSys=40|iPara=-1)     45        50    40  438.562   8,771.24      114.0         4,560             205,216
run_ocl_opt (nSys=40|iPara=2) NOT CONVERGED in 50/50 steps, |F|(1.65009)>1e-06 time 4.848[ms] 96.97 [us/step] bGridFF=1 iSysFMax=0 dovdW=1
run_omp_ocl (nSys=40|iPara=1) NOT CONVERGED in 50/50 steps |F|=0.424407 time= 8.885[ms] 177.71 [us/step]
run_omp_ocl (nSys=40|iPara=0) NOT CONVERGED in 50/50 nsteps |F|=0.241127 time= 83.129[ms] 1662.59 [us/step]
run_multi_serial(nSys=40|iPara=-1) NOT CONVERGED in 50/50 nsteps |F|=0.0544194 time= 438.562[ms] 8771.24 [us/step]
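The derived throughput columns of the Build-opt table can be recomputed from the measured wall time alone. The following is a minimal sketch in Python; the function name `throughput` and the dictionary layout are illustrative choices, not part of FireCore, and the last returned value is taken to be steps × systems × atoms per second, which is what the tabulated numbers match.

```python
# Recompute the derived columns of the Build-opt table from the wall time.
# Input values are copied from the table; names here are illustrative only.
NATOM = 45   # atoms per system
N_STEP = 50  # relaxation steps per frame
N_SYS = 40   # number of system replicas

times_ms = {
    "run_ocl_opt      (iPara=2)": 4.848,
    "run_omp_ocl      (iPara=1)": 8.885,
    "run_omp_ocl      (iPara=0)": 83.129,
    "run_multi_serial (iPara=-1)": 438.562,
}

def throughput(time_ms, n_step=N_STEP, n_sys=N_SYS, natom=NATOM):
    """Return (us/step, steps/s, system-steps/s, atom-steps/s)."""
    us_per_step = 1e3 * time_ms / n_step
    steps_per_s = 1e6 / us_per_step
    sys_steps_per_s = steps_per_s * n_sys
    atom_steps_per_s = sys_steps_per_s * natom
    return us_per_step, steps_per_s, sys_steps_per_s, atom_steps_per_s

serial_ms = times_ms["run_multi_serial (iPara=-1)"]
for name, t in times_ms.items():
    us, sps, ssps, asps = throughput(t)
    print(f"{name:30s} {us:10.2f} us/step {sps:10.1f} step/s "
          f"{ssps:12.0f} sys-step/s {asps:14.0f} atom-step/s "
          f"speedup vs serial {serial_ms / t:6.1f}x")
```

Small differences from the tabulated values are expected, because the table was computed from unrounded timings.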
Lattice Optimization
Laptop at home
- Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (7/8 cores used)
- NVIDIA GeForce GTX 960M OpenCL1.2 COMPUTE_UNITS:5 @1176MHz GLOBAL_MEM:4046 MB LOCAL_MEM:48kB
Interactive GUI:
pulling atoms, natoms 45 nnode 24 ncap 21 npi 24 nPBC{1,1,0}
run_multi_serial(nSys=40|iPara=-1) NOT CONVERGED in 50/50 nsteps |F|=0.507869 time=367.992[ms] 7359.84[us/step]
run_omp_ocl(nSys=40|iPara=0) NOT CONVERGED in 50/50 nsteps |F|=0.109739 time=107.339[ms] 2146.78[us/step]
run_omp_ocl(nSys=40|iPara=1) NOT CONVERGED in 50/50 nsteps |F|=0.0744521 time=16.7628[ms] 335.256[us/step]
run_ocl_opt(nSys=40|iPara=2) NOT CONVERGED in 50 steps, |F|(0.0639756)>1e-06 time 14.3617[ms] 287.234[us/step] bGridFF=1 iSysFMax=0 dovdW=1
getBuffs(): nSys 40 nDOFs 207 nvecs 69 natoms 45 nnode 24 ncap 21 npi 24 nPBC{1,1,0}
mmff.run(10000,iParalel=-1)
run_multi_serial(nSys=40|iPara=2) CONVERGED in 1654/10000 nsteps |F|=0.000972228 time=31521.1[ms] 19057.5[us/step]
Py: time(optimizeLattice_1d) 34.6389[s]
mmff.run(10000,iParalel=0)
run_omp_ocl(nSys=40|iPara=2) CONVERGED in 1654/10000 nsteps |F|=0.000972228 time=10314.9[ms] 6236.34[us/step]
Py: time(optimizeLattice_1d) 10.426[s]
mmff.run(10000,iParalel=1)
run_omp_ocl(nSys=40|iPara=2) CONVERGED in 1797/10000 nsteps |F|=0.000918641 time=1263.58[ms] 703.163[us/step]
Py: time(optimizeLattice_1d) 1.1521[s]
mmff.run(10000,iParalel=2)
run_ocl_opt(nSys=40|iPara=2) CONVERGED in <3730 steps, |F|(0.000996397)<0.001 time 2714.56[ms] 727.763[us/step] bGridFF=1
Py: time(optimizeLattice_1d) 4.76416[s]
run_ocl_opt(nSys=40|iPara=2) NOT CONVERGED in 50 steps, |F|(0.000474281)>1e-06 time 14.3493[ms] 286.986[us/step] bGridFF=1 iSysFMax=0 dovdW=1
getBuffs(): nSys 10 nDOFs 207 nvecs 69 natoms 45 nnode 24 ncap 21 npi 24 nPBC{1,1,0}
mmff.run(10000,iParalel=-1)
rum_multi_serial(bOcl=0) CONVERGED in 1654/10000 nsteps |F|=0.000972228 time=7654.02[ms]
Py: time(optimizeLattice_1d) 8.53864[s]
mmff.run(10000,iParalel=0)
rum_omp_ocl(bOcl=0) CONVERGED in 1654/10000 nsteps |F|=0.000972228 time=3878.08[ms]
Py: time(optimizeLattice_1d) 3.6377[s]
mmff.run(10000,iParalel=1)
rum_omp_ocl(bOcl=1) CONVERGED in 1797/10000 nsteps |F|=0.000918641 time=897.201[ms]
Py: time(optimizeLattice_1d) 0.811637[s]
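The solver log lines above come in two slightly different layouts (`|F|=... time=...[ms]` and `|F|(...)<tol time ...[ms]`), and the nSys=10 runs do not print the time per step. The following is a rough sketch of a parser that extracts the step count, force norm and wall time from either layout and derives the time per step; the regular expression and the helper `parse_line` are assumptions written for these notes, not part of FireCore.

```python
import re

# Matches both layouts seen in the logs, e.g.
#   "... CONVERGED in 1654/10000 nsteps |F|=0.000972228 time=7654.02[ms]"
#   "... CONVERGED in <3730 steps, |F|(0.000996397)<0.001 time 2714.56[ms] ..."
LINE_RE = re.compile(
    r"^(?P<solver>\w+)\s*\((?P<args>[^)]*)\)\s+"
    r"(?P<status>NOT CONVERGED|CONVERGED)\s+in\s+<?(?P<nstep>\d+)"
    r".*?\|F\|[=(](?P<force>[0-9.eE+-]+)"
    r".*?time=?\s*(?P<time_ms>[0-9.]+)\s*\[ms\]"
)

def parse_line(line):
    """Parse one solver log line; return a dict, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    nstep = int(m["nstep"])
    time_ms = float(m["time_ms"])
    return {
        "solver": m["solver"],
        "args": m["args"],
        "converged": m["status"] == "CONVERGED",
        "nstep": nstep,
        "force": float(m["force"]),
        "time_ms": time_ms,
        "us_per_step": 1e3 * time_ms / nstep,  # derived, not read from the log
    }

# The three nSys=10 laptop runs, copied verbatim from the log.
LOG = """\
rum_multi_serial(bOcl=0) CONVERGED in 1654/10000 nsteps |F|=0.000972228 time=7654.02[ms]
rum_omp_ocl(bOcl=0) CONVERGED in 1654/10000 nsteps |F|=0.000972228 time=3878.08[ms]
rum_omp_ocl(bOcl=1) CONVERGED in 1797/10000 nsteps |F|=0.000918641 time=897.201[ms]
"""

records = [r for r in map(parse_line, LOG.splitlines()) if r is not None]
serial_ms = records[0]["time_ms"]
for r in records:
    print(f'{r["solver"]}({r["args"]}): {r["nstep"]} steps, '
          f'{r["us_per_step"]:.1f} us/step, '
          f'{serial_ms / r["time_ms"]:.2f}x vs serial')
```

Comparing time per step rather than total time is the fairer measure here, because the runs converge in different numbers of steps (1654 vs 1797).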
local_size in kernel getNonBond():
Dependence on local_size (nloc) for the laptop GPU NVIDIA GeForce GTX 960M, 50 iterations of MolWorld_sp3_multi::run_ocl_opt() with polymer-new.xyz (45 atoms):
nloc=1 119.535 [ms]
nloc=2 64.366 [ms]
nloc=4 38.498 [ms]
nloc=8 24.559 [ms]
nloc=16 15.665 [ms]
nloc=32 13.665 [ms]
nloc=64 13.800 [ms]
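The nloc timings above can be summarised as a speedup relative to nloc=1. The sketch below also shows how a per-atom work size would be padded up to a multiple of the work-group size, since OpenCL 1.2 requires the global size to be a multiple of the local size when a local size is given. That the kernel uses one work-item per atom is an assumption made for illustration; `padded_global_size` is a hypothetical helper, not a FireCore function.

```python
# Timings copied from the list above: nloc -> time in ms for 50 iterations.
timings_ms = {1: 119.535, 2: 64.366, 4: 38.498, 8: 24.559,
              16: 15.665, 32: 13.665, 64: 13.800}
NATOM = 45  # atoms in polymer-new.xyz

def padded_global_size(n_items, nloc):
    """Round n_items up to the next multiple of nloc (hypothetical helper).

    Assumes one work-item per atom, which the notes do not confirm.
    """
    return ((n_items + nloc - 1) // nloc) * nloc

base = timings_ms[1]
speedup = {nloc: base / t for nloc, t in timings_ms.items()}
best_nloc = min(timings_ms, key=timings_ms.get)

for nloc, t in timings_ms.items():
    gsize = padded_global_size(NATOM, nloc)
    print(f"nloc={nloc:3d}  {t:8.3f} ms  speedup {speedup[nloc]:5.2f}x  "
          f"padded size {gsize:3d}  work-groups {gsize // nloc}")
print("fastest nloc:", best_nloc)
```

Under that assumption, nloc=32 and nloc=64 both pad 45 atoms to 64 work-items, which would be consistent with the timings levelling off between the two.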