
Iterative solver on OpenCL (GPU) devices #199

Open
GoogleCodeExporter opened this issue Aug 12, 2015 · 3 comments
Labels: comp-Logic (Related to internal code logic), OpenCL (Running on GPUs and similar devices), performance (Simulation speed, memory consumption), pri-Medium (Worth assigning to a milestone)

Comments

@GoogleCodeExporter

On recent GPU devices the matrix-vector multiplication in adda is as fast as
the preparation of the next argument vector within the iterative solver
(currently done on the CPU). Therefore, the iterative solver should also run on
the GPU, both to avoid transferring vectors between host and device on each
iteration and to speed up the computation. Since most of the functions executed
by the iterative solvers in adda are level-1 (vector) basic linear algebra
functions, the clAmdBlas library can potentially be employed to further improve
execution speed. This would mainly help on larger grids and at high dipole
counts.



Original issue reported on code.google.com by Marcus.H...@gmail.com on 31 May 2014 at 3:36
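To make the point above concrete, the inner loop of a solver like BiCG is dominated by level-1 operations such as AXPY and DOT. A minimal host-side sketch of these (the function names here are illustrative, not ADDA's actual ones) shows the kind of work that clAmdBlas/clBLAS could offload to the device:

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical host-side versions of the level-1 (vector) operations an
 * iterative solver spends its time in between matrix-vector products.
 * On the GPU these would map onto the corresponding clAmdBlas/clBLAS
 * AXPY and DOT routines instead of running on the CPU. */

/* y := a*x + y (AXPY) */
static void axpy(size_t n, double a, const double *x, double *y) {
    for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}

/* dot product x.y (DOT) */
static double dot(size_t n, const double *x, const double *y) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += x[i] * y[i];
    return s;
}
```

Keeping these vectors on the device between iterations is what removes the host-device transfer each iteration.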

@GoogleCodeExporter GoogleCodeExporter added OpSys-All comp-Logic Related to internal code logic performance Simulation speed, memory consumption pri-Medium Worth assigning to a milestone OpenCL Running on GPUs and similar devices labels Aug 12, 2015
@GoogleCodeExporter (Author)

The BiCG solver in OpenCL is introduced in r1349. It uses the clAmdBlas library
for all vector-related calculations. Using the USE_CLBLAS compiler option
together with the BiCG solver reduces the overhead between matrix-vector
multiplications to a small fraction of its previous value. So far it has only
been tested on an AMD GPU.

Other solvers are more complicated to translate directly to OpenCL, and they
will probably perform slower. However, to give more flexibility (in case of
poor convergence of BiCG), translating a few more solvers seems desirable.

r1349 - 3466ed7

Original comment by Marcus.H...@gmail.com on 31 May 2014 at 4:49

@GoogleCodeExporter (Author)

Indeed, that is a nice proof-of-principle that can be used to estimate the
potential acceleration. However, I think a more convenient (and scalable)
approach is to leave iterative.c almost intact and instead concentrate on
linalg.c.

All functions in the latter should then be rewritten (under ifdef OCL_BLAS) as
calls to clBLAS. It may even be possible to keep the same symbols (xvec,
pvec, etc.) and function calls in iterative.c; the only difference is that they
will be defined either as standard C vectors or as OpenCL vectors, depending on
the compilation mode. Awareness of the actual type of these vectors will only
be required at the start and end of the iterative solvers (to move the vectors
onto or off the GPU).

Original comment by yurkin on 3 Aug 2014 at 5:55
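The compile-mode abstraction proposed here can be sketched as follows. This is only an illustration of the idea, with made-up type and function names (not ADDA's actual linalg.c interface): the same symbol holds either a host array or a device buffer, and iterative.c only ever calls the wrapper.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the ifdef OCL_BLAS idea: in the OpenCL build a vector symbol
 * (xvec, pvec, ...) would hold a device buffer and the linalg wrappers
 * would dispatch to clBLAS; in the plain build it is an ordinary host
 * array and the wrappers are simple loops. The names vec_t and vec_axpy
 * are hypothetical, chosen for this sketch only. */

#ifdef OCL_BLAS
#include <CL/cl.h>
typedef cl_mem vec_t;   /* opaque device buffer */
#else
typedef double *vec_t;  /* plain host array */
#endif

/* y := a*x + y, dispatched at compile time */
static void vec_axpy(size_t n, double a, vec_t x, vec_t y) {
#ifdef OCL_BLAS
    /* would call the corresponding clBLAS AXPY routine here */
    (void)n; (void)a; (void)x; (void)y;
#else
    for (size_t i = 0; i < n; i++) y[i] += a * x[i];
#endif
}
```

With this split, only the entry and exit of each solver needs to know the real vector type, to copy data onto and off the GPU.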


@myurkin myurkin added feature Allows new functionality and removed Type-Enhancement labels Aug 13, 2015
@myurkin myurkin added this to the 1.5 milestone Jul 10, 2018
@myurkin (Member) commented Nov 30, 2020

This has already been using clBLAS for some time: see #204.

@myurkin myurkin modified the milestones: 1.5, 1.6 Apr 24, 2021
@myurkin myurkin removed the feature Allows new functionality label Apr 24, 2021
3 participants