Compilation with CUDA support - GeoEnergyLab-EPFL/BigWham GitHub Wiki

As the matrix-vector operation on the hierarchical matrices is memory-bandwidth bounded, an implementation of this operation based on CUDA has been developed to benefit from the high memory bandwidth of GPUs.

To compile BigWham with CUDA support you'll need :

  • a BLAS vendor (mainly tested using OpenBlas)
  • a C/C++ compiler
  • CUDA drivers (mainly tested using cuda 12.6)
  • python > 3.10 (mainly tested/developed using python 3.11) if interested in the python binder

1. Setting up the python environment

python -m venv bigwham_env
source bigwham_env/bin/activate
pip install numpy scipy matplotlib

2. Compiling BigWham on the cuda_dev branch

git checkout cuda_dev
mkdir build
cd build
# Here to build BigWham with Cuda support and OpenBlas as the BLAS vendor :
cmake -DUSE_CUDA=ON -DCMAKE_BUILD_TYPE=Release ..
make -j4

3. Installing the bigwham4py python module

cd interfaces/python/
pip install .

Using BigWham with CUDA support

Here is a small python script solving a linear system on the BigWham hierarchical matrix with Cuda support enabled, it reads the input mesh from a meshio compatible format (ex : vtk). The relevant options of the BEMatrix constructor to enable CUDA are useCuda, homogeneous_size_pattern and fixed_rank, the two first must be set to True while the fixed_rank must be a positive integer. The n_GPUs is optionnal.

import sys
import numpy as np

import matplotlib.pyplot as plt
import matplotlib

from scipy.sparse.linalg import gmres

from bigwham4py import BEMatrix

import meshio

#reading it 
print(f"Reading mesh from {sys.argv[1]}")
mesh = meshio.read(sys.argv[1])
coor = mesh.points.astype(np.float64)
conn = mesh.cells_dict['triangle'].astype(np.int64)
        
dim = coor.shape[1]
num_dof = conn.shape[0]*dim
        
# Building the hmat
G=1.
nu=0.15
elas_prop = np.array([2*G*(1+nu), nu])
eta=5.0
max_leaf_size = 32
fixed_rank = int(max_leaf_size / 4)
eps_aca = 1.0e-4
num_threads = 64
basekernel = "3DT0-H"

# BigWham hmat with Cuda support for matvec operations
hmat = BEMatrix(
    basekernel, coor, conn, elas_prop, max_leaf_size, eta, eps_aca, 
    homogeneous_size_pattern=True, 
    fixed_rank=fixed_rank, 
    useCuda=True, 
    n_GPUs=1,
    n_openMP_threads=num_threads, verbose=True
    )
    
# Traction vector
tinf = np.ones(hmat.shape[0])
    
# Solve using scipy's GMRES
d,info = gmres(hmat, tinf, M=hmat.H_jacobi_prec(), rtol=1e-6)
        
# Plot solution
dd = d.reshape((-1, 3))
fig1, ax1 = plt.subplots()
triang=matplotlib.tri.Triangulation(coor[:,0], coor[:,1], triangles=conn, mask=None)
tri=ax1.tripcolor(triang, dd[:,2],cmap = plt.cm.rainbow, alpha = 0.5)
ax1.axis('equal')
plt.colorbar(tri) 
plot_file = "bigwham_gmres_solve_GPU.png"
plt.savefig(plot_file)
print(f"Solution plot saved in " + plot_file)