Installing Gpufit with cuBLAS and build Python wheel on Linux

To prepare: if the CUDA version you want to build against is not the one already installed on the host, install the full CUDA toolkit for that version (not just the driver), because you need it to compile CUDA code. In this case, that is CUDA 11.1:

# https://developer.nvidia.com/cuda-11.1.1-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=2004&target_type=runfilelocal
wget https://developer.download.nvidia.com/compute/cuda/11.1.1/local_installers/cuda_11.1.1_455.32.00_linux.run
sudo sh cuda_11.1.1_455.32.00_linux.run --silent --toolkit --toolkitpath=/usr/local/cuda-11.1
sudo rm /usr/local/cuda
sudo ln -s /usr/local/cuda-11.2 /usr/local/cuda # Restore the host's default CUDA (11.2); the installer repointed /usr/local/cuda at 11.1
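
After relinking, a quick way to confirm which toolkit /usr/local/cuda now points at is to resolve the symlink. A minimal Python sketch, assuming the /usr/local/cuda-X.Y layout used above; cuda_link_target is a hypothetical helper, not part of any CUDA tooling:

```python
import os


def cuda_link_target(link: str = "/usr/local/cuda") -> str:
    """Return the name of the directory a CUDA symlink resolves to, e.g. 'cuda-11.2'."""
    # realpath follows the full chain of symlinks before we take the last component.
    return os.path.basename(os.path.realpath(link))
```

After the relink, cuda_link_target() should return cuda-11.2 rather than cuda-11.1.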

Then install the dependencies and clone the repository:

sudo apt install libboost-all-dev # required to build tests
conda create -n gpufit
conda activate gpufit
conda install -y python numpy scipy # scipy not necessary
conda install -y pytorch cudatoolkit=11.1 cmake -c pytorch -c conda-forge # pytorch not necessary
git clone https://github.com/gpufit/GPUfit gpufit

For some reason, the default build configuration ended up with:

-- CUDA_ARCHITECTURES=3.0;3.5;5.0;5.2;3.2;3.7;5.3;6.0;6.1;6.2;7.0+PTX
-- CUDA_NVCC_FLAGS=-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;
-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_32,code=sm_32;
-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_53,code=sm_53;-gencode;arch=compute_60,code=sm_60;
-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_62,code=sm_62;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70

...the first of which, compute_30, led to the error nvcc fatal : Unsupported gpu architecture 'compute_30', because CUDA 11 removed support for the Kepler sm_30 and sm_32 targets (sm_35 and sm_37 are still accepted, but deprecated).

See devnotes
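
nvcc stops at the first target it cannot build, so it helps to check the whole list up front. A small Python sketch of that check, assuming CUDA 11's floor of compute capability 3.5; unsupported_archs is a hypothetical helper, not part of Gpufit or CMake:

```python
def unsupported_archs(arch_list: str, min_cc: float = 3.5) -> list:
    """Return entries of a CUDA_ARCHITECTURES-style list below min_cc.

    CUDA 11 removed the sm_30 and sm_32 targets, so 3.5 is the lowest
    compute capability that nvcc 11.x still accepts.
    """
    bad = []
    # Entries may be separated by ';' (CMake lists) or spaces, e.g. "7.0+PTX".
    for entry in arch_list.replace(";", " ").split():
        cc = entry.split("+")[0]  # drop a trailing "+PTX"
        if float(cc) < min_cc:
            bad.append(cc)
    return bad
```

Run against the default list shown above, it flags 3.0 and 3.2.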

To fix this, I edited the control flow in gpufit/Gpufit/CMakeLists.txt that sets CUDA_ARCHITECTURES, but that edit is bypassed when you pass -DCUDA_ARCHITECTURES="8.0 8.6+PTX" to CMake. In other words, you don't really need to change the code; just specify the flag (8.0 and 8.6 are Ampere compute capabilities, so substitute the ones for your own GPU).

See description and edit here
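
To see what that flag asks nvcc to build, here is a rough Python re-creation of the expansion. It is modelled on the CUDA_NVCC_FLAGS output shown earlier, not on CMake's actual implementation, which handles more cases (such as named architectures); gencode_flags is a hypothetical helper:

```python
def gencode_flags(arch_list: str) -> list:
    """Expand e.g. "8.0 8.6+PTX" into nvcc -gencode arguments.

    Follows the layout of the CUDA_NVCC_FLAGS printed above: one SASS
    target (code=sm_XY) per entry, then a PTX target (code=compute_XY)
    for every entry marked +PTX.
    """
    sass, ptx = [], []
    for entry in arch_list.replace(";", " ").split():
        cc, _, suffix = entry.partition("+")
        num = cc.replace(".", "")  # "8.6" -> "86"
        sass += ["-gencode", f"arch=compute_{num},code=sm_{num}"]
        if suffix.upper() == "PTX":
            ptx += ["-gencode", f"arch=compute_{num},code=compute_{num}"]
    return sass + ptx
```

The embedded PTX for 8.6 is what lets the driver JIT-compile the kernels for newer GPUs that have no matching SASS.
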
I also replaced both calls to time.clock() around line 175 of gpufit.py with time.process_time(), because time.clock() was deprecated in Python 3.3 and removed in Python 3.8.
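
One caveat: process_time() counts only this process's CPU time, so if the Python thread sleeps while waiting on the GPU it can under-report how long a fit took; time.perf_counter() measures elapsed wall-clock time instead. A minimal sketch that runs on both old and new Pythons and records both; timed is a hypothetical helper, not part of pygpufit:

```python
import time

# time.clock() no longer exists on Python >= 3.8, so fall back to
# process_time(), which matches clock()'s CPU-time meaning on Unix.
clock = getattr(time, "clock", time.process_time)


def timed(func, *args, **kwargs):
    """Call func and return (result, cpu_seconds, wall_seconds)."""
    c0, w0 = clock(), time.perf_counter()
    result = func(*args, **kwargs)
    return result, clock() - c0, time.perf_counter() - w0
```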

Additionally, the source code states:

Multiple CUDA versions installed, specify which version to use
Set CUDA_BIN_PATH before running CMake or CUDA_TOOLKIT_ROOT_DIR after first configuration to installation folder of desired CUDA version
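
To see which installation folders are candidates for CUDA_BIN_PATH or CUDA_TOOLKIT_ROOT_DIR, a short sketch can scan for the /usr/local/cuda-X.Y directories the runfile installer creates. The layout is the one assumed throughout this page, and installed_cuda_versions is a hypothetical helper:

```python
import os
import re


def installed_cuda_versions(root: str = "/usr/local") -> dict:
    """Map versions like '11.1' to toolkit folders like '/usr/local/cuda-11.1'."""
    found = {}
    for name in os.listdir(root):
        match = re.fullmatch(r"cuda-(\d+)\.(\d+)", name)
        if match and os.path.isdir(os.path.join(root, name)):
            found[(int(match.group(1)), int(match.group(2)))] = os.path.join(root, name)
    # Sort numerically, so that 11.10 comes after 11.2 rather than before it.
    return {f"{major}.{minor}": path for (major, minor), path in sorted(found.items())}
```

The path listed for 11.1 is the value used for CUDA_BIN_PATH in the build commands further down.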

Note: you may also need conda install nvcc_linux-64=11.1 -c conda-forge to satisfy CMake, though it may introduce issues of its own.
This issue documents how to specify CUDA_INCLUDE_DIRS and other CMake variables (albeit on Windows) to suit a cudatoolkit conda installation (i.e. the pre-packaged form of CUDA, not the NVIDIA-provided one).

Following the guidance in issue 74, I also changed the gpufit/Gpufit/CMakeLists.txt else block at line 195 to:

        else()
            find_cuda_helper_libs(cublas_static)
            find_cuda_helper_libs(cublasLt_static)
            find_cuda_helper_libs(culibos)
            set( CUDA_CUBLAS_LIBRARIES
                ${CUDA_cublas_static_LIBRARY}
                ${CUDA_cublasLt_static_LIBRARY}
                ${CUDA_cudart_static_LIBRARY}
                ${CUDA_culibos_LIBRARY}
                dl
                pthread
                rt
            )
            message( STATUS "CUDA_CUBLAS_LIBRARIES=${CUDA_CUBLAS_LIBRARIES}" )
        endif()

...because with only CUDA_TOOLKIT_ROOT_DIR set, just the first two of those libraries were picked up, and the linker then failed with various "undefined reference to..." errors.
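
Before re-running CMake, it can save a failed link to check that the static libraries this block refers to actually exist in the toolkit. A minimal sketch, assuming the runfile layout where libraries live in <toolkit>/lib64 with the usual lib*.a names (a conda cudatoolkit install may be laid out differently); missing_static_libs is a hypothetical helper:

```python
import os

# Libraries linked by the else() block above, without the "lib" prefix or ".a" suffix.
REQUIRED_STATIC_LIBS = ["cublas_static", "cublasLt_static", "cudart_static", "culibos"]


def missing_static_libs(cuda_root: str) -> list:
    """Return the required static libraries not found in <cuda_root>/lib64."""
    lib_dir = os.path.join(cuda_root, "lib64")
    return [
        name
        for name in REQUIRED_STATIC_LIBS
        if not os.path.isfile(os.path.join(lib_dir, f"lib{name}.a"))
    ]
```

For example, missing_static_libs("/usr/local/cuda-11.1") should return an empty list for a complete toolkit install.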

I also noticed that not all of the models in the library are exposed via the Python API, so I added them to gpufit/Gpufit/python/pygpufit/gpufit.py:

class ModelID:
    # These values must match the model IDs of the compiled Gpufit library.
    GAUSS_1D = 0
    GAUSS_2D = 1
    GAUSS_2D_ELLIPTIC = 2
    GAUSS_2D_ROTATED = 3
    CAUCHY_2D_ELLIPTIC = 4
    LINEAR_1D = 5
    FLETCHER_POWELL_HELIX = 6
    BROWN_DENNIS = 7
    SPLINE_1D = 8
    SPLINE_2D = 9
    SPLINE_3D = 10
    SPLINE_3D_MULTICHANNEL = 11
    SPLINE_3D_PHASE_MULTICHANNEL = 12
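
Python silently keeps the last of several assignments to the same name in a class body, so a copy-paste slip in a list like this (such as a repeated CAUCHY_2D_ELLIPTIC = 4 line) raises no error on import. A small sketch using the standard ast module to catch such slips after editing gpufit.py; duplicate_attributes is a hypothetical helper:

```python
import ast


def duplicate_attributes(source: str, class_name: str = "ModelID") -> list:
    """List attribute names assigned more than once in the body of class_name."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            seen, dups = set(), []
            for stmt in node.body:
                if isinstance(stmt, ast.Assign):
                    for target in stmt.targets:
                        if isinstance(target, ast.Name):
                            if target.id in seen and target.id not in dups:
                                dups.append(target.id)
                            seen.add(target.id)
            return dups
    return []
```

Passing it the contents of gpufit/Gpufit/python/pygpufit/gpufit.py should return an empty list once the class is clean.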

With these changes in place, I built as follows:

mkdir gpufit-build
cd gpufit-build
export CUDA_BIN_PATH=/usr/local/cuda-11.1/
cmake -DCMAKE_BUILD_TYPE=RELEASE -DUSE_CUBLAS=ON -DCUDA_ARCHITECTURES="8.0 8.6+PTX" -DCUDA_TOOLKIT_ROOT_DIR="$CUDA_BIN_PATH" ../gpufit
make

This builds the Python wheel at ./pyGpufit/dist/pyGpufit-1.1.0-py2.py3-none-any.whl inside the gpufit-build directory. Install it with:

pip install --no-index --find-links "file://$PWD/pyGpufit/dist/pyGpufit-1.1.0-py2.py3-none-any.whl" pyGpufit
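
The wheel's file name embeds the package version (1.1.0 here), so the install command has to change whenever the version does. A sketch that locates whatever wheel the build produced, assuming the pyGpufit/dist layout above; find_wheel is a hypothetical helper:

```python
import glob
import os


def find_wheel(build_dir: str = ".") -> str:
    """Return the most recently built pyGpufit wheel under <build_dir>/pyGpufit/dist."""
    pattern = os.path.join(build_dir, "pyGpufit", "dist", "pyGpufit-*.whl")
    wheels = glob.glob(pattern)
    if not wheels:
        raise FileNotFoundError(f"no wheel matching {pattern}")
    # If several versions are present, the newest file is the latest build.
    return max(wheels, key=os.path.getmtime)
```

The returned path can be passed directly to pip install.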

If you edit the source code and need to rebuild afterwards, use:

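pip uninstall pyGpufit
cd gpufit-build && rm -rf ./*  # the && stops rm from running if the cd fails
export CUDA_BIN_PATH=/usr/local/cuda-11.1/
cmake -DCMAKE_BUILD_TYPE=RELEASE -DUSE_CUBLAS=ON -DCUDA_ARCHITECTURES="8.0 8.6+PTX" -DCUDA_TOOLKIT_ROOT_DIR="$CUDA_BIN_PATH" ../gpufit
make
pip install --no-index --find-links "file://$PWD/pyGpufit/dist/pyGpufit-1.1.0-py2.py3-none-any.whl" pyGpufit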
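
Since rm -rf ./* removes the contents of whatever directory the shell happens to be in, the clean step is worth guarding. A hedged Python sketch of the same clean, configure and make cycle that refuses to clean any directory not named gpufit-build; the function names and default paths are assumptions taken from this page:

```python
import os
import shutil
import subprocess


def clean_build_dir(build_dir: str) -> None:
    """Delete everything inside build_dir, refusing anything not named gpufit-build."""
    if os.path.basename(os.path.abspath(build_dir)) != "gpufit-build":
        raise ValueError(f"refusing to clean {build_dir!r}")
    for name in os.listdir(build_dir):
        path = os.path.join(build_dir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)


def configure_command(cuda_root: str, archs: str = "8.0 8.6+PTX") -> list:
    """The cmake invocation used on this page, as an argument list (no shell quoting needed)."""
    return [
        "cmake",
        "-DCMAKE_BUILD_TYPE=RELEASE",
        "-DUSE_CUBLAS=ON",
        f"-DCUDA_ARCHITECTURES={archs}",
        f"-DCUDA_TOOLKIT_ROOT_DIR={cuda_root}",
        "../gpufit",
    ]


def rebuild(build_dir: str = "gpufit-build", cuda_root: str = "/usr/local/cuda-11.1/") -> None:
    clean_build_dir(build_dir)
    # check=True raises on the first failing step, much like `set -e` in a shell script.
    subprocess.run(configure_command(cuda_root), cwd=build_dir, check=True)
    subprocess.run(["make"], cwd=build_dir, check=True)
```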

To run a single comparison of CPU and GPU performance, run ./Gpufit_Cpufit_Nvidia_Profiler_Test

Gpufit_Cpufit_Nvidia_Profiler_Test result:

generate test parameters
generating 2000000 fits ...
--------------------------------------------------
||||||||||||||||||||||||||||||||||||||||||||||||||
--------------------------------------------------
add noise

100000 fits on the CPU

  ***Cpufit***       3.595 s      27816.41 fits/s
x precision: 0.027346 px  mean iterations: 3.50

2000000 fits on the GPU

  ***Gpufit***       1.211 s    1651527.66 fits/s
x precision: 0.027328 px  mean iterations: 3.49

PERFORMANCE GAIN Gpufit/Cpufit       59.37
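
Note that the CPU and GPU runs above use different numbers of fits (100000 versus 2000000), so the performance gain is a ratio of throughputs rather than of run times. The arithmetic, recomputed in Python from the printed times:

```python
def fits_per_second(n_fits: int, seconds: float) -> float:
    return n_fits / seconds


cpu_rate = fits_per_second(100_000, 3.595)    # Cpufit ran 100000 fits
gpu_rate = fits_per_second(2_000_000, 1.211)  # Gpufit ran 2000000 fits
gain = gpu_rate / cpu_rate                    # ratio of throughputs, not of times
print(f"{cpu_rate:.2f} {gpu_rate:.2f} {gain:.2f}")  # prints: 27816.41 1651527.66 59.37
```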

To run a series of CPU versus GPU comparisons over increasing numbers of fits, run ./Gpufit_Cpufit_Performance_Comparison. It also reports whether cuBLAS is enabled.

Gpufit_Cpufit_Performance_Comparison result:

----------------------------------------
Performance comparison Gpufit vs. Cpufit
----------------------------------------

Please note that execution speed test results depend on
the details of the CPU and GPU hardware.

CUDA runtime version: 11.1
CUDA driver version:  11.2

CUBLAS enabled: Yes

                              -------------------------
Generating test parameters    |||||||||||||||||||||||||
                              -------------------------
                              -------------------------
Generating data               |||||||||||||||||||||||||
                              -------------------------
                              -------------------------
Adding noise                  |||||||||||||||||||||||||
                              -------------------------

  Number  | Cpufit speed  | Gpufit speed  | Performance
 of fits  |     (fits/s)  |     (fits/s)  | gain factor
-------------------------------------------------------
      10  |          inf  |           27  |        0.00
     100  |          inf  |       100000  |        0.00
    1000  |       200000  |      1000000  |        5.00
   10000  |       200000  |      5000000  |       25.00
  100000  |       206612  |      8333333  |       40.33
 1000000  |       207383  |     10752688  |       51.85
10000000  |       206522  |     13908206  |       67.34

Test completed!
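
The gain factor column is Gpufit speed divided by Cpufit speed. An "inf" Cpufit speed presumably means the CPU run finished faster than the timer could resolve (a measured time of zero), and dividing by infinity is what produces the 0.00 gains in the first two rows, so those rows say nothing about relative speed. A small Python sketch reproducing the column from the table's values; gain_factor is a hypothetical helper:

```python
import math


def gain_factor(cpu_fits_per_s: float, gpu_fits_per_s: float) -> float:
    """Performance gain as in the table: Gpufit speed / Cpufit speed."""
    return gpu_fits_per_s / cpu_fits_per_s


# (number of fits, Cpufit fits/s, Gpufit fits/s), copied from the table above.
rows = [
    (10, math.inf, 27),
    (100, math.inf, 100000),
    (1000, 200000, 1000000),
    (10000, 200000, 5000000),
    (100000, 206612, 8333333),
    (1000000, 207383, 10752688),
    (10000000, 206522, 13908206),
]

for n_fits, cpu, gpu in rows:
    # An infinite speed implies a measured CPU time of zero, so the gain is not meaningful.
    flag = "  (CPU time below timer resolution)" if math.isinf(cpu) else ""
    print(f"{n_fits:>8}  {gain_factor(cpu, gpu):6.2f}{flag}")
```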

One of the first things you might want to do with your new speedy Python interface is try out the examples:

cd ../gpufit/Gpufit/python/examples
python simple.py
python gauss2d.py
python gauss2d_plot.py

Installing Gpufit with cuBLAS and build Python wheel on Linux - lmmx/devnotes GitHub Wiki
