Installing Gpufit with cuBLAS and building a Python wheel on Linux
To prepare: if the CUDA toolkit you want to build against is not the one already on the host, install it first, since you need the full toolkit (not just the driver runtime) to compile CUDA code. In this case that is 11.1:
# https://developer.nvidia.com/cuda-11.1.1-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=2004&target_type=runfilelocal
wget https://developer.download.nvidia.com/compute/cuda/11.1.1/local_installers/cuda_11.1.1_455.32.00_linux.run
sudo sh cuda_11.1.1_455.32.00_linux.run --silent --toolkit --toolkitpath=/usr/local/cuda-11.1
sudo rm /usr/local/cuda
sudo ln -s /usr/local/cuda-11.2 /usr/local/cuda # Relink default CUDA after the installer linked 11.1
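With two toolkits side by side it is easy to lose track of which one /usr/local/cuda points at. A minimal sketch for checking, assuming the toolkits live in cuda-X.Y directories under /usr/local as in the commands above (the function name list_cuda_toolkits is my own, not part of any CUDA tooling):

```python
import os
from pathlib import Path


def list_cuda_toolkits(prefix="/usr/local"):
    """Return (default_target, versions) for CUDA toolkits found under prefix.

    default_target is the real path the `cuda` symlink resolves to (or None).
    versions maps each `cuda-X.Y` directory name to whether it has bin/nvcc.
    """
    root = Path(prefix)
    default_link = root / "cuda"
    default_target = os.path.realpath(default_link) if default_link.exists() else None
    versions = {}
    for candidate in sorted(root.glob("cuda-*")):
        # Skip symlinked aliases so each real toolkit directory is listed once.
        if candidate.is_dir() and not candidate.is_symlink():
            versions[candidate.name] = (candidate / "bin" / "nvcc").exists()
    return default_target, versions


if __name__ == "__main__":
    target, found = list_cuda_toolkits()
    print("default /usr/local/cuda ->", target)
    for name, has_nvcc in found.items():
        print(f"{name}: nvcc {'present' if has_nvcc else 'missing'}")
```

A toolkit reported with nvcc missing is probably a runtime-only install and will not be enough to compile Gpufit.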
Then proceed to install the dependencies and clone Gpufit:
sudo apt install libboost-all-dev # required to build tests
conda create -n gpufit
conda activate gpufit
conda install -y python numpy scipy # scipy not necessary
conda install -y pytorch cudatoolkit=11.1 cmake -c pytorch -c conda-forge # pytorch not necessary
git clone https://github.com/gpufit/GPUfit gpufit
For some reason the default build ended up with:
-- CUDA_ARCHITECTURES=3.0;3.5;5.0;5.2;3.2;3.7;5.3;6.0;6.1;6.2;7.0+PTX
-- CUDA_NVCC_FLAGS=-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;
-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_32,code=sm_32;
-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_53,code=sm_53;-gencode;arch=compute_60,code=sm_60;
-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_62,code=sm_62;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70
...the first of which, compute_30, led to the error nvcc fatal : Unsupported gpu architecture 'compute_30', because CUDA 11 dropped support for the oldest Kepler architectures (sm_30 and sm_32) and deprecated the remaining ones.
- See devnotes
To fix this I edited the control flow that sets CUDA_ARCHITECTURES in gpufit/Gpufit/CMakeLists.txt, but ended up bypassing it when building by passing the -DCUDA_ARCHITECTURES="8.0 8.6+PTX" flag to CMake (i.e. you don't really need to change the code, just specify the flag).
- See description and edit here
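To see what an architecture list turns into, here is a small sketch that reproduces the pattern visible in the CUDA_NVCC_FLAGS log above: each entry gives one sm_ target, and a +PTX suffix adds a second entry that embeds PTX for the same compute capability. This only mirrors the logged output; it is not the actual CMake logic in Gpufit, and gencode_flags is a name I made up.

```python
def gencode_flags(architectures):
    """Expand a list like "8.0 8.6+PTX" into nvcc -gencode flags."""
    flags = []
    # Accept both space-separated and CMake-style semicolon-separated lists.
    for entry in architectures.replace(";", " ").split():
        emit_ptx = entry.endswith("+PTX")
        version = entry[: -len("+PTX")] if emit_ptx else entry
        cc = version.replace(".", "")  # "8.6" -> "86"
        flags += ["-gencode", f"arch=compute_{cc},code=sm_{cc}"]
        if emit_ptx:
            # Also embed PTX so newer GPUs can JIT-compile the kernels.
            flags += ["-gencode", f"arch=compute_{cc},code=compute_{cc}"]
    return flags


if __name__ == "__main__":
    print(";".join(gencode_flags("8.0 8.6+PTX")))
```

Running it on the default list from the log is a quick way to spot entries such as compute_30 that the installed nvcc will reject.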
- I also changed time.clock() to time.process_time() twice in gpufit.py (line 175), as time.clock() was deprecated in Python 3.3 and removed in 3.8
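For reference, a sketch of a timing helper that works on both sides of that change. The helper names (cpu_timer, timed) are mine and do not appear in gpufit.py; the snippet only illustrates the replacement call.

```python
import time


def cpu_timer():
    """Return a clock function that works before and after Python 3.8."""
    # time.process_time() exists since Python 3.3; time.clock() is gone in 3.8+.
    # The `or` short-circuits, so time.clock is never touched on modern Python.
    return getattr(time, "process_time", None) or time.clock


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed CPU seconds)."""
    clock = cpu_timer()
    start = clock()
    result = func(*args, **kwargs)
    return result, clock() - start


if __name__ == "__main__":
    total, seconds = timed(sum, range(1_000_000))
    print(total, f"{seconds:.4f} s")
```

Note that time.process_time() counts CPU time of the current process only, so time.perf_counter() may be the better choice if you want wall-clock time for a call that mostly waits on the GPU.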
Additionally, the source code states:
"Multiple CUDA versions installed, specify which version to use: set CUDA_BIN_PATH before running CMake, or CUDA_TOOLKIT_ROOT_DIR after first configuration, to the installation folder of the desired CUDA version"
- Note: you may also need conda install nvcc_linux-64=11.1 -c conda-forge to satisfy CMake, but it may cause its own issues
- This issue documents how to specify CUDA_INCLUDE_DIRS and other CMake variables (albeit on Windows) to suit a cudatoolkit conda installation (i.e. the pre-packaged form of CUDA, not the NVIDIA-provided one).
Following the guidance in issue 74, I also changed the else block at line 195 of gpufit/Gpufit/CMakeLists.txt to:
else()
find_cuda_helper_libs(cublas_static)
find_cuda_helper_libs(cublasLt_static)
find_cuda_helper_libs(culibos)
set( CUDA_CUBLAS_LIBRARIES
${CUDA_cublas_static_LIBRARY}
${CUDA_cublasLt_static_LIBRARY}
${CUDA_cudart_static_LIBRARY}
${CUDA_culibos_LIBRARY}
dl
pthread
rt
)
message( STATUS "CUDA_CUBLAS_LIBRARIES=${CUDA_CUBLAS_LIBRARIES}" )
endif()
...as only the first 2 listed there were picked up with just CUDA_TOOLKIT_ROOT_DIR set, and I subsequently got various "undefined reference to..." errors from the linker.
I also noticed that not all of the models in the library are exposed via the Python API, so I added them to gpufit/Gpufit/python/pygpufit/gpufit.py:
class ModelID:
    GAUSS_1D = 0
    GAUSS_2D = 1
    GAUSS_2D_ELLIPTIC = 2
    GAUSS_2D_ROTATED = 3
    CAUCHY_2D_ELLIPTIC = 4
    LINEAR_1D = 5
    FLETCHER_POWELL_HELIX = 6
    BROWN_DENNIS = 7
    SPLINE_1D = 8
    SPLINE_2D = 9
    SPLINE_3D = 10
    SPLINE_3D_MULTICHANNEL = 11
    SPLINE_3D_PHASE_MULTICHANNEL = 12
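When extending a table of constants like this by hand, a quick sanity check helps catch an ID that is accidentally used twice. A minimal sketch, with the class repeated here so the snippet runs on its own; model_names is a hypothetical helper of mine, not something pygpufit provides.

```python
class ModelID:
    GAUSS_1D = 0
    GAUSS_2D = 1
    GAUSS_2D_ELLIPTIC = 2
    GAUSS_2D_ROTATED = 3
    CAUCHY_2D_ELLIPTIC = 4
    LINEAR_1D = 5
    FLETCHER_POWELL_HELIX = 6
    BROWN_DENNIS = 7
    SPLINE_1D = 8
    SPLINE_2D = 9
    SPLINE_3D = 10
    SPLINE_3D_MULTICHANNEL = 11
    SPLINE_3D_PHASE_MULTICHANNEL = 12


def model_names(model_cls=ModelID):
    """Map each integer ID to its attribute name, rejecting clashing IDs."""
    names = {}
    for name, value in vars(model_cls).items():
        # Ignore dunder attributes and anything that is not an integer constant.
        if name.startswith("_") or not isinstance(value, int):
            continue
        if value in names:
            raise ValueError(f"ID {value} used by both {names[value]} and {name}")
        names[value] = name
    return names


if __name__ == "__main__":
    for model_id, name in sorted(model_names().items()):
        print(model_id, name)
```

The reverse mapping is also handy for printing a readable model name next to fit results.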
After these changes, I then built as follows:
mkdir gpufit-build
cd gpufit-build
export CUDA_BIN_PATH=/usr/local/cuda-11.1/ # exported so CMake's FindCUDA can see it too
cmake -DCMAKE_BUILD_TYPE=RELEASE -DUSE_CUBLAS=ON -DCUDA_ARCHITECTURES="8.0 8.6+PTX" -DCUDA_TOOLKIT_ROOT_DIR="$CUDA_BIN_PATH" ../gpufit
make
This built the Python wheel at ./pyGpufit/dist/pyGpufit-1.1.0-py2.py3-none-any.whl under the gpufit-build directory, which can then be installed with:
pip install --no-index --find-links "file://$PWD/pyGpufit/dist/pyGpufit-1.1.0-py2.py3-none-any.whl" pyGpufit
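After installing, it is worth confirming that the module imports and can reach the GPU before going further. A hedged sketch: the import path pygpufit.gpufit and the functions cuda_available() and get_last_error() are what I believe the upstream gpufit.py exposes, so treat them as assumptions and check your copy; check_pygpufit is my own name.

```python
import importlib
import importlib.util


def check_pygpufit():
    """Report whether pygpufit imports and whether it can see a CUDA device."""
    if importlib.util.find_spec("pygpufit") is None:
        return "pygpufit is not installed in this environment"
    try:
        gf = importlib.import_module("pygpufit.gpufit")
    except (ImportError, OSError) as exc:
        # Loading the bundled shared library can fail, e.g. on missing CUDA libs.
        return f"pygpufit is installed but failed to load: {exc}"
    # Assumed API: cuda_available() and get_last_error() from upstream gpufit.py.
    if gf.cuda_available():
        return "pygpufit imported and CUDA is available"
    return "pygpufit imported but CUDA is unavailable: " + str(gf.get_last_error())


if __name__ == "__main__":
    print(check_pygpufit())
```

Run it inside the gpufit conda environment, since that is where the wheel was installed.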
- If you edit the source code and need to rebuild afterwards, use:
pip uninstall pyGpufit
cd gpufit-build
rm -rf ./*
cmake -DCMAKE_BUILD_TYPE=RELEASE -DUSE_CUBLAS=ON -DCUDA_ARCHITECTURES="8.0 8.6+PTX" ../gpufit
make
pip install --no-index --find-links "file://$PWD/pyGpufit/dist/pyGpufit-1.1.0-py2.py3-none-any.whl" pyGpufit
- To run a single comparison of CPU to GPU run
./Gpufit_Cpufit_Nvidia_Profiler_Test
Gpufit_Cpufit_Nvidia_Profiler_Test result:
generate test parameters
generating 2000000 fits ...
--------------------------------------------------
||||||||||||||||||||||||||||||||||||||||||||||||||
--------------------------------------------------
add noise
100000 fits on the CPU
***Cpufit*** 3.595 s 27816.41 fits/s
x precision: 0.027346 px mean iterations: 3.50
2000000 fits on the GPU
***Gpufit*** 1.211 s 1651527.66 fits/s
x precision: 0.027328 px mean iterations: 3.49
PERFORMANCE GAIN Gpufit/Cpufit 59.37
- To run many comparisons between CPU and GPU run
./Gpufit_Cpufit_Performance_Comparison
- This will also report whether cuBLAS is set up
Gpufit_Cpufit_Performance_Comparison result:
----------------------------------------
Performance comparison Gpufit vs. Cpufit
----------------------------------------
Please note that execution speed test results depend on
the details of the CPU and GPU hardware.
CUDA runtime version: 11.1
CUDA driver version: 11.2
CUBLAS enabled: Yes
-------------------------
Generating test parameters |||||||||||||||||||||||||
-------------------------
-------------------------
Generating data |||||||||||||||||||||||||
-------------------------
-------------------------
Adding noise |||||||||||||||||||||||||
-------------------------
Number | Cpufit speed | Gpufit speed | Performance
of fits | (fits/s) | (fits/s) | gain factor
-------------------------------------------------------
10 | inf | 27 | 0.00
100 | inf | 100000 | 0.00
1000 | 200000 | 1000000 | 5.00
10000 | 200000 | 5000000 | 25.00
100000 | 206612 | 8333333 | 40.33
1000000 | 207383 | 10752688 | 51.85
10000000 | 206522 | 13908206 | 67.34
Test completed!
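The gain factor in the last column is just the Gpufit speed divided by the Cpufit speed, rounded to two decimals. A short sketch that recomputes it from the figures reported above; the inf entries presumably come from CPU runs too short for the timer to resolve (that part is my guess), and dividing by infinity gives the 0.00 shown.

```python
def gain_factor(cpu_fits_per_s, gpu_fits_per_s):
    """Gpufit speed relative to Cpufit speed, rounded as in the report."""
    return round(gpu_fits_per_s / cpu_fits_per_s, 2)


# (number of fits, Cpufit fits/s, Gpufit fits/s) copied from the table above.
rows = [
    (10, float("inf"), 27),
    (100, float("inf"), 100000),
    (1000, 200000, 1000000),
    (10000, 200000, 5000000),
    (100000, 206612, 8333333),
    (1000000, 207383, 10752688),
    (10000000, 206522, 13908206),
]

if __name__ == "__main__":
    for n_fits, cpu, gpu in rows:
        print(f"{n_fits:>9} | {gain_factor(cpu, gpu):6.2f}")
```

The same ratio applied to the profiler test figures (27816.41 and 1651527.66 fits/s) gives the 59.37 reported there.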
One of the first things you might want to do with your new speedy Python interface is try out the examples:
cd ../gpufit/Gpufit/python/examples
python simple.py
python gauss2d.py
python gauss2d_plot.py