CUDA Library - yszheda/wiki GitHub Wiki

Thrust

Debug

CUDA stream

thrust::cuda::par.on(stream)
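A hedged sketch of the pattern (requires Thrust >= 1.8; the stream and vector names are illustrative). Passing `thrust::cuda::par.on(stream)` as the execution policy makes the algorithm enqueue its work on that stream instead of the default stream:

```cpp
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/execution_policy.h>
#include <cuda_runtime.h>

int main()
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    thrust::device_vector<int> d(1 << 20);
    // Run the sort on `stream` rather than the default stream.
    thrust::sort(thrust::cuda::par.on(stream), d.begin(), d.end());

    // The algorithm may return before the work completes; synchronize
    // before touching the results from the host.
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    return 0;
}
```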

vector

host_vector

counting_iterator

Trouble-shooting

thrust::raw_reference_cast

Not needed in the Thrust that ships with CUDA 8 (Thrust 1.8.1)

thrust::max_element crash

1.9.x: STATIC_ASSERTION_FAILURE

NPP

stream

example

CUB

cuFFT

cufftPlan2d
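A minimal hedged sketch of creating and executing a 2-D complex-to-complex plan (sizes are illustrative). Note that `cufftPlan2d` takes the slowest-changing dimension first:

```cpp
#include <cufft.h>
#include <cuda_runtime.h>

int main()
{
    const int nx = 256, ny = 256;
    cufftComplex *data;
    cudaMalloc(&data, sizeof(cufftComplex) * nx * ny);

    cufftHandle plan;
    cufftPlan2d(&plan, nx, ny, CUFFT_C2C);          // nx = slowest dimension
    cufftExecC2C(plan, data, data, CUFFT_FORWARD);  // in-place forward FFT
    cudaDeviceSynchronize();

    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}
```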

spRadix kernel1Mem

Error Handling

Half Precision FP16

Sample code

Error Handling

// cuFFT API errors (requires <cufft.h>; the macro below also needs <cstdio> and <cstdlib>)
static const char *_cudaGetErrorEnum(cufftResult error)
{
    switch (error)
    {
        case CUFFT_SUCCESS:
            return "CUFFT_SUCCESS";

        case CUFFT_INVALID_PLAN:
            return "CUFFT_INVALID_PLAN";

        case CUFFT_ALLOC_FAILED:
            return "CUFFT_ALLOC_FAILED";

        case CUFFT_INVALID_TYPE:
            return "CUFFT_INVALID_TYPE";

        case CUFFT_INVALID_VALUE:
            return "CUFFT_INVALID_VALUE";

        case CUFFT_INTERNAL_ERROR:
            return "CUFFT_INTERNAL_ERROR";

        case CUFFT_EXEC_FAILED:
            return "CUFFT_EXEC_FAILED";

        case CUFFT_SETUP_FAILED:
            return "CUFFT_SETUP_FAILED";

        case CUFFT_INVALID_SIZE:
            return "CUFFT_INVALID_SIZE";

        case CUFFT_UNALIGNED_DATA:
            return "CUFFT_UNALIGNED_DATA";

        case CUFFT_INCOMPLETE_PARAMETER_LIST:
            return "CUFFT_INCOMPLETE_PARAMETER_LIST";

        case CUFFT_INVALID_DEVICE:
            return "CUFFT_INVALID_DEVICE";

        case CUFFT_PARSE_ERROR:
            return "CUFFT_PARSE_ERROR";

        case CUFFT_NO_WORKSPACE:
            return "CUFFT_NO_WORKSPACE";

        case CUFFT_NOT_IMPLEMENTED:
            return "CUFFT_NOT_IMPLEMENTED";

        case CUFFT_LICENSE_ERROR:
            return "CUFFT_LICENSE_ERROR";

        case CUFFT_NOT_SUPPORTED:
            return "CUFFT_NOT_SUPPORTED";
    }

    return "<unknown>";
}

#define CUFFT_SAFE_CALL(call) do { \
    cufftResult err = (call); \
    if (err != CUFFT_SUCCESS) { \
        fprintf(stderr, "Got error %d:%s at %s:%d\n", err, _cudaGetErrorEnum(err), \
                __FILE__, __LINE__); \
        exit(1); \
    } \
} while (0)

cuDPP

CuDNN


CUDNN_CONVOLUTION_FWD_PREFER_FASTEST

In this configuration, the routine cudnnGetConvolutionForwardAlgorithm() will return the fastest algorithm regardless of how much workspace is needed to execute it.

CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT

In this configuration, the routine cudnnGetConvolutionForwardAlgorithm() will return the fastest algorithm that fits within the memory limit that the user provided.
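A hedged sketch of the workspace-limited selection (cuDNN 7.x API; `cudnnGetConvolutionForwardAlgorithm()` was removed in cuDNN 8, and the descriptor arguments are assumed to be already created and configured):

```cpp
#include <cudnn.h>

// Pick the fastest forward algorithm that fits in a 64 MiB workspace.
cudnnConvolutionFwdAlgo_t pickAlgo(cudnnHandle_t handle,
                                   cudnnTensorDescriptor_t xDesc,
                                   cudnnFilterDescriptor_t wDesc,
                                   cudnnConvolutionDescriptor_t convDesc,
                                   cudnnTensorDescriptor_t yDesc)
{
    const size_t workspaceLimit = 64u << 20;  // 64 MiB cap (illustrative)
    cudnnConvolutionFwdAlgo_t algo;
    cudnnGetConvolutionForwardAlgorithm(handle, xDesc, wDesc, convDesc, yDesc,
                                        CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT,
                                        workspaceLimit, &algo);
    return algo;
}
```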

CUDNN_STATUS_EXECUTION_FAILED

The GPU program failed to execute. This is usually caused by a failure to launch some cuDNN kernel on the GPU, which can occur for multiple reasons.

To correct: check that the hardware, an appropriate version of the driver, and the cuDNN library are correctly installed.

Otherwise, this may indicate an internal error/bug in the library.


Check failure `status == CUDNN_STATUS_SUCCESS (4 vs. 0)`: status code 4 is CUDNN_STATUS_INTERNAL_ERROR.

CUTLASS

https://github.com/NVIDIA/cutlass

NVTX

NVML

TensorRT

Big number

DALI

static link CUDA libs

Example CMake snippet for statically linking cuDNN:

target_link_libraries(${OUTPUT_LIBRARY_NAME}
    dl
    rt
    pthread
    /usr/local/cuda/lib64/libcurand_static.a
    /usr/local/cuda/lib64/libcublas_static.a
    /usr/local/cuda/lib64/libcudnn_static.a
    # libculibos.a must come after the static libraries that depend on it
    /usr/local/cuda/lib64/libculibos.a
    /usr/local/cuda/lib64/libcudart_static.a
    )

undefined reference to `culibosInit`: link `libculibos.a` (after the static CUDA libraries that depend on it).

undefined reference to `shm_open` / `shm_unlink`: link against librt (`rt`).

undefined reference to `__cudaUnregisterFatBinary`: link the CUDA runtime (`libcudart_static.a`).

undefined reference to `dlopen`: link against libdl (`dl`).
