CUDA Library - yszheda/wiki GitHub Wiki

Thrust

Debug

CUDA stream

thrust::cuda::par.on(stream)
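A hedged sketch of the pattern (requires Thrust >= 1.8; the stream and vector names are illustrative). Passing `thrust::cuda::par.on(stream)` as the execution policy makes the algorithm enqueue its work on that stream instead of the default stream:

```cpp
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/execution_policy.h>
#include <cuda_runtime.h>

int main()
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    thrust::device_vector<int> d(1 << 20);
    // Run the sort on `stream` rather than the default stream.
    thrust::sort(thrust::cuda::par.on(stream), d.begin(), d.end());

    // The algorithm may return before the work completes; synchronize
    // before touching the results from the host.
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    return 0;
}
```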

vector

host_vector

counting_iterator

Trouble-shooting

thrust::raw_reference_cast

Not needed in the Thrust that ships with CUDA 8 (Thrust 1.8.1)

thrust::max_element crash

1.9.x: STATIC_ASSERTION_FAILURE

NPP

stream

example

CUB

cuFFT

cufftPlan2d
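A minimal hedged sketch of creating and executing a 2-D complex-to-complex plan (sizes are illustrative). Note that `cufftPlan2d` takes the slowest-changing dimension first:

```cpp
#include <cufft.h>
#include <cuda_runtime.h>

int main()
{
    const int nx = 256, ny = 256;
    cufftComplex *data;
    cudaMalloc(&data, sizeof(cufftComplex) * nx * ny);

    cufftHandle plan;
    cufftPlan2d(&plan, nx, ny, CUFFT_C2C);          // nx = slowest dimension
    cufftExecC2C(plan, data, data, CUFFT_FORWARD);  // in-place forward FFT
    cudaDeviceSynchronize();

    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}
```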

spRadix kernel1Mem

Error Handling

Half Precision FP16

Sample code

Error Handling

// cuFFT API errors (requires <cufft.h>; the macro below also needs <cstdio> and <cstdlib>)
static const char *_cudaGetErrorEnum(cufftResult error)
{
    switch (error)
    {
        case CUFFT_SUCCESS:
            return "CUFFT_SUCCESS";

        case CUFFT_INVALID_PLAN:
            return "CUFFT_INVALID_PLAN";

        case CUFFT_ALLOC_FAILED:
            return "CUFFT_ALLOC_FAILED";

        case CUFFT_INVALID_TYPE:
            return "CUFFT_INVALID_TYPE";

        case CUFFT_INVALID_VALUE:
            return "CUFFT_INVALID_VALUE";

        case CUFFT_INTERNAL_ERROR:
            return "CUFFT_INTERNAL_ERROR";

        case CUFFT_EXEC_FAILED:
            return "CUFFT_EXEC_FAILED";

        case CUFFT_SETUP_FAILED:
            return "CUFFT_SETUP_FAILED";

        case CUFFT_INVALID_SIZE:
            return "CUFFT_INVALID_SIZE";

        case CUFFT_UNALIGNED_DATA:
            return "CUFFT_UNALIGNED_DATA";

        case CUFFT_INCOMPLETE_PARAMETER_LIST:
            return "CUFFT_INCOMPLETE_PARAMETER_LIST";

        case CUFFT_INVALID_DEVICE:
            return "CUFFT_INVALID_DEVICE";

        case CUFFT_PARSE_ERROR:
            return "CUFFT_PARSE_ERROR";

        case CUFFT_NO_WORKSPACE:
            return "CUFFT_NO_WORKSPACE";

        case CUFFT_NOT_IMPLEMENTED:
            return "CUFFT_NOT_IMPLEMENTED";

        case CUFFT_LICENSE_ERROR:
            return "CUFFT_LICENSE_ERROR";

        case CUFFT_NOT_SUPPORTED:
            return "CUFFT_NOT_SUPPORTED";
    }

    return "<unknown>";
}

#define CUFFT_SAFE_CALL(call) do { \
    cufftResult err = (call); \
    if (err != CUFFT_SUCCESS) { \
        fprintf(stderr, "Got error %d:%s at %s:%d\n", err, _cudaGetErrorEnum(err), \
                __FILE__, __LINE__); \
        exit(1); \
    } \
} while (0)

cuDPP

CuDNN


CUDNN_CONVOLUTION_FWD_PREFER_FASTEST

In this configuration, the routine cudnnGetConvolutionForwardAlgorithm() will return the fastest algorithm regardless of how much workspace is needed to execute it.

CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT

In this configuration, the routine cudnnGetConvolutionForwardAlgorithm() will return the fastest algorithm that fits within the memory limit that the user provided.
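A hedged sketch of the workspace-limited selection (cuDNN 7.x API; `cudnnGetConvolutionForwardAlgorithm()` was removed in cuDNN 8, and the descriptor arguments are assumed to be already created and configured):

```cpp
#include <cudnn.h>

// Pick the fastest forward algorithm that fits in a 64 MiB workspace.
cudnnConvolutionFwdAlgo_t pickAlgo(cudnnHandle_t handle,
                                   cudnnTensorDescriptor_t xDesc,
                                   cudnnFilterDescriptor_t wDesc,
                                   cudnnConvolutionDescriptor_t convDesc,
                                   cudnnTensorDescriptor_t yDesc)
{
    const size_t workspaceLimit = 64u << 20;  // 64 MiB cap (illustrative)
    cudnnConvolutionFwdAlgo_t algo;
    cudnnGetConvolutionForwardAlgorithm(handle, xDesc, wDesc, convDesc, yDesc,
                                        CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT,
                                        workspaceLimit, &algo);
    return algo;
}
```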

CUDNN_STATUS_EXECUTION_FAILED

The GPU program failed to execute. This is usually caused by a failure to launch some cuDNN kernel on the GPU, which can occur for multiple reasons.

To correct: check that the hardware, an appropriate version of the driver, and the cuDNN library are correctly installed.

Otherwise, this may indicate an internal error/bug in the library.


Check failure `status == CUDNN_STATUS_SUCCESS (4 vs. 0)`: status code 4 is CUDNN_STATUS_INTERNAL_ERROR.

CUTLASS

https://github.com/NVIDIA/cutlass

NVTX

NVML

TensorRT

Big number

DALI

static link CUDA libs

Example CMake snippet for statically linking cuDNN:

target_link_libraries(${OUTPUT_LIBRARY_NAME}
    dl
    rt
    pthread
    /usr/local/cuda/lib64/libcurand_static.a
    /usr/local/cuda/lib64/libcublas_static.a
    /usr/local/cuda/lib64/libcudnn_static.a
    # libculibos.a must come after the static libraries that depend on it
    /usr/local/cuda/lib64/libculibos.a
    /usr/local/cuda/lib64/libcudart_static.a
    )

undefined reference to `culibosInit`: link `libculibos.a` (after the static CUDA libraries that depend on it).

undefined reference to `shm_open` / `shm_unlink`: link against librt (`rt`).

undefined reference to `__cudaUnregisterFatBinary`: link the CUDA runtime (`libcudart_static.a`).

undefined reference to `dlopen`: link against libdl (`dl`).
