CUDA Library - yszheda/wiki GitHub Wiki
thrust::cuda::par.on(stream)
- Getting CUDA Thrust to use a CUDA stream of your choice
- Thrust and streams
- https://thrust.github.io/doc/group__execution__policies.html
- https://github.com/thrust/thrust/blob/master/examples/cuda/simple_cuda_streams.cu
- Array of vectors using Thrust
- Is it possible to use thrust::device_vector and thrust::fill for 2D arrays using thrust library in CUDA
- how to cast a 2-dimensional thrust::device_vector<thrust::device_vector> to raw pointer
- https://thrust.github.io/doc/classthrust_1_1counting__iterator.html
- Purpose and usage of counting_iterators in CUDA Thrust library
Not needed in CUDA-8 thrust (1.8.1)
- NVIDIA NPP
- https://stackoverflow.com/questions/12422498/arent-npp-functions-completely-optimized
- NPP Dilate and Erode functions' examples
- Morphological Operations with CUDA
- nppiErode_8u_C1R from NPP library
- 4.1: cufftPlan2d(): X & Y params reversed?
- cufftPlan2d exception
- https://stackoverflow.com/questions/5529212/cuda-cufftplan2d-plan-size-question
- CUFFT error handling
- https://stackoverflow.com/questions/16267149/cufft-error-handling
- https://stackoverflow.com/questions/20847021/cufft-how-to-calculate-the-fft-when-the-input-is-a-pitched-array
- Unexpectedly low performance of cuFFT with half floating point (FP16)
- [!] Half precision cuFFT Transforms
- https://github.com/murphy17/YipLab-HoloDeconv/blob/master/src/main.cu
- https://github.com/mpicbg-scicomp/gearshifft/blob/bfa433497cee48243f96040c44b4ec2eb0b2a201/inc/libraries/cufft/cufft.hpp
// cuFFT API errors: map a cufftResult code to the name of its enumerator.
// The set of codes differs between cuFFT releases; remove any case that
// your cufft.h does not define.
#include <cstdio>   // fprintf
#include <cstdlib>  // exit
#include <cufft.h>  // cufftResult and the CUFFT_* codes

static const char *_cudaGetErrorEnum(cufftResult error)
{
    switch (error)
    {
        case CUFFT_SUCCESS:
            return "CUFFT_SUCCESS";
        case CUFFT_INVALID_PLAN:
            return "CUFFT_INVALID_PLAN";
        case CUFFT_ALLOC_FAILED:
            return "CUFFT_ALLOC_FAILED";
        case CUFFT_INVALID_TYPE:
            return "CUFFT_INVALID_TYPE";
        case CUFFT_INVALID_VALUE:
            return "CUFFT_INVALID_VALUE";
        case CUFFT_INTERNAL_ERROR:
            return "CUFFT_INTERNAL_ERROR";
        case CUFFT_EXEC_FAILED:
            return "CUFFT_EXEC_FAILED";
        case CUFFT_SETUP_FAILED:
            return "CUFFT_SETUP_FAILED";
        case CUFFT_INVALID_SIZE:
            return "CUFFT_INVALID_SIZE";
        case CUFFT_UNALIGNED_DATA:
            return "CUFFT_UNALIGNED_DATA";
        case CUFFT_INCOMPLETE_PARAMETER_LIST:
            return "CUFFT_INCOMPLETE_PARAMETER_LIST";
        case CUFFT_INVALID_DEVICE:
            return "CUFFT_INVALID_DEVICE";
        case CUFFT_PARSE_ERROR:
            return "CUFFT_PARSE_ERROR";
        case CUFFT_NO_WORKSPACE:
            return "CUFFT_NO_WORKSPACE";
        case CUFFT_NOT_IMPLEMENTED:
            return "CUFFT_NOT_IMPLEMENTED";
        case CUFFT_LICENSE_ERROR:
            return "CUFFT_LICENSE_ERROR";
        case CUFFT_NOT_SUPPORTED:
            return "CUFFT_NOT_SUPPORTED";
    }
    return "<unknown>";
}

// Evaluate a cuFFT call once; on failure, print the code, its name, and the
// call site, then exit. do/while (0) makes the macro act as a single statement,
// so it is safe inside an unbraced if/else.
#define CUFFT_SAFE_CALL(call) do { \
    cufftResult err = (call); \
    if (err != CUFFT_SUCCESS) { \
        fprintf(stderr, "Got error %d:%s at %s:%d\n", (int)err, _cudaGetErrorEnum(err), \
                __FILE__, __LINE__); \
        exit(1); \
    } \
} while (0)
- Error String for cufft?
CUDNN_CONVOLUTION_FWD_PREFER_FASTEST
In this configuration, the routine cudnnGetConvolutionForwardAlgorithm() will return the fastest algorithm regardless of how much workspace is needed to execute it.
CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT
In this configuration, the routine cudnnGetConvolutionForwardAlgorithm() will return the fastest algorithm that fits within the memory limit that the user provided.
CUDNN_STATUS_EXECUTION_FAILED
The GPU program failed to execute. This is usually caused by a failure to launch some cuDNN kernel on the GPU, which can occur for multiple reasons.
To correct: check that the hardware, an appropriate version of the driver, and the cuDNN library are correctly installed.
Otherwise, this may indicate an internal error or bug in the library.
https://github.com/NVIDIA/cutlass
- CUDA Pro Tip: Generate Custom Application Profile Timelines with NVTX
- CUDA Pro Tip: Profiling MPI Applications
- https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/getting%20started.html#Defining-the-pipeline
- Case Study: ResNet50 with DALI
CMake example for statically linking cuDNN:
target_link_libraries(${OUTPUT_LIBRARY_NAME}
    dl
    rt
    pthread
    /usr/local/cuda/lib64/libcurand_static.a
    /usr/local/cuda/lib64/libcublas_static.a
    /usr/local/cuda/lib64/libcudnn_static.a
    /usr/local/cuda/lib64/libculibos.a
    /usr/local/cuda/lib64/libcudart_static.a
)
- culibos (see: CUDNN Static Linking Error)
- libcudart_static.a must be linked after the other CUDA libraries (see: CUDA 6.0 Linking error: undefined reference to `__cudaUnregisterFatBinary')