Creating the project
The project is built with CMake. The CMakeLists.txt is as follows:
cmake_minimum_required(VERSION 2.8)
project(image_process)
find_package(OpenCV REQUIRED)  # searches for FindXXX.cmake or XXXConfig.cmake, which set result variables
find_package(CUDA REQUIRED)    # REQUIRED makes CMake stop with an error if the package is not found
cuda_add_executable(image_process main.cu)
target_link_libraries(image_process ${OpenCV_LIBS})
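Open question, not yet resolved: how does cuda_add_executable arrange for NVCC to do the compilation, and how can NVCC compilation be specified in some other way?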
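A partial answer to the open question, offered as a hedged note rather than a verified fix for this project: cuda_add_executable is a macro provided by CMake's FindCUDA module (the module loaded by find_package(CUDA)). It generates custom build commands that run nvcc on the .cu sources and then links the resulting objects into the executable. Since CMake 3.8, CUDA can instead be enabled as a regular language, in which case a plain add_executable compiles .cu files with nvcc. The sketch below assumes CMake 3.8 or newer and has not been tested against this project:

```cmake
cmake_minimum_required(VERSION 3.8)
# Listing CUDA as a project language lets CMake compile .cu files with nvcc
project(image_process LANGUAGES CXX CUDA)
find_package(OpenCV REQUIRED)
# No find_package(CUDA) and no cuda_add_executable needed in this variant
add_executable(image_process main.cu)
target_link_libraries(image_process ${OpenCV_LIBS})
```

The nvcc executable used in this mode can be chosen through the CMAKE_CUDA_COMPILER variable or the CUDACXX environment variable.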
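Writing the code
The idea is simple: run the same color-to-grayscale conversion three ways (with CUDA, on the CPU, and with cv::cvtColor) and compare the run times.
CUDA program
Each thread processes one pixel. The thread grid and thread block are configured as follows: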
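For orientation, the CPU variant of the comparison could look roughly like the sketch below. It is not the author's implementation: Pixel3 is a hypothetical stand-in for CUDA's uchar3, the function name rgb2gray_cpu is made up for illustration, and the channels are assumed to be stored in R, G, B order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pixel type standing in for CUDA's uchar3 (three 8-bit channels).
struct Pixel3 {
    unsigned char x, y, z;
};

// CPU reference: convert a row-major image to grayscale, one pixel at a time.
// x, y, z are assumed to hold R, G, B in that order.
std::vector<unsigned char> rgb2gray_cpu(const std::vector<Pixel3>& in,
                                        unsigned int imgheight,
                                        unsigned int imgwidth)
{
    assert(in.size() == static_cast<std::size_t>(imgheight) * imgwidth);
    std::vector<unsigned char> out(in.size());
    for (unsigned int row = 0; row < imgheight; ++row) {
        for (unsigned int col = 0; col < imgwidth; ++col) {
            const std::size_t i = static_cast<std::size_t>(row) * imgwidth + col;
            const Pixel3& p = in[i];
            // Weighted sum of the channels, truncated to a byte.
            out[i] = static_cast<unsigned char>(
                0.299f * p.x + 0.587f * p.y + 0.114f * p.z);
        }
    }
    return out;
}
```

Running the same weights on the CPU gives a baseline whose output can be compared pixel by pixel against the GPU result, in addition to comparing the timings.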
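dim3 threadsPerBlock(32, 32);  // 32 x 32 = 1024 threads per block
// Round up so that the grid covers the whole image even when the
// image size is not a multiple of the block size.
dim3 blocksPerGrid((imgwidth + threadsPerBlock.x - 1) / threadsPerBlock.x,
                   (imgheight + threadsPerBlock.y - 1) / threadsPerBlock.y);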
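The grid size expression above is an integer ceiling division. A minimal sketch of that arithmetic in plain C++ (blocks_needed is a hypothetical helper name, not part of CUDA or of the original code):

```cpp
#include <cassert>

// Hypothetical helper: number of blocks needed to cover `size` elements
// with blocks of `block` threads each, i.e. ceil(size / block) computed
// in integer arithmetic.
unsigned int blocks_needed(unsigned int size, unsigned int block)
{
    assert(block > 0);
    return (size + block - 1) / block;
}
```

For a 1920 x 1080 image and 32 x 32 blocks this gives a 60 x 34 grid. The 34 block rows span 1088 pixel rows, so the last block row extends 8 rows past the bottom of the image.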
The kernel function is written as follows:
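__global__ void rgb2grayincuda(uchar3 * const d_in, unsigned char * const d_out,
                               uint imgheight, uint imgwidth)
{
    // Global pixel coordinates of this thread: x is the column, y is the row.
    const unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    const unsigned int idy = blockIdx.y * blockDim.y + threadIdx.y;

    // Threads that fall outside the image do nothing.
    if (idx < imgwidth && idy < imgheight)
    {
        uchar3 rgb = d_in[idy * imgwidth + idx];
        // Weighted sum of the three channels, implicitly converted to unsigned char.
        d_out[idy * imgwidth + idx] = 0.299f * rgb.x + 0.587f * rgb.y + 0.114f * rgb.z;
    }
}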
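To see what the bounds check in the kernel guards against, the index computation can be replayed on the CPU. The following is only an illustrative emulation in plain C++, assuming square blocks; LaunchCount and emulate_guard are hypothetical names and nothing here runs on the GPU.

```cpp
#include <cassert>
#include <cstddef>

// How many threads a launch would create, and how many pass the bounds check.
struct LaunchCount {
    std::size_t launched;
    std::size_t in_bounds;
};

// Hypothetical CPU emulation of the kernel's index computation and guard
// for a 2D grid of square blocks with `block` threads per side.
LaunchCount emulate_guard(unsigned int imgwidth, unsigned int imgheight,
                          unsigned int block)
{
    assert(block > 0);
    const unsigned int grid_x = (imgwidth + block - 1) / block;
    const unsigned int grid_y = (imgheight + block - 1) / block;
    LaunchCount c{0, 0};
    for (unsigned int by = 0; by < grid_y; ++by)
        for (unsigned int bx = 0; bx < grid_x; ++bx)
            for (unsigned int ty = 0; ty < block; ++ty)
                for (unsigned int tx = 0; tx < block; ++tx) {
                    // Same formulas as idx and idy in the kernel.
                    const unsigned int idx = bx * block + tx;
                    const unsigned int idy = by * block + ty;
                    ++c.launched;
                    if (idx < imgwidth && idy < imgheight)
                        ++c.in_bounds;
                }
    return c;
}
```

For a 100 x 50 image with 32 x 32 blocks, the grid is 4 x 2 blocks, so 8192 threads are launched but only 5000 of them correspond to a pixel. Without the check, the remaining threads would read and write outside the image buffers.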
One tricky point about the kernel function is that, for cases that cannot be evenly divided by the thread block