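Writing the code
First, copy over the grayscale-conversion program from last time; it is used to generate the grayscale image.
Four implementations were written in total: cpu, gpu_wrong_naive, gpu_naive, and gpu_usesharemem.
CPU version
The CPU version is very simple: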
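// Counts how many pixels fall into each gray level.
// hist must have 256 bins and be zeroed by the caller beforehand.
void getGrayHistincpu(unsigned char * const grayData,
                      unsigned int * const hist,
                      uint imgheight,
                      uint imgwidth)
{
    for (uint i = 0; i < imgheight; i++)
    {
        for (uint j = 0; j < imgwidth; j++)
        {
            // row-major pixel index: i * imgwidth + j
            hist[grayData[i * imgwidth + j]]++;
        }
    }
}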
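As a quick check of the CPU routine, the following is a minimal host-side sketch, not the author's test harness. It repeats getGrayHistincpu with uint spelled out as unsigned int so that the block stands alone; makeHistogram and binsSumToPixelCount are hypothetical helper names introduced only for this illustration. It assumes a 256-bin histogram that starts at zero.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Copy of the CPU routine from the text, with uint spelled out as unsigned int.
void getGrayHistincpu(unsigned char * const grayData,
                      unsigned int * const hist,
                      unsigned int imgheight,
                      unsigned int imgwidth)
{
    for (unsigned int i = 0; i < imgheight; i++)
        for (unsigned int j = 0; j < imgwidth; j++)
            hist[grayData[i * imgwidth + j]]++;
}

// Hypothetical helper: build a zero-initialised 256-bin histogram for an image.
std::vector<unsigned int> makeHistogram(std::vector<unsigned char> &gray,
                                        unsigned int height,
                                        unsigned int width)
{
    // The buffer must hold exactly height * width pixels.
    assert(gray.size() == static_cast<std::size_t>(height) * width);
    std::vector<unsigned int> hist(256, 0u);  // bins must start at zero
    getGrayHistincpu(gray.data(), hist.data(), height, width);
    return hist;
}

// Hypothetical sanity check: the bins should sum to the number of pixels.
bool binsSumToPixelCount(const std::vector<unsigned int> &hist,
                         unsigned int height,
                         unsigned int width)
{
    unsigned long long total = std::accumulate(hist.begin(), hist.end(), 0ull);
    return total == static_cast<unsigned long long>(height) * width;
}
```

The sum-of-bins check is cheap and is also a convenient way to compare a GPU result against the CPU one: a histogram whose bins do not add up to imgheight * imgwidth has lost or duplicated counts.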
GPU version 1: a direct port (gpu_wrong_naive)
__global__ void getGrayHistincuda_wrong_naive(unsigned char * const grayData,
                                              unsigned int * const hist,
                                              uint imgheight,
                                              uint imgwidth)  // threads conflict; the result changes from run to run
{
    // one thread per pixel: (idx, idy) is the pixel's column and row
    const unsigned int idx = blockDim.x * blockIdx.x + threadIdx.x;
    const unsigned int idy = blockDim.y * blockIdx.y + threadIdx.y;
    if(idx < imgwidth && idy < imgheight)
    {
        const unsigned long pid = imgwidth * idy + idx;
        const unsigned char value = grayData[pid];
        hist[value]++;  // non-atomic read-modify-write on global memory
    }
}
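This code is wrong. Many threads update the same location in global memory at the same time, and hist[value]++ is not atomic (it is a read, an add, and a write), so increments are lost and the counts come out incorrect.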
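To make the lost update concrete, here is a small host-side sketch that replays one possible interleaving by hand. It is a deterministic simulation in plain C++, not real GPU code and not a measurement; SimThread, interleavedIncrement, and serializedIncrement are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Hypothetical model of one thread executing hist[value]++ in three steps.
struct SimThread {
    unsigned int reg = 0;                                // thread-private register
    void load(const unsigned int &bin) { reg = bin; }    // 1. read the bin
    void add() { reg += 1; }                             // 2. increment privately
    void store(unsigned int &bin) const { bin = reg; }   // 3. write it back
};

// Two threads hit the same bin and their steps interleave.
unsigned int interleavedIncrement()
{
    unsigned int bin = 0;
    SimThread a, b;
    a.load(bin);
    b.load(bin);   // b reads the stale 0 before a has stored
    a.add();
    b.add();
    a.store(bin);  // bin becomes 1
    b.store(bin);  // bin is written as 1 again: a's increment is overwritten
    return bin;
}

// The same two increments with no overlap give the expected count.
unsigned int serializedIncrement()
{
    unsigned int bin = 0;
    SimThread a, b;
    a.load(bin); a.add(); a.store(bin);
    b.load(bin); b.add(); b.store(bin);
    return bin;
}
```

In this model the interleaved case ends with a count of 1 although two pixels were processed, while the serialized case ends with 2. On a real GPU the interleaving depends on scheduling, which is why the wrong kernel produces different numbers on different runs.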
GPU version 2: atomic operations (gpu_naive)
__global__ void getGrayHistincuda_naive(unsigned char * const grayData,
                                        unsigned int * const hist,
                                        uint imgheight,
                                        uint imgwidth)  // atomic operations keep the counts correct
{
    const unsigned int idx = blockDim.x * blockIdx.x + threadIdx.x;
    const unsigned int idy = blockDim.y * blockIdx.y + threadIdx.y;
    if(idx < imgwidth && idy < imgheight)
    {
        const unsigned long pid = imgwidth * idy + idx;
        const unsigned char value = grayData[pid];
        atomicAdd(&(hist[value]), 1);