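// Tree-style (pairwise) sum over shared[]; assumes THREAD_NUM is a power of two.
int offset = 1;  // distance between the two elements added in this round
int mask = 1;    // always 2 * offset - 1: selects tids that are multiples of 2 * offset
while (offset < THREAD_NUM)
{
    // Only threads whose low bits under mask are all zero take part in this round.
    if ((tid & mask) == 0)
    {
        shared[tid] += shared[tid + offset];
    }
    offset += offset;  // 1, 2, 4, 8, ...
    mask += offset;    // 1, 3, 7, 15, ...
    __syncthreads();   // finish every addition of this round before the next one starts
}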
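The loop above is only a fragment: it relies on `shared`, `tid`, and `THREAD_NUM` being defined elsewhere. The following is a minimal sketch of how it might sit inside a complete kernel. The kernel name `treeSum`, the host helper `treeSumHost`, the block size of 256, and the `int` element type are all assumptions made for illustration and do not come from the original post.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define THREAD_NUM 256  // assumed block size; must be a power of two

// Hypothetical kernel: each block sums THREAD_NUM consecutive ints from `in`
// and writes one partial sum to out[blockIdx.x].
__global__ void treeSum(const int *in, int *out)
{
    __shared__ int shared[THREAD_NUM];
    const int tid = threadIdx.x;

    shared[tid] = in[blockIdx.x * THREAD_NUM + tid];
    __syncthreads();  // every load must be visible before the first addition

    int offset = 1;
    int mask = 1;
    while (offset < THREAD_NUM)
    {
        if ((tid & mask) == 0)
        {
            shared[tid] += shared[tid + offset];
        }
        offset += offset;
        mask += offset;
        __syncthreads();
    }

    if (tid == 0)
    {
        out[blockIdx.x] = shared[0];  // the total ends up in element 0
    }
}

// Hypothetical host helper: sums exactly THREAD_NUM values with one block.
// Error checking of the CUDA calls is omitted to keep the sketch short.
int treeSumHost(const std::vector<int> &data)
{
    assert(data.size() == THREAD_NUM);
    int *dIn = nullptr;
    int *dOut = nullptr;
    int result = 0;

    cudaMalloc(&dIn, THREAD_NUM * sizeof(int));
    cudaMalloc(&dOut, sizeof(int));
    cudaMemcpy(dIn, data.data(), THREAD_NUM * sizeof(int), cudaMemcpyHostToDevice);

    treeSum<<<1, THREAD_NUM>>>(dIn, dOut);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&result, dOut, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

The `__syncthreads()` before the loop matters as much as the one inside it: without it, a thread could add a neighbour's element before that neighbour has finished loading it. With a power-of-two block size, the index `tid + offset` always stays inside `shared`, which is why the sketch fixes `THREAD_NUM` at 256.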
CUDA Programming: Tree-Based Addition