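CUDA's software model is organized around three levels: the grid, the thread block (block), and the thread. The threads launched for a kernel are grouped into a grid, each grid contains a number of thread blocks, and each block contains a number of threads.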
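The three concepts are:
- thread: a CUDA parallel program (kernel) is executed by many threads.
- block: several threads are grouped into a block. Threads in the same block can synchronize with each other and communicate through shared memory.
- grid: multiple blocks together form a grid.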
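A grid can contain multiple blocks, and the blocks can be arranged in one, two, or three dimensions. Each block contains multiple threads, and those threads can likewise be arranged in one, two, or three dimensions.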
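To make the hierarchy concrete, here is a minimal host-side C++ sketch that counts how many threads a given layout produces. The `Dim3` struct is a hypothetical stand-in for CUDA's `dim3` (dimensions left unspecified default to 1), and `volume`, `threadsPerBlock`, and `totalThreads` are illustrative helper names, not part of the CUDA API.

```cpp
#include <cstdint>

// Hypothetical stand-in for CUDA's dim3: unspecified dimensions default to 1.
struct Dim3 {
    unsigned x, y, z;
    constexpr Dim3(unsigned x_ = 1, unsigned y_ = 1, unsigned z_ = 1)
        : x(x_), y(y_), z(z_) {}
};

// Number of elements in a 1D, 2D, or 3D layout.
constexpr std::uint64_t volume(Dim3 d) {
    return static_cast<std::uint64_t>(d.x) * d.y * d.z;
}

// Threads in one block: the product of the block dimensions.
constexpr std::uint64_t threadsPerBlock(Dim3 block) {
    return volume(block);
}

// Threads in the whole grid: number of blocks times threads per block.
constexpr std::uint64_t totalThreads(Dim3 grid, Dim3 block) {
    return volume(grid) * volume(block);
}
```

For example, a 4 x 2 grid of 16 x 16 blocks has 8 blocks of 256 threads each, so `totalThreads(Dim3(4, 2), Dim3(16, 16))` evaluates to 2048.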
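- Every CUDA thread has an index, threadIdx, but that index is unique only within its own block. A globally unique thread ID is obtained by combining threadIdx with blockIdx, blockDim, and gridDim; the formula depends on how the grid and the block are laid out, as in the following cases.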
- 1D grid, 1D block:
int threadId = blockIdx.x * blockDim.x + threadIdx.x;
- 1D grid, 2D block:
int threadId = blockIdx.x * blockDim.x * blockDim.y + threadIdx.y * blockDim.x + threadIdx.x;
- 1D grid, 3D block:
int threadId = blockIdx.x * blockDim.x * blockDim.y * blockDim.z + threadIdx.z * blockDim.y * blockDim.x + threadIdx.y * blockDim.x + threadIdx.x;
- 2D grid, 1D block:
int blockId = blockIdx.y * gridDim.x + blockIdx.x;
int threadId = blockId * blockDim.x + threadIdx.x;
- 2D grid, 2D block:
int blockId = blockIdx.x + blockIdx.y * gridDim.x;
int threadId = blockId * (blockDim.x * blockDim.y) + (threadIdx.y * blockDim.x) + threadIdx.x;
- 2D grid, 3D block:
int blockId = blockIdx.x + blockIdx.y * gridDim.x;
int threadId = blockId * (blockDim.x * blockDim.y * blockDim.z) + (threadIdx.z * (blockDim.x * blockDim.y)) + (threadIdx.y * blockDim.x) + threadIdx.x;
- 3D grid, 1D block:
int blockId = blockIdx.x + blockIdx.y * gridDim.x + gridDim.x * gridDim.y * blockIdx.z;
int threadId = blockId * blockDim.x + threadIdx.x;
- 3D grid, 2D block:
int blockId = blockIdx.x + blockIdx.y * gridDim.x + gridDim.x * gridDim.y * blockIdx.z;
int threadId = blockId * (blockDim.x * blockDim.y) + (threadIdx.y * blockDim.x) + threadIdx.x;
- 3D grid, 3D block:
int blockId = blockIdx.x + blockIdx.y * gridDim.x + gridDim.x * gridDim.y * blockIdx.z;
int threadId = blockId * (blockDim.x * blockDim.y * blockDim.z) + (threadIdx.z * (blockDim.x * blockDim.y)) + (threadIdx.y * blockDim.x) + threadIdx.x;
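The nine cases above are all special cases of the 3D grid, 3D block formula: an unused dimension has size 1 and index 0, so its terms vanish. The host-side C++ sketch below checks this without a GPU. `Dim3` and `Idx3` are hypothetical stand-ins for the types of CUDA's built-in `gridDim`/`blockDim` and `blockIdx`/`threadIdx`, and the function names are illustrative. The sketch returns a 64-bit value rather than `int`, on the assumption that a very large launch could exceed the range of `int`.

```cpp
#include <cstdint>

// Hypothetical stand-in for the type of gridDim / blockDim: sizes default to 1.
struct Dim3 {
    unsigned x, y, z;
    constexpr Dim3(unsigned x_ = 1, unsigned y_ = 1, unsigned z_ = 1)
        : x(x_), y(y_), z(z_) {}
};

// Hypothetical stand-in for the type of blockIdx / threadIdx: indices default to 0.
struct Idx3 {
    unsigned x, y, z;
    constexpr Idx3(unsigned x_ = 0, unsigned y_ = 0, unsigned z_ = 0)
        : x(x_), y(y_), z(z_) {}
};

// Row-major position of index i inside a layout of size d (x varies fastest).
constexpr std::uint64_t flatten(Dim3 d, Idx3 i) {
    return i.x
         + static_cast<std::uint64_t>(i.y) * d.x
         + static_cast<std::uint64_t>(i.z) * d.x * d.y;
}

// Global thread ID: flat block ID times threads per block,
// plus the flat thread index inside the block.
constexpr std::uint64_t globalThreadId(Dim3 grid, Dim3 block,
                                       Idx3 blockIdx, Idx3 threadIdx) {
    return flatten(grid, blockIdx)
               * (static_cast<std::uint64_t>(block.x) * block.y * block.z)
         + flatten(block, threadIdx);
}
```

With a 1D grid and 1D block, `globalThreadId(Dim3(3), Dim3(8), Idx3(2), Idx3(5))` reduces to `2 * 8 + 5`, matching the first case in the list. With a 4 x 3 grid of 8 x 4 blocks, block (1, 2) and thread (3, 2) give `blockId = 9` and a global ID of `9 * 32 + 2 * 8 + 3`.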