GPU (Part 5): The CUDA Thread Model

In the previous article we wrote this simple CUDA hello world program:

#include<stdio.h>

__global__ void hello_world()
{
        printf("hello world\n");
}

int main()
{
        hello_world<<<2, 2>>>();
        cudaDeviceSynchronize();
        return 0;
}

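That article also introduced an important CUDA concept, the kernel. A kernel is a function that is launched from host (CPU) code and executed by threads on the device (GPU). The number of GPU threads that execute the kernel is specified at the call site with the <<<2, 2>>> syntax. Learning how to organize GPU threads is a central part of learning CUDA programming, and the foundation for that is the CUDA thread model.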

1. The CUDA Thread Model

The CUDA thread model has three levels: the grid, the thread block, and the thread.

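  • Thread: the basic unit of execution, the entity that actually carries out the computation. Each thread has an index (threadIdx) that is unique within its block, and the thread uses it to find its position in the block. Blocks and grids can each be organized in one, two, or three dimensions, and this two-level organization is what makes massively parallel computation possible.
  • Thread block (Block): a group of threads that execute in parallel and share some resources, such as shared memory. The block size is given at launch time and can be either a compile-time constant or a value computed at run time. Each block has an index (blockIdx) that is unique within the grid, and a thread uses it to find the position of its block in the grid.
  • Grid: the set of all thread blocks created by one kernel launch. All blocks in a grid can access the same global memory. Like the block size, the grid size can be a compile-time constant or a run-time value. There is no built-in grid index. Instead, a thread finds its position in the grid by combining blockIdx and threadIdx, as the following sections show.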

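Note: a thread block is a logical grouping in the programming model, not a separate piece of hardware. At run time, each block is scheduled onto a single streaming multiprocessor (SM).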


1.1 One-Dimensional Thread Model

1.1.1 Definition

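A one-dimensional thread model is defined as <<<grid_size, block_size>>>. Here grid_size is the number of thread blocks in the grid, and block_size is the number of threads in each block. The model contains grid_size*block_size threads in total.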
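In practice, grid_size is usually derived from the amount of data. With one thread per element, the grid needs enough blocks to cover all n elements. The sketch below is plain host-side C++ that compiles with g++ or nvcc. The function names are hypothetical and chosen for illustration. It rounds up with integer ceiling division, so the last block may contain a few threads past n. A kernel would normally skip those threads with a check such as `if (idx < n)`.

```cpp
#include <cassert>

// Hypothetical helper: the number of blocks needed so that
// grid_size * block_size >= n (integer ceiling division).
unsigned int blocks_for(unsigned int n, unsigned int block_size)
{
    assert(block_size > 0);
    return (n + block_size - 1) / block_size;
}

// Total number of threads launched by <<<grid_size, block_size>>>.
unsigned int total_threads_1d(unsigned int grid_size, unsigned int block_size)
{
    return grid_size * block_size;
}
```

For example, blocks_for(1000, 256) is 4, which launches 1024 threads for 1000 elements. The 24 surplus threads are the ones the bounds check filters out.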

1.1.2 Example

Let's define a thread model with 2 blocks of 3 threads each, <<<2, 3>>>, for a total of 2*3=6 threads:

// dim1.cu
#include<stdio.h>

__global__ void hello_world()
{
        printf("hello world\n");
}

int main()
{

        hello_world<<<2, 3>>>();
        cudaDeviceSynchronize();
        return 0;
}

Compile and run:

$ nvcc dim1.cu -o dim1
$ ./dim1
hello world
hello world
hello world
hello world
hello world
hello world

1.1.3 Thread Indexing

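CUDA provides a number of built-in variables. Two things to know about them:

  1. Built-in variables can be used directly in a CUDA program without being declared.
  2. Built-in variables are only valid in device code, that is, in kernels and in the __device__ functions they call.

This section uses the following built-in variables:

  • gridDim.x: corresponds to grid_size, the size of the grid (number of blocks)
  • blockDim.x: corresponds to block_size, the number of threads in a block
  • blockIdx.x: the block index, ranging from 0 to gridDim.x-1
  • threadIdx.x: the thread index within its block, ranging from 0 to blockDim.x-1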

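Combining these variables with the one-dimensional thread model shown in the figure below gives each thread's global index: idx = threadIdx.x + blockIdx.x * blockDim.x. For example, in a <<<2, 3>>> launch, thread 1 of block 1 has idx = 1 + 1 * 3 = 4.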

[Figure: one-dimensional thread model]
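The sketch below replays the launch on the host to check that the formula gives every thread a distinct index. Two nested loops stand in for blockIdx.x and threadIdx.x. This is only an emulation: on the GPU the threads run in parallel, not in this loop order. The function names global_index_1d and indices_are_unique_1d are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Host-side evaluation of the kernel's index formula:
// idx = threadIdx.x + blockIdx.x * blockDim.x
unsigned int global_index_1d(unsigned int thread_idx_x,
                             unsigned int block_idx_x,
                             unsigned int block_dim_x)
{
    assert(thread_idx_x < block_dim_x);  // threadIdx.x is in [0, blockDim.x)
    return thread_idx_x + block_idx_x * block_dim_x;
}

// Returns true if every thread of <<<grid_dim_x, block_dim_x>>> gets a
// distinct idx in [0, grid_dim_x * block_dim_x).
bool indices_are_unique_1d(unsigned int grid_dim_x, unsigned int block_dim_x)
{
    std::vector<bool> seen(grid_dim_x * block_dim_x, false);
    for (unsigned int b = 0; b < grid_dim_x; ++b) {       // plays blockIdx.x
        for (unsigned int t = 0; t < block_dim_x; ++t) {  // plays threadIdx.x
            unsigned int idx = global_index_1d(t, b, block_dim_x);
            if (idx >= seen.size() || seen[idx]) return false;
            seen[idx] = true;
        }
    }
    return true;
}
```

For <<<2, 3>>>, global_index_1d(2, 1, 3) returns 5. That matches the largest idx printed by the modified program below.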

Now modify the dim1.cu program to print each thread's index:

// dim1.cu
#include<stdio.h>

__global__ void hello_world()
{
        const int grid_size = gridDim.x;
        const int block_size = blockDim.x;
        const int idx = threadIdx.x + blockIdx.x*blockDim.x;
        printf("hello world from idx: %d, grid_size: %d, block_size: %d\n", idx, grid_size, block_size);
}

int main()
{

        hello_world<<<2, 3>>>();
        cudaDeviceSynchronize();
        return 0;
}

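Compile and run. The order of the output lines may differ between runs, because CUDA does not guarantee the order in which threads and blocks execute: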

$ nvcc dim1.cu -o dim1
$ ./dim1
hello world from idx: 3, grid_size: 2, block_size: 3
hello world from idx: 4, grid_size: 2, block_size: 3
hello world from idx: 5, grid_size: 2, block_size: 3
hello world from idx: 0, grid_size: 2, block_size: 3
hello world from idx: 1, grid_size: 2, block_size: 3
hello world from idx: 2, grid_size: 2, block_size: 3

1.2 Built-in Types

The one-dimensional model above used blockIdx and threadIdx. Both are built-in variables of type uint3, which can be thought of as:

struct uint3 {
  unsigned int x;
  unsigned int y;
  unsigned int z;
};

// conceptual only: CUDA provides these variables itself
struct uint3 blockIdx;
struct uint3 threadIdx;

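gridDim and blockDim are variables of type dim3. Like blockIdx and threadIdx, a dim3 has three fields: x, y, and z. The real definition lives in CUDA's headers, but conceptually a dim3 looks like this: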

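struct dim3 {
  unsigned int x;
  unsigned int y;
  unsigned int z;
  // dimensions that are not specified default to 1
  dim3(unsigned int vx = 1, unsigned int vy = 1, unsigned int vz = 1)
    : x(vx), y(vy), z(vz) {}
};

// conceptual only: CUDA provides these variables itself
struct dim3 gridDim;
struct dim3 blockDim;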

Their value ranges follow directly:

  • blockIdx.x范围:[0, gridDim.x)
  • blockIdx.y范围:[0, gridDim.y)
  • blockIdx.z范围:[0, gridDim.z)
  • threadIdx.x范围:[0, blockDim.x)
  • threadIdx.y范围:[0, blockDim.y)
  • threadIdx.z范围:[0, blockDim.z)

Returning to the <<<grid_size, block_size>>> launch syntax, a multi-dimensional thread model can be written with C++ constructor syntax:

dim3 grid_size(x, y, z);
dim3 block_size(x, y, z);

kernel_fn<<<grid_size, block_size>>>();

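The <<<2, 3>>> launch of the one-dimensional model is therefore equivalent to the code below. An integer in the launch configuration is implicitly converted to a dim3 whose other two dimensions are 1.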

dim3 grid_size(2, 1, 1);
dim3 block_size(3, 1, 1);

kernel_fn<<<grid_size, block_size>>>();
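Because unspecified dimensions default to 1, the number of threads in any launch is simply the product of the six dimensions. The host-side sketch below mirrors this with a hypothetical Dim3 struct. It uses a different name so that it does not clash with CUDA's own dim3 when compiled with nvcc.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3: unspecified dimensions default to 1.
struct Dim3 {
    unsigned int x, y, z;
    Dim3(unsigned int vx = 1, unsigned int vy = 1, unsigned int vz = 1)
        : x(vx), y(vy), z(vz) {}
};

// Number of elements (blocks or threads) described by one Dim3.
unsigned long long dim_product(const Dim3& d)
{
    return 1ULL * d.x * d.y * d.z;
}

// Total number of threads launched by <<<grid, block>>>.
unsigned long long total_threads(const Dim3& grid, const Dim3& block)
{
    return dim_product(grid) * dim_product(block);
}
```

total_threads(Dim3(2, 2), Dim3(3, 3)) gives the 36 threads of the two-dimensional example that follows. Dim3(2, 2, 2) with Dim3(3, 3, 3) gives the 216 threads of the three-dimensional one.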

1.3 Two-Dimensional Thread Model

1.3.1 Definition

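With this groundwork, defining a two-dimensional thread model is straightforward. For example, a model whose grid is 2 * 2 blocks and whose blocks are 3 * 3 threads can be written as: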

// dim2.cu
#include<stdio.h>

__global__ void hello_world()
{
        printf("hello world\n");
}

int main()
{
        dim3 grid_size(2, 2); // equivalent to dim3 grid_size(2, 2, 1)
        dim3 block_size(3, 3); // equivalent to dim3 block_size(3, 3, 1)

        hello_world<<<grid_size, block_size>>>();
        cudaDeviceSynchronize();
        return 0;
}

In theory, this two-dimensional model has (2 * 2) * (3 * 3) = 36 threads. Compile and verify:

$ nvcc dim2.cu -o dim2
$ ./dim2
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world
hello world

1.3.2 Thread Indexing

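In the one-dimensional model the global index is idx = threadIdx.x + blockIdx.x * blockDim.x. Extending this to two dimensions takes three steps: compute the block's linear index within the grid, then the thread's linear index within its block, then the global index idx: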

blockId = blockIdx.x + blockIdx.y*gridDim.x

threadId = threadIdx.y*blockDim.x + threadIdx.x

idx = blockId*(blockDim.x*blockDim.y) + threadId
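The host-side sketch below evaluates these three formulas for given block and thread coordinates, so they can be cross-checked against the program output later in this section. The function name global_index_2d is hypothetical.

```cpp
#include <cassert>

// Host-side evaluation of the 2D formulas:
//   blockId  = blockIdx.x + blockIdx.y * gridDim.x
//   threadId = threadIdx.y * blockDim.x + threadIdx.x
//   idx      = blockId * (blockDim.x * blockDim.y) + threadId
unsigned int global_index_2d(unsigned int block_x, unsigned int block_y,
                             unsigned int thread_x, unsigned int thread_y,
                             unsigned int grid_dim_x,
                             unsigned int block_dim_x, unsigned int block_dim_y)
{
    // block_y is not range-checked: gridDim.y is not needed by the formula
    assert(block_x < grid_dim_x && thread_x < block_dim_x && thread_y < block_dim_y);
    unsigned int block_id = block_x + block_y * grid_dim_x;
    unsigned int thread_id = thread_y * block_dim_x + thread_x;
    return block_id * (block_dim_x * block_dim_y) + thread_id;
}
```

gridDim.y does not appear in the formula. As with a row-major array, only the row length (gridDim.x) matters. For the 2 * 2 grid of 3 * 3 blocks, block (1, 0) with thread (0, 0) gives idx 9, which matches the "blockId: 1, threadId: 0" line in the output.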

Modify dim2.cu accordingly:

// dim2.cu
#include<stdio.h>

__global__ void hello_world()
{
        const int blockId = blockIdx.x + blockIdx.y*gridDim.x;
        const int threadId = threadIdx.y*blockDim.x + threadIdx.x;
        const int idx = blockId*(blockDim.x*blockDim.y) + threadId;
        printf("hello world from idx: %d, blockId: %d, threadId: %d\n", idx, blockId, threadId);
}

int main()
{
        dim3 grid_size(2, 2);
        dim3 block_size(3, 3);

        hello_world<<<grid_size, block_size>>>();
        cudaDeviceSynchronize();
        return 0;
}

Compile and run. Again, the blocks print in no particular order:

$ nvcc dim2.cu -o dim2
$ ./dim2
hello world from idx: 9, blockId: 1, threadId: 0
hello world from idx: 10, blockId: 1, threadId: 1
hello world from idx: 11, blockId: 1, threadId: 2
hello world from idx: 12, blockId: 1, threadId: 3
hello world from idx: 13, blockId: 1, threadId: 4
hello world from idx: 14, blockId: 1, threadId: 5
hello world from idx: 15, blockId: 1, threadId: 6
hello world from idx: 16, blockId: 1, threadId: 7
hello world from idx: 17, blockId: 1, threadId: 8
hello world from idx: 27, blockId: 3, threadId: 0
hello world from idx: 28, blockId: 3, threadId: 1
hello world from idx: 29, blockId: 3, threadId: 2
hello world from idx: 30, blockId: 3, threadId: 3
hello world from idx: 31, blockId: 3, threadId: 4
hello world from idx: 32, blockId: 3, threadId: 5
hello world from idx: 33, blockId: 3, threadId: 6
hello world from idx: 34, blockId: 3, threadId: 7
hello world from idx: 35, blockId: 3, threadId: 8
hello world from idx: 0, blockId: 0, threadId: 0
hello world from idx: 1, blockId: 0, threadId: 1
hello world from idx: 2, blockId: 0, threadId: 2
hello world from idx: 3, blockId: 0, threadId: 3
hello world from idx: 4, blockId: 0, threadId: 4
hello world from idx: 5, blockId: 0, threadId: 5
hello world from idx: 6, blockId: 0, threadId: 6
hello world from idx: 7, blockId: 0, threadId: 7
hello world from idx: 8, blockId: 0, threadId: 8
hello world from idx: 18, blockId: 2, threadId: 0
hello world from idx: 19, blockId: 2, threadId: 1
hello world from idx: 20, blockId: 2, threadId: 2
hello world from idx: 21, blockId: 2, threadId: 3
hello world from idx: 22, blockId: 2, threadId: 4
hello world from idx: 23, blockId: 2, threadId: 5
hello world from idx: 24, blockId: 2, threadId: 6
hello world from idx: 25, blockId: 2, threadId: 7
hello world from idx: 26, blockId: 2, threadId: 8

1.4 Three-Dimensional Thread Model

1.4.1 Definition

Following the same pattern, a three-dimensional model with a 2 * 2 * 2 grid and 3 * 3 * 3 blocks can be defined as:

dim3 grid_size(2, 2, 2);
dim3 block_size(3, 3, 3);

kernel_fn<<<grid_size, block_size>>>();

1.4.2 Thread Indexing

Computing idx in the three-dimensional model is a bit more involved than in the one- and two-dimensional models:

blockId = blockIdx.x + blockIdx.y*gridDim.x + gridDim.x*gridDim.y*blockIdx.z

threadId = threadIdx.z*blockDim.x*blockDim.y + threadIdx.y*blockDim.x + threadIdx.x

idx = blockId*blockDim.x*blockDim.y*blockDim.z + threadId
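These formulas are easy to get wrong, so it helps to check that they assign a distinct idx to every thread. The sketch below does this on the host by enumerating all (blockIdx, threadIdx) pairs in nested loops, which is an emulation and not how the GPU runs the threads. Coord3, global_index_3d, and indices_are_unique_3d are hypothetical names.

```cpp
#include <cassert>
#include <vector>

struct Coord3 { unsigned int x, y, z; };  // hypothetical helper type

// Host-side evaluation of the 3D formulas:
//   blockId  = blockIdx.x + blockIdx.y*gridDim.x + gridDim.x*gridDim.y*blockIdx.z
//   threadId = threadIdx.z*blockDim.x*blockDim.y + threadIdx.y*blockDim.x + threadIdx.x
//   idx      = blockId*blockDim.x*blockDim.y*blockDim.z + threadId
unsigned int global_index_3d(Coord3 block_idx, Coord3 thread_idx,
                             Coord3 grid_dim, Coord3 block_dim)
{
    assert(thread_idx.x < block_dim.x && thread_idx.y < block_dim.y &&
           thread_idx.z < block_dim.z);
    unsigned int block_id = block_idx.x + block_idx.y * grid_dim.x
                          + grid_dim.x * grid_dim.y * block_idx.z;
    unsigned int thread_id = thread_idx.z * block_dim.x * block_dim.y
                           + thread_idx.y * block_dim.x + thread_idx.x;
    return block_id * block_dim.x * block_dim.y * block_dim.z + thread_id;
}

// Enumerates every (blockIdx, threadIdx) pair and checks that the indices
// cover [0, total) exactly once.
bool indices_are_unique_3d(Coord3 grid_dim, Coord3 block_dim)
{
    unsigned int total = grid_dim.x * grid_dim.y * grid_dim.z
                       * block_dim.x * block_dim.y * block_dim.z;
    std::vector<bool> seen(total, false);
    for (unsigned int bz = 0; bz < grid_dim.z; ++bz)
        for (unsigned int by = 0; by < grid_dim.y; ++by)
            for (unsigned int bx = 0; bx < grid_dim.x; ++bx)
                for (unsigned int tz = 0; tz < block_dim.z; ++tz)
                    for (unsigned int ty = 0; ty < block_dim.y; ++ty)
                        for (unsigned int tx = 0; tx < block_dim.x; ++tx) {
                            unsigned int idx = global_index_3d(Coord3{bx, by, bz},
                                Coord3{tx, ty, tz}, grid_dim, block_dim);
                            if (idx >= total || seen[idx]) return false;
                            seen[idx] = true;
                        }
    return true;
}
```

For the 2 * 2 * 2 grid of 3 * 3 * 3 blocks, block (1, 1, 1) with thread (2, 2, 2) gives blockId 7, threadId 26, and idx 215, the largest index in the output.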

The corresponding program (dim3.cu):

// dim3.cu
#include<stdio.h>

__global__ void hello_world()
{
        const int blockId = blockIdx.x + blockIdx.y*gridDim.x + gridDim.x*gridDim.y*blockIdx.z;
        const int threadId = threadIdx.z*blockDim.x*blockDim.y + threadIdx.y*blockDim.x + threadIdx.x;
        const int idx = blockId*blockDim.x*blockDim.y*blockDim.z + threadId;
        printf("hello world from idx: %d, blockId: %d, threadId: %d\n", idx, blockId, threadId);
}

int main()
{
        dim3 grid_size(2, 2, 2);
        dim3 block_size(3, 3, 3);

        hello_world<<<grid_size, block_size>>>();
        cudaDeviceSynchronize();
        return 0;
}

Compile and run. There are (2 * 2 * 2) * (3 * 3 * 3) = 216 threads in total:

$ nvcc dim3.cu -o dim3
$ ./dim3
hello world from idx: 189, blockId: 7, threadId: 0
hello world from idx: 190, blockId: 7, threadId: 1
hello world from idx: 191, blockId: 7, threadId: 2
hello world from idx: 192, blockId: 7, threadId: 3
hello world from idx: 193, blockId: 7, threadId: 4
hello world from idx: 194, blockId: 7, threadId: 5
hello world from idx: 195, blockId: 7, threadId: 6
hello world from idx: 196, blockId: 7, threadId: 7
hello world from idx: 197, blockId: 7, threadId: 8
hello world from idx: 198, blockId: 7, threadId: 9
hello world from idx: 199, blockId: 7, threadId: 10
hello world from idx: 200, blockId: 7, threadId: 11
hello world from idx: 201, blockId: 7, threadId: 12
hello world from idx: 202, blockId: 7, threadId: 13
hello world from idx: 203, blockId: 7, threadId: 14
hello world from idx: 204, blockId: 7, threadId: 15
hello world from idx: 205, blockId: 7, threadId: 16
hello world from idx: 206, blockId: 7, threadId: 17
hello world from idx: 207, blockId: 7, threadId: 18
hello world from idx: 208, blockId: 7, threadId: 19
hello world from idx: 209, blockId: 7, threadId: 20
hello world from idx: 210, blockId: 7, threadId: 21
hello world from idx: 211, blockId: 7, threadId: 22
hello world from idx: 212, blockId: 7, threadId: 23
hello world from idx: 213, blockId: 7, threadId: 24
hello world from idx: 214, blockId: 7, threadId: 25
hello world from idx: 215, blockId: 7, threadId: 26
hello world from idx: 108, blockId: 4, threadId: 0
hello world from idx: 109, blockId: 4, threadId: 1
hello world from idx: 110, blockId: 4, threadId: 2
hello world from idx: 111, blockId: 4, threadId: 3
hello world from idx: 112, blockId: 4, threadId: 4
hello world from idx: 113, blockId: 4, threadId: 5
hello world from idx: 114, blockId: 4, threadId: 6
hello world from idx: 115, blockId: 4, threadId: 7
hello world from idx: 116, blockId: 4, threadId: 8
hello world from idx: 117, blockId: 4, threadId: 9
hello world from idx: 118, blockId: 4, threadId: 10
hello world from idx: 119, blockId: 4, threadId: 11
hello world from idx: 120, blockId: 4, threadId: 12
hello world from idx: 121, blockId: 4, threadId: 13
hello world from idx: 122, blockId: 4, threadId: 14
hello world from idx: 123, blockId: 4, threadId: 15
hello world from idx: 124, blockId: 4, threadId: 16
hello world from idx: 125, blockId: 4, threadId: 17
hello world from idx: 126, blockId: 4, threadId: 18
hello world from idx: 127, blockId: 4, threadId: 19
hello world from idx: 128, blockId: 4, threadId: 20
hello world from idx: 129, blockId: 4, threadId: 21
hello world from idx: 130, blockId: 4, threadId: 22
hello world from idx: 131, blockId: 4, threadId: 23
hello world from idx: 132, blockId: 4, threadId: 24
hello world from idx: 133, blockId: 4, threadId: 25
hello world from idx: 134, blockId: 4, threadId: 26
hello world from idx: 162, blockId: 6, threadId: 0
hello world from idx: 163, blockId: 6, threadId: 1
hello world from idx: 164, blockId: 6, threadId: 2
hello world from idx: 165, blockId: 6, threadId: 3
hello world from idx: 166, blockId: 6, threadId: 4
hello world from idx: 167, blockId: 6, threadId: 5
hello world from idx: 168, blockId: 6, threadId: 6
hello world from idx: 169, blockId: 6, threadId: 7
hello world from idx: 170, blockId: 6, threadId: 8
hello world from idx: 171, blockId: 6, threadId: 9
hello world from idx: 172, blockId: 6, threadId: 10
hello world from idx: 173, blockId: 6, threadId: 11
hello world from idx: 174, blockId: 6, threadId: 12
hello world from idx: 175, blockId: 6, threadId: 13
hello world from idx: 176, blockId: 6, threadId: 14
hello world from idx: 177, blockId: 6, threadId: 15
hello world from idx: 178, blockId: 6, threadId: 16
hello world from idx: 179, blockId: 6, threadId: 17
hello world from idx: 180, blockId: 6, threadId: 18
hello world from idx: 181, blockId: 6, threadId: 19
hello world from idx: 182, blockId: 6, threadId: 20
hello world from idx: 183, blockId: 6, threadId: 21
hello world from idx: 184, blockId: 6, threadId: 22
hello world from idx: 185, blockId: 6, threadId: 23
hello world from idx: 186, blockId: 6, threadId: 24
hello world from idx: 187, blockId: 6, threadId: 25
hello world from idx: 188, blockId: 6, threadId: 26
hello world from idx: 27, blockId: 1, threadId: 0
hello world from idx: 28, blockId: 1, threadId: 1
hello world from idx: 29, blockId: 1, threadId: 2
hello world from idx: 30, blockId: 1, threadId: 3
hello world from idx: 31, blockId: 1, threadId: 4
hello world from idx: 32, blockId: 1, threadId: 5
hello world from idx: 33, blockId: 1, threadId: 6
hello world from idx: 34, blockId: 1, threadId: 7
hello world from idx: 35, blockId: 1, threadId: 8
hello world from idx: 36, blockId: 1, threadId: 9
hello world from idx: 37, blockId: 1, threadId: 10
hello world from idx: 38, blockId: 1, threadId: 11
hello world from idx: 39, blockId: 1, threadId: 12
hello world from idx: 40, blockId: 1, threadId: 13
hello world from idx: 41, blockId: 1, threadId: 14
hello world from idx: 42, blockId: 1, threadId: 15
hello world from idx: 43, blockId: 1, threadId: 16
hello world from idx: 44, blockId: 1, threadId: 17
hello world from idx: 45, blockId: 1, threadId: 18
hello world from idx: 46, blockId: 1, threadId: 19
hello world from idx: 47, blockId: 1, threadId: 20
hello world from idx: 48, blockId: 1, threadId: 21
hello world from idx: 49, blockId: 1, threadId: 22
hello world from idx: 50, blockId: 1, threadId: 23
hello world from idx: 51, blockId: 1, threadId: 24
hello world from idx: 52, blockId: 1, threadId: 25
hello world from idx: 53, blockId: 1, threadId: 26
hello world from idx: 81, blockId: 3, threadId: 0
hello world from idx: 82, blockId: 3, threadId: 1
hello world from idx: 83, blockId: 3, threadId: 2
hello world from idx: 84, blockId: 3, threadId: 3
hello world from idx: 85, blockId: 3, threadId: 4
hello world from idx: 86, blockId: 3, threadId: 5
hello world from idx: 87, blockId: 3, threadId: 6
hello world from idx: 88, blockId: 3, threadId: 7
hello world from idx: 89, blockId: 3, threadId: 8
hello world from idx: 90, blockId: 3, threadId: 9
hello world from idx: 91, blockId: 3, threadId: 10
hello world from idx: 92, blockId: 3, threadId: 11
hello world from idx: 93, blockId: 3, threadId: 12
hello world from idx: 94, blockId: 3, threadId: 13
hello world from idx: 95, blockId: 3, threadId: 14
hello world from idx: 96, blockId: 3, threadId: 15
hello world from idx: 97, blockId: 3, threadId: 16
hello world from idx: 98, blockId: 3, threadId: 17
hello world from idx: 99, blockId: 3, threadId: 18
hello world from idx: 100, blockId: 3, threadId: 19
hello world from idx: 101, blockId: 3, threadId: 20
hello world from idx: 102, blockId: 3, threadId: 21
hello world from idx: 103, blockId: 3, threadId: 22
hello world from idx: 104, blockId: 3, threadId: 23
hello world from idx: 105, blockId: 3, threadId: 24
hello world from idx: 106, blockId: 3, threadId: 25
hello world from idx: 107, blockId: 3, threadId: 26
hello world from idx: 135, blockId: 5, threadId: 0
hello world from idx: 136, blockId: 5, threadId: 1
hello world from idx: 137, blockId: 5, threadId: 2
hello world from idx: 138, blockId: 5, threadId: 3
hello world from idx: 139, blockId: 5, threadId: 4
hello world from idx: 140, blockId: 5, threadId: 5
hello world from idx: 141, blockId: 5, threadId: 6
hello world from idx: 142, blockId: 5, threadId: 7
hello world from idx: 143, blockId: 5, threadId: 8
hello world from idx: 144, blockId: 5, threadId: 9
hello world from idx: 145, blockId: 5, threadId: 10
hello world from idx: 146, blockId: 5, threadId: 11
hello world from idx: 147, blockId: 5, threadId: 12
hello world from idx: 148, blockId: 5, threadId: 13
hello world from idx: 149, blockId: 5, threadId: 14
hello world from idx: 150, blockId: 5, threadId: 15
hello world from idx: 151, blockId: 5, threadId: 16
hello world from idx: 152, blockId: 5, threadId: 17
hello world from idx: 153, blockId: 5, threadId: 18
hello world from idx: 154, blockId: 5, threadId: 19
hello world from idx: 155, blockId: 5, threadId: 20
hello world from idx: 156, blockId: 5, threadId: 21
hello world from idx: 157, blockId: 5, threadId: 22
hello world from idx: 158, blockId: 5, threadId: 23
hello world from idx: 159, blockId: 5, threadId: 24
hello world from idx: 160, blockId: 5, threadId: 25
hello world from idx: 161, blockId: 5, threadId: 26
hello world from idx: 0, blockId: 0, threadId: 0
hello world from idx: 1, blockId: 0, threadId: 1
hello world from idx: 2, blockId: 0, threadId: 2
hello world from idx: 3, blockId: 0, threadId: 3
hello world from idx: 4, blockId: 0, threadId: 4
hello world from idx: 5, blockId: 0, threadId: 5
hello world from idx: 6, blockId: 0, threadId: 6
hello world from idx: 7, blockId: 0, threadId: 7
hello world from idx: 8, blockId: 0, threadId: 8
hello world from idx: 9, blockId: 0, threadId: 9
hello world from idx: 10, blockId: 0, threadId: 10
hello world from idx: 11, blockId: 0, threadId: 11
hello world from idx: 12, blockId: 0, threadId: 12
hello world from idx: 13, blockId: 0, threadId: 13
hello world from idx: 14, blockId: 0, threadId: 14
hello world from idx: 15, blockId: 0, threadId: 15
hello world from idx: 16, blockId: 0, threadId: 16
hello world from idx: 17, blockId: 0, threadId: 17
hello world from idx: 18, blockId: 0, threadId: 18
hello world from idx: 19, blockId: 0, threadId: 19
hello world from idx: 20, blockId: 0, threadId: 20
hello world from idx: 21, blockId: 0, threadId: 21
hello world from idx: 22, blockId: 0, threadId: 22
hello world from idx: 23, blockId: 0, threadId: 23
hello world from idx: 24, blockId: 0, threadId: 24
hello world from idx: 25, blockId: 0, threadId: 25
hello world from idx: 26, blockId: 0, threadId: 26
hello world from idx: 54, blockId: 2, threadId: 0
hello world from idx: 55, blockId: 2, threadId: 1
hello world from idx: 56, blockId: 2, threadId: 2
hello world from idx: 57, blockId: 2, threadId: 3
hello world from idx: 58, blockId: 2, threadId: 4
hello world from idx: 59, blockId: 2, threadId: 5
hello world from idx: 60, blockId: 2, threadId: 6
hello world from idx: 61, blockId: 2, threadId: 7
hello world from idx: 62, blockId: 2, threadId: 8
hello world from idx: 63, blockId: 2, threadId: 9
hello world from idx: 64, blockId: 2, threadId: 10
hello world from idx: 65, blockId: 2, threadId: 11
hello world from idx: 66, blockId: 2, threadId: 12
hello world from idx: 67, blockId: 2, threadId: 13
hello world from idx: 68, blockId: 2, threadId: 14
hello world from idx: 69, blockId: 2, threadId: 15
hello world from idx: 70, blockId: 2, threadId: 16
hello world from idx: 71, blockId: 2, threadId: 17
hello world from idx: 72, blockId: 2, threadId: 18
hello world from idx: 73, blockId: 2, threadId: 19
hello world from idx: 74, blockId: 2, threadId: 20
hello world from idx: 75, blockId: 2, threadId: 21
hello world from idx: 76, blockId: 2, threadId: 22
hello world from idx: 77, blockId: 2, threadId: 23
hello world from idx: 78, blockId: 2, threadId: 24
hello world from idx: 79, blockId: 2, threadId: 25
hello world from idx: 80, blockId: 2, threadId: 26

2. Going Further with the Thread Model

2.1 Limits

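Although the thread block is a logical concept, it is backed by hardware resources at run time, so its size is limited. Devices of compute capability 1.x allowed at most 512 threads per block. From compute capability 2.0 onward the limit is 1024 threads per block. Each block dimension is also limited separately: x and y can be at most 1024, and z at most 64. Because these limits depend on the hardware, the reliable approach is to query the device properties through CUDA: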

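// max_thread.cu
#include<stdio.h>

int main()
{
        cudaDeviceProp prop;
        // Query the properties of device 0 and stop if the call fails
        cudaError_t err = cudaGetDeviceProperties(&prop, 0);
        if (err != cudaSuccess) {
                printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
                return 1;
        }
        // maxThreadsPerBlock is the maximum number of threads in one block;
        // maxThreadsDim[0] would only be the maximum size of the x dimension
        int maxThreadsPerBlock = prop.maxThreadsPerBlock;
        printf("maxThreadsPerBlock: %d\n", maxThreadsPerBlock);

        return 0;
}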

Compile and run:

$ nvcc max_thread.cu -o cuda
$ ./cuda
maxThreadsPerBlock: 1024
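A launch that exceeds these limits fails: the kernel does not run, and cudaGetLastError() reports an error. It can therefore be worth checking a configuration before launching. The host-side sketch below takes the limits as plain values. Dims, LaunchLimits, assumed_limits, and launch_config_ok are hypothetical names. The numbers in assumed_limits are the values typical of compute capability 3.0 and later devices and are an assumption. A real program would copy them from the cudaDeviceProp fields maxThreadsPerBlock, maxThreadsDim, and maxGridSize.

```cpp
#include <cassert>

// Hypothetical helper type: the x, y, z sizes of a grid or a block.
struct Dims { unsigned int x, y, z; };

// Hypothetical container for device limits (in real code, copied from
// cudaDeviceProp: maxThreadsPerBlock, maxThreadsDim[3], maxGridSize[3]).
struct LaunchLimits {
    unsigned int max_threads_per_block;
    Dims max_block;
    Dims max_grid;
};

// Assumed limits for illustration only; query the actual device in real code.
LaunchLimits assumed_limits()
{
    return LaunchLimits{1024, Dims{1024, 1024, 64}, Dims{2147483647u, 65535, 65535}};
}

// Returns true if a launch <<<grid, block>>> respects the given limits.
bool launch_config_ok(Dims grid, Dims block, const LaunchLimits& lim)
{
    unsigned int g[3]  = {grid.x, grid.y, grid.z};
    unsigned int b[3]  = {block.x, block.y, block.z};
    unsigned int mg[3] = {lim.max_grid.x, lim.max_grid.y, lim.max_grid.z};
    unsigned int mb[3] = {lim.max_block.x, lim.max_block.y, lim.max_block.z};
    unsigned long long threads = 1;
    for (int i = 0; i < 3; ++i) {
        if (g[i] == 0 || g[i] > mg[i]) return false;  // each grid dimension
        if (b[i] == 0 || b[i] > mb[i]) return false;  // each block dimension
        threads *= b[i];
    }
    return threads <= lim.max_threads_per_block;      // total threads per block
}
```

Under the 1024-thread limit, a 32 * 32 block is the largest square block. A 32 * 32 * 2 block is rejected even though each of its dimensions is within range.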

2.2 Squeezing Out GPU Compute

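The thread is the smallest unit of execution in CUDA GPU programming, and the number of threads in a launch can far exceed the number of GPU cores. A GPU may have hundreds or thousands of compute cores. Only when the thread count is at least the core count, and in practice usually many times larger, can the hardware be kept fully busy: while some warps wait on memory, the scheduler switches to other warps that are ready to run. Raising GPU utilization therefore means reducing idle time so that the compute cores are always doing work.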


This article is also published on the WeChat official account 卡巴斯. Follows are welcome.
