too many blocks in cooperative launch at cudaLaunchCooperativeKernel

The following error occurs when calling cudaLaunchCooperativeKernel:

cudaErrorCooperativeLaunchTooLarge (error 82) due to “too many blocks in cooperative launch” on CUDA API call to cudaLaunchCooperativeKernel.

The question was:

I understand I’m using too many ‘active blocks’ and have no argument with that.

What I don’t understand is how to do the math to know how many blocks and threads I can call beforehand.

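When using cudaLaunchCooperativeKernel, what limits the maximum grid_dim and block_dim?

Key parameters of the A100:
[Image: table of A100 parameters]
The table above shows three factors that bound max_grid_dim and max_block_dim for a cooperative launch: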

maximum number of resident blocks per SM
maximum number of resident warps per SM
maximum number of resident threads per SM

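In theory, an A100 imposes the following limits on a cooperative launch (ignoring registers and other per-kernel resources for now):

blocks cannot exceed 108 * 32 = 3456
warps cannot exceed 108 * 64 = 6912
threads cannot exceed 108 * 2048 = 221184
the threads of a block cannot span SMs (this condition is implicit and often forgotten, which is why launches fail)

These limits give the table below; in theory, every configuration in it should launch cooperatively. The rule used is:
grid_dim * block_dim == 108 * 2048, with block_dim capped at 1024 (this calculation does not account for the fact that a block cannot span SMs)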

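| grid dim | max block dim | threads sum | warp sum |
|---|---|---|---|
| 1 | 1024 | 1024 | 32 |
| 2 | 1024 | 2048 | 64 |
| 4 | 1024 | 4096 | 128 |
| 8 | 1024 | 8192 | 256 |
| 16 | 1024 | 16384 | 512 |
| 32 | 1024 | 32768 | 1024 |
| 64 | 1024 | 65536 | 2048 |
| 128 | 1024 | 131072 | 4096 |
| 256 | 864 (672) | 221184 (172032) | 6912 (5376) |
| 512 | 432 (384) | 221184 (196608) | 6912 (6144) |
| 1024 | 216 (192) | 221184 (196608) | 6912 (6144) |
| 2048 | 108 (96) | 221184 (196608) | 6912 (6144) |
| 3456 | 64 (64) | 221184 (221184) | 6912 (6912) |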
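
The non-parenthesized "max block dim" column above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the constants are the A100 figures quoted in this post, the 1024-thread cap per block is the usual CUDA limit, and the helper name `naive_max_block_dim` is made up for this example. It deliberately ignores the rule that a block cannot span SMs.

```cpp
#include <algorithm>

// A100 figures quoted in the text (assumed here; on real hardware, read them
// from cudaDeviceProp instead of hard-coding them).
constexpr int kNumSMs = 108;
constexpr int kMaxThreadsPerSM = 2048;
constexpr int kMaxThreadsPerBlock = 1024;

// Naive estimate: divide all resident threads of the GPU evenly over grid_dim
// blocks, then cap at the per-block thread limit.
// Hypothetical helper; it ignores that each block must fit on a single SM.
constexpr int naive_max_block_dim(int grid_dim) {
    return std::min(kMaxThreadsPerBlock, kNumSMs * kMaxThreadsPerSM / grid_dim);
}

static_assert(naive_max_block_dim(128) == 1024, "capped by threads per block");
static_assert(naive_max_block_dim(256) == 864, "221184 / 256");
static_assert(naive_max_block_dim(3456) == 64, "221184 / 3456");
```

The `static_assert` lines check three rows of the table at compile time.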

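Running through all of these configurations shows that the measured values differ from the calculated ones (the parenthesized values in the table above are the measured maxima). In the table below, each SM entry gives the number of threads resident on that SM.
Distribution of the kernel across SMs: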

Identical per-SM values are grouped into SM ranges.

| grid_dim | block_dim | threads per SM (sm0..sm107) |
|---|---|---|
| 1 | 1 | sm0: 1; all others: 0 |
| 1 | 64 | sm0: 64; all others: 0 |
| 1 | 128 | sm0: 128; all others: 0 |
| 1 | 256 | sm0: 256; all others: 0 |
| 1 | 512 | sm0: 512; all others: 0 |
| 1 | 1024 | sm0: 1024; all others: 0 |
| 2 | 1 | sm0, sm2: 1; all others: 0 |
| 2 | 64 | sm0, sm2: 64; all others: 0 |
| 2 | 128 | sm0, sm2: 128; all others: 0 |
| 2 | 256 | sm0, sm2: 256; all others: 0 |
| 2 | 512 | sm0, sm2: 512; all others: 0 |
| 2 | 1024 | sm0, sm2: 1024; all others: 0 |
| 4 | 1 | sm0, sm2, sm4, sm6: 1; all others: 0 |
| 4 | 64 | sm0, sm2, sm4, sm6: 64; all others: 0 |
| 4 | 128 | sm0, sm2, sm4, sm6: 128; all others: 0 |
| 4 | 256 | sm0, sm2, sm4, sm6: 256; all others: 0 |
| 4 | 512 | sm0, sm2, sm4, sm6: 512; all others: 0 |
| 4 | 1024 | sm0, sm2, sm4, sm6: 1024; all others: 0 |
| 8 | 1 | even SMs sm0..sm14: 1; all others: 0 |
| 8 | 64 | even SMs sm0..sm14: 64; all others: 0 |
| 8 | 128 | even SMs sm0..sm14: 128; all others: 0 |
| 8 | 256 | even SMs sm0..sm14: 256; all others: 0 |
| 8 | 512 | even SMs sm0..sm14: 512; all others: 0 |
| 8 | 1024 | even SMs sm0..sm14: 1024; all others: 0 |
| 16 | 1 | even SMs sm0..sm30: 1; all others: 0 |
| 16 | 64 | even SMs sm0..sm30: 64; all others: 0 |
| 16 | 128 | even SMs sm0..sm30: 128; all others: 0 |
| 16 | 256 | even SMs sm0..sm30: 256; all others: 0 |
| 16 | 512 | even SMs sm0..sm30: 512; all others: 0 |
| 16 | 1024 | even SMs sm0..sm30: 1024; all others: 0 |
| 32 | 1 | even SMs sm0..sm62: 1; all others: 0 |
| 32 | 64 | even SMs sm0..sm62: 64; all others: 0 |
| 32 | 128 | even SMs sm0..sm62: 128; all others: 0 |
| 32 | 256 | even SMs sm0..sm62: 256; all others: 0 |
| 32 | 512 | even SMs sm0..sm62: 512; all others: 0 |
| 32 | 1024 | even SMs sm0..sm62: 1024; all others: 0 |
| 64 | 1 | sm0..sm20 and even SMs sm22..sm106: 1; odd SMs sm21..sm107: 0 |
| 64 | 64 | sm0..sm20 and even SMs sm22..sm106: 64; odd SMs sm21..sm107: 0 |
| 64 | 128 | sm0..sm20 and even SMs sm22..sm106: 128; odd SMs sm21..sm107: 0 |
| 64 | 256 | sm0..sm20 and even SMs sm22..sm106: 256; odd SMs sm21..sm107: 0 |
| 64 | 512 | sm0..sm20 and even SMs sm22..sm106: 512; odd SMs sm21..sm107: 0 |
| 64 | 1024 | sm0..sm20 and even SMs sm22..sm106: 1024; odd SMs sm21..sm107: 0 |
| 128 | 1 | even SMs sm0..sm38: 2; all others: 1 |
| 128 | 64 | even SMs sm0..sm38: 128; all others: 64 |
| 128 | 128 | even SMs sm0..sm38: 256; all others: 128 |
| 128 | 256 | even SMs sm0..sm38: 512; all others: 256 |
| 128 | 512 | even SMs sm0..sm38: 1024; all others: 512 |
| 128 | 1024 | even SMs sm0..sm38: 2048; all others: 1024 |
| 256 | 1 | even SMs sm0..sm78: 3; all others: 2 |
| 256 | 64 | even SMs sm0..sm78: 192; all others: 128 |
| 256 | 128 | even SMs sm0..sm78: 384; all others: 256 |
| 256 | 256 | even SMs sm0..sm78: 768; all others: 512 |
| 256 | 512 | even SMs sm0..sm78: 1536; all others: 1024 |
| 256 | 672 | even SMs sm0..sm78: 2016; all others: 1344 |
| 512 | 1 | sm0..sm52 and even SMs sm54..sm106: 5; odd SMs sm53..sm107: 4 |
| 512 | 64 | sm0..sm52 and even SMs sm54..sm106: 320; odd SMs sm53..sm107: 256 |
| 512 | 128 | sm0..sm52 and even SMs sm54..sm106: 640; odd SMs sm53..sm107: 512 |
| 512 | 256 | sm0..sm52 and even SMs sm54..sm106: 1280; odd SMs sm53..sm107: 1024 |
| 512 | 384 | sm0..sm52 and even SMs sm54..sm106: 1920; odd SMs sm53..sm107: 1536 |
| 1024 | 1 | even SMs sm0..sm102: 10; all others: 9 |
| 1024 | 64 | even SMs sm0..sm102: 640; all others: 576 |
| 1024 | 128 | even SMs sm0..sm102: 1280; all others: 1152 |
| 1024 | 192 | even SMs sm0..sm102: 1920; all others: 1728 |
| 2048 | 1 | sm0..sm100 and even SMs sm102..sm106: 19; odd SMs sm101..sm107: 18 |
| 2048 | 64 | sm0..sm100 and even SMs sm102..sm106: 1216; odd SMs sm101..sm107: 1152 |
| 2048 | 96 | sm0..sm100 and even SMs sm102..sm106: 1824; odd SMs sm101..sm107: 1728 |
| 3456 | 1 | all SMs: 32 |
| 3456 | 64 | all SMs: 2048 |

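The table above also shows how the A100 schedules blocks:
it appears to place blocks first on the SMs with even SM ids, then on those with odd SM ids, and to repeat this round-robin layout.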

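Back to the question: why do the measured values differ from the calculated ones?

For example,
with a grid dim of 256, the naive calculation gives:
block_dim = (108 * 2048) / 256 = 864 (108 SMs, each holding at most 2048 threads)
So a grid_dim, block_dim of <<<256, 864>>> should use up all the hardware resources, yet a cooperative launch with this configuration fails. Why?

The grid dim is 256, so there are 256 blocks. 256 / 108 = 2.37, which means at least some SMs must hold 3 blocks, and each such SM would have to hold:
3 * 864 = 2592 threads
Clearly 2592 > 2048, so CUDA reports the error.
Because the threads of a block cannot span SMs, idle resources on other SMs cannot make up the difference, so choosing grid dim and block dim carefully is very important.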
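
The per-SM reasoning above can be written down as a small check. This is a sketch under the same simplifications as the text (registers and shared memory ignored), assuming blocks are spread as evenly as possible so that the fullest SM holds ceil(grid_dim / 108) blocks. `fits_cooperative` and `max_block_dim_for_grid` are hypothetical names for this example, not CUDA APIs.

```cpp
// A100 figures quoted in the text (assumed; query the device on real hardware).
constexpr int kNumSMs = 108;
constexpr int kMaxBlocksPerSM = 32;
constexpr int kMaxWarpsPerSM = 64;      // 64 warps * 32 threads = 2048 threads
constexpr int kWarpSize = 32;
constexpr int kMaxThreadsPerBlock = 1024;

// Number of blocks on the most heavily loaded SM when blocks are spread evenly.
constexpr int blocks_on_fullest_sm(int grid_dim) {
    return (grid_dim + kNumSMs - 1) / kNumSMs;
}

// True if, under this simplified model, all blocks can be resident at once.
constexpr bool fits_cooperative(int grid_dim, int block_dim) {
    int warps_per_block = (block_dim + kWarpSize - 1) / kWarpSize;
    int blocks = blocks_on_fullest_sm(grid_dim);
    return block_dim <= kMaxThreadsPerBlock && blocks <= kMaxBlocksPerSM &&
           blocks * warps_per_block <= kMaxWarpsPerSM;
}

// Largest block_dim (a multiple of the warp size) that still fits; 0 if none.
constexpr int max_block_dim_for_grid(int grid_dim) {
    int blocks = blocks_on_fullest_sm(grid_dim);
    if (blocks > kMaxBlocksPerSM) return 0;
    int threads = (kMaxWarpsPerSM / blocks) * kWarpSize;
    return threads < kMaxThreadsPerBlock ? threads : kMaxThreadsPerBlock;
}

static_assert(!fits_cooperative(256, 864), "3 blocks * 27 warps > 64 warps");
static_assert(max_block_dim_for_grid(256) == 672, "64 / 3 = 21 warps");
```

Under these assumptions the helper gives 672, 384, 192, 96 and 64 for grid dims 256, 512, 1024, 2048 and 3456, which agrees with the largest block dims that appear in the measurements.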

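The CUDA documentation provides a function for this:

cudaOccupancyMaxActiveBlocksPerMultiprocessor() returns the maximum number of blocks that can be resident on each SM for a given number of threads per block.
For example, with numThreads = 864, at most 2 blocks can be resident on each SM, so at most 2 * 108 = 216 blocks can be launched cooperatively.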
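
Once the blocks-per-SM figure is known, the largest grid that can be launched cooperatively is that figure times the number of SMs. The sketch below uses a simplified stand-in for the occupancy query (block and warp limits only, with the A100 numbers from this post). On real hardware the value should come from cudaOccupancyMaxActiveBlocksPerMultiprocessor(), which also accounts for registers and shared memory. The function names here are illustrative.

```cpp
#include <algorithm>

// A100 figures quoted in the text (assumed; query the device on real hardware).
constexpr int kNumSMs = 108;
constexpr int kMaxBlocksPerSM = 32;
constexpr int kMaxWarpsPerSM = 64;
constexpr int kWarpSize = 32;

// Simplified stand-in for the occupancy query: resident blocks per SM for a
// given block size, considering only the block and warp limits.
// Assumes 1 <= block_dim <= 1024.
constexpr int approx_blocks_per_sm(int block_dim) {
    int warps_per_block = (block_dim + kWarpSize - 1) / kWarpSize;
    return std::min(kMaxBlocksPerSM, kMaxWarpsPerSM / warps_per_block);
}

// Largest grid for which every block can be resident at the same time.
constexpr int max_cooperative_grid(int block_dim) {
    return approx_blocks_per_sm(block_dim) * kNumSMs;
}

static_assert(approx_blocks_per_sm(864) == 2, "27 warps per block, 64 / 27 = 2");
static_assert(max_cooperative_grid(864) == 216, "so a grid of 256 is too large");
```

With 864 threads per block this yields 216 blocks, so the <<<256, 864>>> launch from the example exceeds the limit by 40 blocks.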