Use maxrregcount with caution

weixin_30750335

Original link: http://www.cnblogs.com/xingzifei/p/7482454.html

I needed to compile a *.cubin file.

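Compiling with --ptxas-options=-v showed that the kernel used 36 registers, so I recompiled with --maxrregcount=32. Register usage dropped to 32, but the excess was spilled to local memory: ptxas reported “8 bytes stack frame, 12 bytes spill stores, 28 bytes spill loads”.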

nvcc -cubin -m64 -arch sm_35 *.cu --use_fast_math --maxrregcount=32 --ptxas-options=-v -O3 -o *.cubin

However, after repeated testing I found that the floating-point results came out different (I did not test integer results).

So that is the problem I ran into: maxrregcount can change the final result.

A quick search showed that other people have run into the same thing. One explanation goes as follows:

“Operation order may change with register optimization. Since fp arithmetic is not associative due to finite precision, this may affect the result.”

Reposted from: https://www.cnblogs.com/xingzifei/p/7482454.html
