PyTorch has exposed torch.backends.cuda.matmul.allow_tf32 and torch.backends.cudnn.allow_tf32 since torch 1.7. The matmul flag defaults to True in torch 1.7 through 1.11 and to False from torch 1.12 onward; the cuDNN flag defaults to True throughout.
These flags control whether PyTorch is allowed to internally use TensorFloat32 (TF32) tensor cores (available since NVIDIA's Ampere architecture) to compute matmuls (matrix multiplications and batched matrix multiplications) and convolutions.
TF32 tensor cores are designed to achieve better matmul and convolution performance on torch.float32 tensors: inputs are rounded to a 10-bit mantissa, results are accumulated at FP32 precision, and the FP32 dynamic range is kept.
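To make the 10-bit mantissa concrete, here is a minimal CPU-side sketch of the quantization TF32 applies to its inputs. The helper round_to_tf32 is hypothetical (not a PyTorch API), and it uses round-half-up where the hardware uses round-to-nearest-even, so it is only an approximation:

```python
import struct

def round_to_tf32(x: float) -> float:
    """Hypothetical helper: emulate TF32 input rounding by keeping only
    the top 10 of FP32's 23 mantissa bits (round-half-up; NaN/Inf edge
    cases are ignored in this sketch)."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    bits = (bits + (1 << 12)) & ~((1 << 13) - 1) & 0xFFFFFFFF
    return struct.unpack('<f', struct.pack('<I', bits))[0]

x = 1.2345678
print(round_to_tf32(x))               # 1.234375
print(abs(round_to_tf32(x) - x) / x)  # ~1.6e-4, within the 2**-11 half-ulp bound
```

Because only the inputs are rounded and accumulation stays in FP32, each TF32 input carries a relative rounding error of at most about 2**-11 ≈ 4.9e-4, which matches the error magnitudes in the example further below.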
MGCA, for example, explicitly sets both flags to True:
```python
import torch

torch.backends.cuda.matmul.allow_tf32 = True
# The flag below controls whether to allow TF32 on cuDNN. This flag defaults to True.
torch.backends.cudnn.allow_tf32 = True
```
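Because these flags are process-wide globals, flipping them affects every subsequent matmul and convolution in the program. A minimal sketch (my own pattern, not taken from MGCA) for enabling TF32 only around a specific region of code and then restoring the previous state:

```python
import torch

# Save the current values so the change stays local to this region.
old_matmul = torch.backends.cuda.matmul.allow_tf32
old_cudnn = torch.backends.cudnn.allow_tf32
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
try:
    ...  # TF32-accelerated matmuls / convolutions go here
finally:
    torch.backends.cuda.matmul.allow_tf32 = old_matmul
    torch.backends.cudnn.allow_tf32 = old_cudnn
```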
Example:
```python
import torch

a_full = torch.randn(10240, 10240, dtype=torch.double, device='cuda')
b_full = torch.randn(10240, 10240, dtype=torch.double, device='cuda')
ab_full = a_full @ b_full
mean = ab_full.abs().mean()  # 80.7277

a = a_full.float()
b = b_full.float()

# Do matmul at TF32 mode.
torch.backends.cuda.matmul.allow_tf32 = True
ab_tf32 = a @ b  # takes 0.016s on GA100
error = (ab_tf32 - ab_full).abs().max()  # 0.1747
relative_error = error / mean  # 0.0022

# Do matmul with TF32 disabled.
torch.backends.cuda.matmul.allow_tf32 = False
ab_fp32 = a @ b  # takes 0.11s on GA100
error = (ab_fp32 - ab_full).abs().max()  # 0.0031
relative_error = error / mean  # 0.000039
```
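With TF32 enabled, the matmul above runs roughly 7x faster, at the cost of a relative error about 50x larger (~2.2e-3 vs ~3.9e-5), consistent with TF32's 10-bit mantissa. Note that the speedup only materializes on Ampere-or-newer GPUs. A small sketch, assuming a CUDA build of PyTorch, to check whether the current device supports TF32 and what the flags are currently set to:

```python
import torch

# TF32 tensor cores require an Ampere-or-newer GPU (compute capability
# >= 8.0); on older GPUs the allow_tf32 flags are harmless no-ops.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    print(f'compute capability {major}.{minor}, TF32 capable: {major >= 8}')
    print('matmul allow_tf32 =', torch.backends.cuda.matmul.allow_tf32)
    print('cudnn  allow_tf32 =', torch.backends.cudnn.allow_tf32)
```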