PyTorch has exposed torch.backends.cuda.matmul.allow_tf32 and torch.backends.cudnn.allow_tf32 since torch 1.7. The matmul flag defaults to True in torch 1.7 through 1.11 and to False from torch 1.12 onward; the cuDNN flag defaults to True throughout.
These flags control whether PyTorch is allowed to internally use TensorFloat32 (TF32) tensor cores (available since NVIDIA's Ampere architecture) to compute matmuls (matrix multiplications and batched matrix multiplications) and convolutions.
TF32 tensor cores are designed to achieve better matmul and convolution performance on torch.float32 tensors: inputs are rounded to a 10-bit mantissa, results are accumulated at FP32 precision, and the FP32 dynamic range is kept.
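To make the 10-bit mantissa concrete, here is a minimal CPU-side sketch of the quantization TF32 applies to its inputs. The helper round_to_tf32 is hypothetical (not a PyTorch API), and it uses round-half-up where the hardware uses round-to-nearest-even, so it is only an approximation:

```python
import struct

def round_to_tf32(x: float) -> float:
    """Hypothetical helper: emulate TF32 input rounding by keeping only
    the top 10 of FP32's 23 mantissa bits (round-half-up; NaN/Inf edge
    cases are ignored in this sketch)."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    bits = (bits + (1 << 12)) & ~((1 << 13) - 1) & 0xFFFFFFFF
    return struct.unpack('<f', struct.pack('<I', bits))[0]

x = 1.2345678
print(round_to_tf32(x))               # 1.234375
print(abs(round_to_tf32(x) - x) / x)  # ~1.6e-4, within the 2**-11 half-ulp bound
```

Because only the inputs are rounded and accumulation stays in FP32, each TF32 input carries a relative rounding error of at most about 2**-11 ≈ 4.9e-4, which matches the error magnitudes in the example further below.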
MGCA, for example, explicitly sets both flags to True:
```python
import torch

torch.backends.cuda.matmul.allow_tf32 = True
# The flag below controls whether to allow TF32 on cuDNN. This flag defaults to True.
torch.backends.cudnn.allow_tf32 = True
```
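Because these flags are process-wide globals, flipping them affects every subsequent matmul and convolution in the program. A minimal sketch (my own pattern, not taken from MGCA) for enabling TF32 only around a specific region of code and then restoring the previous state:

```python
import torch

# Save the current values so the change stays local to this region.
old_matmul = torch.backends.cuda.matmul.allow_tf32
old_cudnn = torch.backends.cudnn.allow_tf32
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
try:
    ...  # TF32-accelerated matmuls / convolutions go here
finally:
    torch.backends.cuda.matmul.allow_tf32 = old_matmul
    torch.backends.cudnn.allow_tf32 = old_cudnn
```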
Example:
```python
import torch

a_full = torch.randn(10240, 10240, dtype=torch.double, device='cuda')
b_full = torch.randn(10240, 10240, dtype=torch.double, device='cuda')
ab_full = a_full @ b_full
mean = ab_full.abs().mean()  # 80.7277

a = a_full.float()
b = b_full.float()

# Do matmul at TF32 mode.
torch.backends.cuda.matmul.allow_tf32 = True
ab_tf32 = a @ b  # takes 0.016s on GA100
error = (ab_tf32 - ab_full).abs().max()  # 0.1747
relative_error = error / mean  # 0.0022

# Do matmul with TF32 disabled.
torch.backends.cuda.matmul.allow_tf32 = False
ab_fp32 = a @ b  # takes 0.11s on GA100
error = (ab_fp32 - ab_full).abs().max()  # 0.0031
relative_error = error / mean  # 0.000039
```
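With TF32 enabled, the matmul above runs roughly 7x faster, at the cost of a relative error about 50x larger (~2.2e-3 vs ~3.9e-5), consistent with TF32's 10-bit mantissa. Note that the speedup only materializes on Ampere-or-newer GPUs. A small sketch, assuming a CUDA build of PyTorch, to check whether the current device supports TF32 and what the flags are currently set to:

```python
import torch

# TF32 tensor cores require an Ampere-or-newer GPU (compute capability
# >= 8.0); on older GPUs the allow_tf32 flags are harmless no-ops.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    print(f'compute capability {major}.{minor}, TF32 capable: {major >= 8}')
    print('matmul allow_tf32 =', torch.backends.cuda.matmul.allow_tf32)
    print('cudnn  allow_tf32 =', torch.backends.cudnn.allow_tf32)
```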