Which GPUs Support Mixed Precision Training

This article, based on https://medium.com/analytics-vidhya/mixed-precision-training-fd08f4c8e72d, looks at which GPUs support mixed precision training and focuses on the hardware requirements for using it in deep learning.

Discover a way to efficiently utilize your GPU

List of things covered

  • What is Mixed Precision Training

  • Why MPT is Important

  • How MPT reduces memory

  • Frameworks with AMP (Automatic Mixed Precision)

What is Mixed Precision Training

Mixed precision training is a technique for training large neural networks in which the model's parameters are stored in different datatype precisions (FP16 vs FP32 vs FP64). It offers a significant performance and computational boost by training large neural networks in lower-precision formats. With the release of the NVIDIA 30-series GPUs, it becomes even more important to take advantage of these features.

For instance, in PyTorch the single-precision float means float32, and by default the parameters take the float32 datatype. Now, if we have a parameter (W) that could be stored in FP16 while ensuring that no task-specific accuracy is affected by this move between precisions, then why should we use FP32 or FP64?
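
As a quick check (a minimal sketch, assuming PyTorch is installed), the default dtype and a half-precision copy of a parameter can be inspected like this:

    import torch

    # By default, PyTorch creates floating-point tensors in single precision (FP32).
    w = torch.randn(1024, 1024)
    print(w.dtype)        # torch.float32

    # The same parameter stored in half precision (FP16) takes half the memory.
    w_half = w.half()
    print(w_half.dtype)   # torch.float16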

Notations

  • FP16 — Half-Precision, 16-bit floating point; occupies 2 bytes of memory

  • FP32 — Single-Precision, 32-bit floating point; occupies 4 bytes of memory

  • FP64 — Double-Precision, 64-bit floating point; occupies 8 bytes of memory
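
The byte counts above are easy to verify; a minimal sketch (again assuming PyTorch) that prints the per-element size of each precision:

    import torch

    # element_size() reports bytes per element: 2 for FP16, 4 for FP32, 8 for FP64.
    for dtype in (torch.float16, torch.float32, torch.float64):
        t = torch.empty(1_000_000, dtype=dtype)
        mb = t.element_size() * t.nelement() / 1e6
        print(dtype, t.element_size(), "bytes/element,", mb, "MB per million elements")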

Since the introduction of Tensor Cores in the Volta and Turing architectures (NVIDIA), significant training speedups have been achieved by switching to mixed precision, with up to a 3x overall speedup on the most arithmetically intense model architectures. The ability to train deep learning networks with lower precision was introduced in the Pascal architecture and first supported in CUDA® 8 in the NVIDIA Deep Learning SDK.
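
To make the "Frameworks with AMP" item above concrete, here is a minimal sketch of automatic mixed precision with PyTorch's torch.cuda.amp API; the model, optimizer, and training data are placeholder assumptions, and a CUDA-capable GPU is assumed:

    import torch
    from torch.cuda.amp import autocast, GradScaler

    # Placeholder model and optimizer; any FP32 model is handled the same way.
    model = torch.nn.Linear(512, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = GradScaler()  # scales the loss so small FP16 gradients do not underflow

    for _ in range(10):
        x = torch.randn(32, 512, device="cuda")
        y = torch.randint(0, 10, (32,), device="cuda")

        optimizer.zero_grad()
        with autocast():  # ops run in FP16 where it is safe, FP32 otherwise
            loss = torch.nn.functional.cross_entropy(model(x), y)

        scaler.scale(loss).backward()  # backward pass on the scaled loss
        scaler.step(optimizer)         # unscales gradients, then steps the optimizer
        scaler.update()                # adjusts the loss scale for the next iteration

The forward pass runs under autocast, so matrix multiplies and convolutions execute in FP16 on Tensor Cores, while the weights and the optimizer step stay in FP32.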
