Which GPUs Support Mixed Precision Training

This article, based on https://medium.com/analytics-vidhya/mixed-precision-training-fd08f4c8e72d, looks at which GPUs support mixed precision training and focuses on the hardware requirements for using it in deep learning.

Discover a way to efficiently utilize your GPU

List of things covered

  • What is Mixed Precision Training

  • Why MPT is Important

  • How MPT reduces memory

  • Frameworks with AMP (Automatic Mixed Precision)

What is Mixed Precision Training

Mixed precision training is a technique for training large neural networks in which the model's parameters are stored in different datatype precisions (FP16 vs FP32 vs FP64). It offers a significant performance and computational boost by training large neural networks in lower-precision formats. With the release of the NVIDIA 30-series GPUs, it becomes even more important to take advantage of these features.

For instance, in PyTorch the single-precision float means float32, and by default the parameters take the float32 datatype. Now, if we have a parameter (W) that could be stored in FP16 while ensuring that no task-specific accuracy is affected by this move between precisions, then why should we use FP32 or FP64?
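
As a quick check (a minimal sketch, assuming PyTorch is installed), the default dtype and a half-precision copy of a parameter can be inspected like this:

    import torch

    # By default, PyTorch creates floating-point tensors in single precision (FP32).
    w = torch.randn(1024, 1024)
    print(w.dtype)        # torch.float32

    # The same parameter stored in half precision (FP16) takes half the memory.
    w_half = w.half()
    print(w_half.dtype)   # torch.float16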

Notations

  • FP16 — Half-Precision, 16-bit floating point; occupies 2 bytes of memory

  • FP32 — Single-Precision, 32-bit floating point; occupies 4 bytes of memory

  • FP64 — Double-Precision, 64-bit floating point; occupies 8 bytes of memory
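
The byte counts above are easy to verify; a minimal sketch (again assuming PyTorch) that prints the per-element size of each precision:

    import torch

    # element_size() reports bytes per element: 2 for FP16, 4 for FP32, 8 for FP64.
    for dtype in (torch.float16, torch.float32, torch.float64):
        t = torch.empty(1_000_000, dtype=dtype)
        mb = t.element_size() * t.nelement() / 1e6
        print(dtype, t.element_size(), "bytes/element,", mb, "MB per million elements")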

Since the introduction of Tensor Cores in the Volta and Turing architectures (NVIDIA), significant training speedups have been achieved by switching to mixed precision, with up to a 3x overall speedup on the most arithmetically intense model architectures. The ability to train deep learning networks with lower precision was introduced in the Pascal architecture and first supported in CUDA® 8 in the NVIDIA Deep Learning SDK.
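
To make the "Frameworks with AMP" item above concrete, here is a minimal sketch of automatic mixed precision with PyTorch's torch.cuda.amp API; the model, optimizer, and training data are placeholder assumptions, and a CUDA-capable GPU is assumed:

    import torch
    from torch.cuda.amp import autocast, GradScaler

    # Placeholder model and optimizer; any FP32 model is handled the same way.
    model = torch.nn.Linear(512, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = GradScaler()  # scales the loss so small FP16 gradients do not underflow

    for _ in range(10):
        x = torch.randn(32, 512, device="cuda")
        y = torch.randint(0, 10, (32,), device="cuda")

        optimizer.zero_grad()
        with autocast():  # ops run in FP16 where it is safe, FP32 otherwise
            loss = torch.nn.functional.cross_entropy(model(x), y)

        scaler.scale(loss).backward()  # backward pass on the scaled loss
        scaler.step(optimizer)         # unscales gradients, then steps the optimizer
        scaler.update()                # adjusts the loss scale for the next iteration

The forward pass runs under autocast, so matrix multiplies and convolutions execute in FP16 on Tensor Cores, while the weights and the optimizer step stay in FP32.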
