Understanding PyTorch Distributed Training

真炎破天

Category: Deep Learning Fundamentals. Tags: pytorch, python, deep learning

Article link: https://blog.csdn.net/u012409283/article/details/119926029

Single-machine multi-GPU training

Configuring the GPU for each process

DDP mode

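import torch
import torch.distributed as dist

# Requires dist.init_process_group(...) to have been called already.
local_rank = dist.get_rank()  # global rank of this process; on a single machine it coincides with the local rank
torch.cuda.set_device(local_rank)  # make this GPU the current device; roughly comparable to launching with CUDA_VISIBLE_DEVICES=local_rank, except that the other GPUs stay visible to the process
device = torch.device("cuda", local_rank)  # device object for the GPU with this index; tensors are placed on it with .to(device)
device = torch.device("cuda:1")  # equivalent to the line above when local_rank == 1
device = torch.device("cuda")  # no index given: refers to the current device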
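The snippet above uses the global rank as the GPU index, which only works when all processes run on one machine. As a hedged sketch, assuming the script is launched with torchrun (which sets the LOCAL_RANK environment variable for each worker), the index can be read from the environment instead. The helper names `read_local_rank` and `pick_device_name` are hypothetical and not part of PyTorch.

```python
import os


def read_local_rank(env=None, default=0):
    """Return the per-node process index from LOCAL_RANK (set by torchrun).

    `env` is any mapping; it defaults to os.environ. `default` is used when
    the variable is missing, e.g. when the script is run without a launcher.
    """
    env = os.environ if env is None else env
    return int(env.get("LOCAL_RANK", default))


def pick_device_name(local_rank, cuda_available, device_count):
    """Build a device string, falling back to CPU when no GPU is usable."""
    if not cuda_available or device_count == 0:
        return "cpu"
    if local_rank >= device_count:
        raise ValueError(
            f"local_rank {local_rank} but only {device_count} GPU(s) visible"
        )
    return f"cuda:{local_rank}"


# Pure-Python demonstration with made-up inputs (no GPU needed):
rank = read_local_rank({"LOCAL_RANK": "1"})
name = pick_device_name(rank, cuda_available=True, device_count=4)
print(name)
```

In a training script, the two availability arguments would come from torch.cuda.is_available() and torch.cuda.device_count(), and the resulting string can be passed to torch.device(...) before calling torch.cuda.set_device(...).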

Horovod mode

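import torch
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())
# No device object is declared here. Because the current device is already set, a torch.Tensor can be moved to this GPU with tensor.cuda()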
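To illustrate the point about not declaring a device object, here is a minimal sketch of moving a tensor with `.cuda()`, which targets the current device when called without an argument. The helper `to_current_gpu` is a hypothetical name, and the CPU fallback is an addition so that the example also runs on a machine without a GPU.

```python
import torch


def to_current_gpu(t):
    """Move a tensor to the current CUDA device.

    If CUDA is unavailable the tensor is returned unchanged (this fallback
    is an assumption of the sketch, not part of the original snippet).
    """
    return t.cuda() if torch.cuda.is_available() else t


x = to_current_gpu(torch.ones(2, 3))
print(x.device)  # a CUDA device when a GPU is present, otherwise cpu
```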

Other CUDA utilities

# Check whether CUDA is available
print(torch.cuda.is_available())
# Return the number of GPUs
print(torch.cuda.device_count())
# Return the index of the current device
print(torch.cuda.current_device())
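The three queries above can be gathered into one small report, which is handy to print at the start of each worker process. This is only a sketch: `describe_cuda` is a hypothetical helper, and it skips the current-device query when CUDA is unavailable, since that call needs an initialized CUDA runtime.

```python
import torch


def describe_cuda():
    """Collect basic CUDA information into a dict."""
    info = {
        "available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }
    if info["available"]:
        # Only query these when a GPU is actually usable.
        info["current_device"] = torch.cuda.current_device()
        info["names"] = [
            torch.cuda.get_device_name(i) for i in range(info["device_count"])
        ]
    return info


report = describe_cuda()
print(report)
```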
