Training Neural Networks on a Single Machine with Multiple GPUs

Latest recommended article published on 2024-07-15 22:00:49

超级无敌大鹏哥

Tags: deep learning, neural networks, pytorch

Article link: https://blog.csdn.net/zzphahahaha/article/details/122246109

DDP (DistributedDataParallel)

DDP is generally preferred over DP (DataParallel). The comparison is not repeated here; readers can look it up themselves.

How to integrate it quickly

Straight to the code.
Step 1: initialization

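import torch
import torch.distributed as dist

dist.init_process_group(backend='nccl')
# On a single machine the global rank doubles as the local GPU index.
local_rank = dist.get_rank()
torch.cuda.set_device(local_rank)
device = torch.device("cuda", local_rank)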
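
The initialization above relies on the launcher to tell each process who it is. As a minimal sketch, assuming the script is started with `torchrun` (which sets the `RANK`, `LOCAL_RANK` and `WORLD_SIZE` environment variables), the values could be read explicitly as shown here. The helper name `read_dist_env` is hypothetical and not part of PyTorch or of the original post.

```python
import os


def read_dist_env(env=None):
    """Return (rank, local_rank, world_size) from launcher-provided variables.

    Hypothetical helper for illustration. Falls back to a single-process
    setup when the variables are absent, and to the global rank when
    LOCAL_RANK is missing (valid only on a single machine).
    """
    env = os.environ if env is None else env
    rank = int(env.get("RANK", 0))
    local_rank = int(env.get("LOCAL_RANK", rank))
    world_size = int(env.get("WORLD_SIZE", 1))
    return rank, local_rank, world_size


if __name__ == "__main__":
    rank, local_rank, world_size = read_dist_env()
    print(f"rank={rank} local_rank={local_rank} world_size={world_size}")
```

With four GPUs, a launch could look like `torchrun --nproc_per_node=4 train.py`, where `train.py` is a placeholder script name. The returned `local_rank` would then be passed to `torch.cuda.set_device` in place of `dist.get_rank()`, which also stays correct if the job is later spread over several machines.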

Step 2: wrap the model for parallel training

if torch.cuda.device_count() > 1:
    model = torch.nn.parallel.DistributedDataParallel(
        model.to(device), find_unused_parameters=True, broadcast_buffers=False,
        device_ids=[local_rank], output_device=local_rank)

Step 3: move the data to the GPU

inputs = inputs.to(device)
gt = gt.to(device)

The problem I ran into

RuntimeError: CUDA out of memory

Solution

Original code

model = torch.nn.parallel.DistributedDataParallel(model.to(device), find_unused_parameters=True, broadcast_buffers=False, device_ids=[0, 1, 2, 3])

Modified code

model = torch.nn.parallel.DistributedDataParallel(model.to(device), find_unused_parameters=True, broadcast_buffers=False, device_ids=[local_rank], output_device=local_rank)

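At first glance this looked like the model simply did not fit in GPU memory, yet single-GPU training ran fine. Passing only the process's own GPU, device_ids=[local_rank], together with output_device=local_rank, resolved the error.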
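
One plausible explanation, which the original post does not confirm, is that every process placed a model replica on every GPU listed in `device_ids`. That matches the single-process multi-device mode of older PyTorch releases; newer releases are expected to reject a multi-element `device_ids` instead. Under that assumption, a rough count of replicas per GPU can be sketched as follows. The function `replicas_per_gpu` is a hypothetical illustration and does not call PyTorch.

```python
def replicas_per_gpu(num_processes, device_ids_for_rank):
    """Count how many model replicas land on each GPU.

    device_ids_for_rank maps a process rank to the list of GPU indices
    that process passes as device_ids. Assumes (not verified against the
    original setup) that one replica is created per listed GPU.
    """
    counts = {}
    for rank in range(num_processes):
        for gpu in device_ids_for_rank(rank):
            counts[gpu] = counts.get(gpu, 0) + 1
    return counts


# Original code: each of the 4 processes lists all 4 GPUs.
before = replicas_per_gpu(4, lambda rank: [0, 1, 2, 3])
# Modified code: each process lists only its own GPU.
after = replicas_per_gpu(4, lambda rank: [rank])

print(before)  # {0: 4, 1: 4, 2: 4, 3: 4}
print(after)   # {0: 1, 1: 1, 2: 1, 3: 1}
```

If this reading is right, each GPU had to hold four replicas instead of one, which would explain why a model that trains on a single card still ran out of memory.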
